Elisabete Ferreira MA Thesis - DiVA portal


Degree Project
Master’s Level

Anaphoric demonstratives in student academic writing
A cross-disciplinary study of (un)attended this and these

Author: Elisabete Ferreira
Supervisor: Annelie Ädel
Examiner: Jonathan White
Subject/main field of study: English for Academic Purposes
Course code: EN3077
Credits: 15
Date of examination: 13/06/19

At Dalarna University it is possible to publish the student thesis in full text in DiVA. The publishing is open access, which means the work will be freely accessible to read and download on the internet. This will significantly increase the dissemination and visibility of the student thesis. Open access is becoming the standard route for spreading scientific and academic information on the internet. Dalarna University recommends that both researchers as well as students publish their work open access.

I give my/we give our consent for full text publishing (freely accessible on the internet, open access):

Yes ☒ No ☐

Dalarna University – SE-791 88 Falun – Phone +4623-77 80 00

Abstract

Cohesive devices such as anaphoric reference play an important role in written discourse. This

thesis investigates the extent to which the anaphoric demonstratives this and these are used as

determiners (‘attended’) or pronouns (‘unattended’) by first-year undergraduate students from

four different academic disciplines. Data extracted from the British Academic Written English

(BAWE) corpus were analysed quantitatively to determine the frequency of use of attended and

unattended this/these across disciplines, as well as qualitatively to examine the types of nominal

and verbal structures that follow the demonstratives. When compared to findings from previous

studies, novice student writers were found to employ this/these as pronouns to a larger extent

than both students at a more advanced level and research article writers. It was also observed

that the determiners this and these pattern differently, selecting distinct attending nouns to a

great extent. In addition, comparison of the results for each subcorpus shows that even though

there are some differences between the four disciplines, these differences are not as great as

might be expected and do not indicate a clear distinction between ‘hard’ and ‘soft’ sciences.

While the influence of genre has not been scrutinised, other possible explanations proposed

relate to the educational context and level of study in association with the range of lexical

choices available to novice student writers.

Keywords: cohesion, demonstratives, anaphoric reference, disciplinary variation, student writing

Table of Contents

1 Introduction
1.1 Aim and Research Questions
2 Review of the Literature
2.1 Academic Discourse(s)
2.1.1 English for general or specific academic purposes
2.1.2 Disciplinary variation in academic writing
2.1.2.1 ‘Common core’ versus discipline-specific vocabulary
2.2 Discourse Grammar: Cohesion in Written Discourse
2.2.1 Cohesive devices and reference patterns
2.2.2 Demonstratives as determiners or pronouns
2.2.3 Attended and unattended this/these in academic writing
3 Material and Methodology
3.1 Data
3.2 Method of Analysis
3.2.1 Models of classification
4 Results and Discussion
4.1 Overall Frequencies of Attended and Unattended this/these
4.2 Most Frequent Nouns and Noun Types Following this/these
4.3 Most Frequent Verbs and Verb Types Following this/these
4.4 Distribution of Attended and Unattended this/these across Disciplines
4.5 Distribution of Nouns/Verbs and Types across Disciplines
5 Conclusion
References
Appendix 1 List of nouns following this/these
Appendix 2 List of nouns per type
Appendix 3 List of verbs following this/these
Appendix 4 Complete list of nouns per discipline
Appendix 5 Complete list of verbs per discipline

1 Introduction

In order to construct a rhetorically effective text, writers adopt several strategies. One of them

is the use of cohesive devices to structure the text, connect ideas and maintain a clear flow of

information, which facilitates understanding and helps build a strong and convincing argument.

The importance of these devices is reflected in the attention paid in the literature to specific

linguistic items such as linking adverbials or connectors (e.g. Charles, 2011; Hinkel, 2001;

Granger & Tyson, 1996). While connectors have been researched to some extent, cohesive

items such as the demonstratives this/these and that/those have been less explored. As markers

of anaphoric reference, these help to establish “contextual ties between ideas” (Hinkel, 2001,

p. 112) and play an important role in text cohesion.

The appropriateness or not of leaving an anaphoric demonstrative (in particular this)

‘unattended’, that is, not followed by a noun, has been a matter of debate in academic writing

for decades (Swales, 2005; Geisler, Kaufer & Steinberg, 1985). The circumstances that lead

writers to choose between the pronominal form of a demonstrative and the determiner form

followed by a noun have received considerable attention in the North American approach to

composition and writing instruction (e.g. Moskovit, 1983; Geisler et al., 1985). In addition,

style manuals and textbooks (e.g. American Psychological Association, 2010; Glenn & Gray,

2013; Swales & Feak, 2012) continue to draw attention to the potential ambiguity and confusion

that writers can create by using an unattended demonstrative, placing an unnecessary burden

on the reader. Teachers too tend to be adamant that students should make the anaphoric

reference clear so as not to disrupt the cohesion of a text, and often “scribble in the margin

‘this what?’” (Swales, 2005, p. 2).

Rather than taking a prescriptive approach to the use of (un)attended this/these, the

research focus has shifted in recent years to descriptive examinations of how these forms are

actually used in patterns of anaphoric reference, especially in academic writing. This

development has been strongly supported by the increasing use of corpora and corpus tools.

Much of the emphasis has been placed, however, on published research articles (Gray & Cortes,

2011; Gray, 2010; Swales, 2005) and comparatively few studies have investigated how

demonstrative determiners and pronouns are used by student writers; even fewer have targeted

a non-North American context (e.g. Petch-Tyson, 2000). Those studies that have focused on

student writing have prioritised advanced or upper-level students (e.g. Wulff, Römer & Swales,

2012; Römer & Wulff, 2010). A comparison of their findings with results from an investigation

of attended and unattended demonstratives in novice writing could provide useful information

about their usage at different levels of study.

In comparison with the relative scarcity of studies on anaphoric demonstratives, a

phenomenon that has received somewhat more attention in recent years is the systematic variation

within academic discourse. Disciplinary variation and the importance of a

discipline-specific identity have been widely acknowledged, as research has

demonstrated that academic discourse varies to a large extent across genres and disciplinary

communities (e.g. Hyland, 2004, 2006; Bhatia, 2002; Becher & Trowler, 2001). Students at

both undergraduate and postgraduate level are also expected to follow certain linguistic

conventions and develop disciplinary discourses in order to successfully progress through

tertiary education (Hyland, 2006, p. 39). The disciplinary perspective is thus also worth

exploring in student writing (e.g. Nesi & Gardner, 2012; Staples, Egbert, Biber & Gray, 2016;

Gardner, Nesi & Biber, 2018).

1.1 Aim and Research Questions

The overarching aim of this study is to investigate the use of the demonstratives this/these in a

representative collection of texts produced by first-year undergraduate students from different

academic disciplines. It seeks to determine how frequently this/these are used as determiners or

pronouns, as well as the most common nouns and verbs that follow these demonstratives, and

whether any disciplinary variation can be found. For these purposes, frequencies of use and the

most frequent nouns and verbs and respective types will be analysed, followed by a comparison

of the results across disciplines. The study will be guided by the following research questions:

1) To what degree are attended and unattended this/these used in first-year undergraduate

student writing?

2) Which nouns and noun types (concrete, deictic, shell, other abstract nouns) most

frequently follow attended this/these?

3) Which verbs and verb types (lexical, primary, modal) most frequently follow unattended

this/these?

4) To what extent, if any, do student writers in different disciplines differ with respect to

the frequency of use of attended and unattended this/these?

5) How are the most frequently co-occurring nouns/verbs and respective types distributed

across academic disciplines?

To address these research questions, the next section will review literature on academic writing

related to disciplinary discourses and cohesion, as well as provide an overview of previous

corpus-based research on the use of demonstratives as cohesive resources. Section 3 will then

describe the material and methodology employed in this study, after which the results of the

analysis of the data will be presented and discussed in section 4. Finally, a summary of the main

findings, as well as limitations and suggestions for future research will follow in the concluding

section.
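
To make the attended/unattended distinction behind research question 1 concrete, the following is a minimal sketch of how such counts could be approximated automatically. It is not the procedure used in this study: it assumes a crude rule in which this/these counts as unattended when the next token is punctuation or belongs to a small, hypothetical list of verb-like words (`VERB_LIKE`), and as attended otherwise. The function name `classify_demonstratives` is likewise illustrative, and the rule does not check whether a demonstrative is actually anaphoric.

```python
import re
from collections import Counter

# Hypothetical, deliberately small list of verb-like words that signal a
# pronominal (unattended) use when they directly follow this/these.
VERB_LIKE = {
    "is", "are", "was", "were", "has", "have", "had", "can", "could",
    "may", "might", "will", "would", "shall", "should", "must",
    "means", "shows", "suggests", "allows", "leads", "makes",
}

# Words and a few punctuation marks; everything else is ignored.
TOKEN = re.compile(r"[A-Za-z]+|[.,;:!?()]")


def classify_demonstratives(text):
    """Count attended vs. unattended this/these with a crude next-token rule."""
    tokens = TOKEN.findall(text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok not in ("this", "these"):
            continue
        # Treat the end of the text like sentence-final punctuation.
        nxt = tokens[i + 1] if i + 1 < len(tokens) else "."
        # Punctuation or a verb-like word right after the demonstrative is
        # taken as evidence of a pronoun; anything else as a determiner.
        if not nxt.isalpha() or nxt in VERB_LIKE:
            counts[(tok, "unattended")] += 1
        else:
            counts[(tok, "attended")] += 1
    return counts


sample = (
    "This study examines student writing. This is important. "
    "These results differ across disciplines, and these are discussed below."
)
for key, n in sorted(classify_demonstratives(sample).items()):
    print(key, n)
```

A rule of this kind misclassifies cases such as an adverb between the demonstrative and the verb (this clearly shows), which is one reason why manual checking of concordance lines remains necessary.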

2 Review of the Literature

2.1 Academic Discourse(s)

Academic discourse broadly concerns “the ways of thinking and using language which exist in

the academy” (Hyland, 2009, p. 1). It is through discourse, in its different realisations, that

academics collaborate, communicate, create and disseminate knowledge. Academic discourse

is used by members of different social groups, often referred to as ‘discourse communities’,

within which they develop specialised knowledge and competence in the communication

practices of their particular disciplines (Paltridge, 2002, p. 15; see also Becher & Trowler,

2001). According to Swales (1990, p. 9), “discourse communities are sociorhetorical networks

that form in order to work towards sets of common goals”. A discourse community is further

identified by a particular way of communicating, its members possessing a certain level of

expertise and familiarity with the specialised vocabulary and genres that are relevant for their

communicative purposes (Swales, 1990, pp. 24-27). Different academic discourse communities

thus have distinct ways of using language and doing things. While it has been somewhat

contested, the concept of discourse community nevertheless “proves useful in identifying how

writers’ rhetorical choices depend on purposes, setting and audience” (Hyland, 2009, p. 66) of

any given discourse, which in turn can help to understand how those communicative purposes

or goals are achieved by different groups.

Research on academic discourse in recent years has demonstrated that “the discourses of

the academy are enormously diverse” (Hyland, 2004, p. x) with respect to both genre and

discipline (e.g. Bhatia, 2002; Hyland, 2004), resulting in an increasing awareness of multiple

‘academic discourses’ that contrasts with the previously-held perception of “a single,

monolithic ‘academic English’” (Hyland, 2009, p. ix). The implications and value of this

research, for language teaching in general and English for Academic Purposes pedagogy in

particular, are widely recognised (Flowerdew, 2002; Hyland, 2006).

Within academic discourse, particular attention has been paid to written discourse. As the

“main way scholarship is transmitted” (Flowerdew, 2016, p. 6), writing is essential for the

dissemination of knowledge through publication by professional academics. At the same time,

writing plays an equally essential role in the display of knowledge for assessment purposes by

students and “is probably the single most important skill necessary for academic success” (Nesi

& Gardner, 2012, p. 3). In spite of, or because of, this role, it represents a challenge for novices

in academia. Academic writing in English specifically is usually thought to be a complex,

elaborated and rather abstract form of discourse, characterised by long and unfamiliar words

(Biber & Gray, 2010). Certain features that are typically associated with academic texts include

high lexical density and nominalisation (compressed nominal or phrasal structures); a formal,

detached and impersonal style; source use and intertextuality (McCarthy, Matthiessen, & Slade,

2010, p. 55; Paltridge, 2002, pp. 136-137; see also Biber, 2006; Biber & Gray, 2010). Some of

these features make academic language not only difficult to read and understand but also

intimidating for students, especially novice writers (Flowerdew, 2016, p. 7; Biber, 2006).

Among the multiple approaches to the study of academic discourse, findings from corpus-

based investigations have contributed considerably to the description and characterisation of

academic discourse in general, and academic writing in particular. In addition to research on

the distinctive features of academic discourse, the different registers of university language

have been examined (e.g. Biber, 2006), as have the diversity of academic genres and their

interplay with the various disciplinary discourses (e.g. Hyland, 2004). Large corpora of

academic texts, such as the British Academic Written English (BAWE) corpus, have also been

used to investigate different genres and features of student academic writing. Nesi and Gardner

(2012), for instance, have identified, from the wide variety of university assignment genres

across several disciplines, five different social functions, which are linked to the different stages

of a degree: ‘demonstrating knowledge and understanding’; ‘critical evaluation and developing

arguments’; ‘developing research skills’; ‘preparing for professional practice’; and ‘writing for

oneself and others’. These functions help to gain a better understanding of the type of

knowledge that students need to demonstrate to meet tutors’ expectations for different tasks, as

well as of the various linguistic features that are characteristic of each. Also using data from the

BAWE corpus, Staples et al. (2016) have examined how students’ writing develops in terms of

grammatical complexity through levels of study, mediated by genre and discipline, and found

that their writing becomes more complex as students advance in their studies. Taking a different

approach, Gardner et al. (2018) have also explored the interaction between several situational

variables in student writing. Using a multidimensional analysis, they identified the way certain

linguistic features are differently clustered or dispersed along four dimensions across

disciplines, levels of study and genres in the BAWE corpus.

2.1.1 English for general or specific academic purposes

The increasingly pervasive use of English in academia and research in the past decades has led

to the emergence of an area of study in Applied Linguistics designated English for Academic

Purposes (EAP). One of the main branches of English for Specific Purposes (ESP), EAP

specifically “covers language research and instruction that focuses on the communicative needs

and practices of individuals working in academic contexts” (Hyland & Shaw, 2016, p. 1). The

main focus of EAP instruction is the learner and their needs in preparation for tasks or work in

academic environments, while taking into account the demands of specific academic

disciplines. One of the goals of EAP is thus to equip students with skills in and awareness of

different disciplinary conventions, which will help them produce more genre-oriented and

discipline-specific writing in English (Flowerdew, 2002).

The growing importance of English as a medium of university instruction alongside the

globalisation of higher education has led to the expansion of the role and focus areas of EAP,

which in turn generated the “tension between a general EAP addressing a generalized academic

discourse and dedicated disciplinary-discourse instruction” (Hyland & Shaw, 2016, p. 6). As a

result, two main approaches to EAP instruction have emerged, English for General Academic

Purposes (EGAP) and English for Specific Academic Purposes (ESAP). EGAP is concerned

with the general academic skills and ‘common core’ features of academic language that are

needed by students in all disciplines, whereas ESAP addresses the specific needs of students of

particular disciplines (Flowerdew, 2016, p. 7). Often associated with different levels of study,

EGAP courses provide undergraduates and taught graduates with the generic academic skills

they require, while research postgraduates and academics have more discipline-specific needs

that can be met through ESAP instruction (Flowerdew, 2016, p. 8). The fact that many fields of

study are becoming more interdisciplinary, leading to higher demands on students to acquire

competence in more than one discipline, however, adds another layer of complexity (Bhatia,

2002, p. 27; Flowerdew, 2016, p. 8).

The ongoing debate over the issue of specificity in ESP/EAP has centred around the

question of whether there are transferable or common skills and language features to justify a

general approach, or whether the focus should lie on providing students with discipline-specific

instruction (Hyland, 2002, p. 385; see also Shutz, 2013). As discussed in more detail in Section

2.1.2.1, a main area of focus of EAP where this question is particularly relevant and remains

central is that of academic vocabulary (generic versus discipline-specific), namely the creation

of word lists based on frequency, for which the use of corpora has been instrumental (Nesi,

2016). Corpora have been increasingly used by EAP researchers and practitioners as sources of

authentic language use for the analysis of large samples of academic text in different registers

and contexts (e.g. Biber et al., 1999; Biber, 2006; Hyland, 2004; Nesi & Gardner, 2012; Gardner

et al., 2018). In addition to being used as research tools, corpora are also useful resources for

EAP writing instruction and materials development (Nesi, 2016; Hunston, 2002).

2.1.2 Disciplinary variation in academic writing

The concept of variation is relevant not only for distinguishing academic from other types

of discourse, but also for describing academic discourse itself, which varies according to different parameters

such as genre, register, mode (spoken/written), and discipline. Of particular interest here,

discipline is a complex concept to define, not least because it varies over time and geographical

location due to the “changing nature of knowledge domains” (Becher & Trowler, 2001, p. 41).

Hyland (2004, p. 1) suggests that disciplines are primarily defined by disciplinary-approved

practices that members of a local discourse community adopt and are competent in. Along the

same lines, for Becher and Trowler (2001) academic communities (or ‘tribes’) and disciplinary

knowledge (the ‘territories’) are “inseparably intertwined” (p. 23). In other words, belonging to

a discipline means sharing common ways of producing and communicating knowledge.

Disciplines differ in terms of preferred genres, how knowledge and arguments are

constructed, as well as the vocabulary or terminology used. For instance, the sciences typically

emphasise the construction of knowledge through proof in an objective way and value

precision, whereas the humanities favour the strength of argument and an interpretive discourse

that attempts to describe and understand familiar human experiences, while the social sciences

are seen as falling somewhere in between (Hyland, 2009, p. 63). It follows that disciplinary

knowledge mainly “reflect[s] real-world differences in subject matter” (Becher & Trowler,

2001). The distinction between ‘hard’ (sciences/technology) and ‘soft’ (humanities/social

sciences) disciplines that generally reflects common perceptions does not, however, imply a

clear-cut division between disciplinary groupings (Hyland, 2009, p. 64).

Gaining an awareness of discipline-specific conventions alongside developing

disciplinary knowledge is therefore essential for successful student academic writing

(Flowerdew, 2016, p. 8; Hyland & Tse, 2007, pp. 248-249). As university students are exposed

to and learn to master the different communication skills that are specific to their target

disciplines, they also become socialised into the discourse communities of the various academic

disciplines (e.g. Nesi & Gardner, 2012; Hyland, 2006; Becher & Trowler, 2001).

Corpus research has contributed to revealing the extent of disciplinary variation in

academic writing both within and across disciplines (e.g. Hyland, 2004, 2006; Hyland & Tse,

2007). Advanced student writing in particular has been found to reflect to some extent the

differences in disciplinary discourses that students gradually come to recognise and use as they

progress through their studies (Staples et al., 2016; Gardner et al., 2018).

2.1.2.1 ‘Common core’ versus discipline-specific vocabulary

The fact that academic discourse can be distinguished from other types of discourse by the

prominent use of certain linguistic features (e.g. Biber, 2006) could suggest that there is a

common set of general academic skills or language features that cuts across disciplines.

Different disciplines will, however, vary in the extent to which they adhere to those features

and in how they use them (e.g. Hyland & Tse, 2007), especially considering the variety of

academic writing tasks and assignments that, for instance, university students are expected to

do in different disciplines (Nesi & Gardner, 2012). In addition, in terms of terminology or

vocabulary specifically, it is not clear whether there is a large enough ‘common core’

vocabulary which is characteristic of academic discourse and similarly used in all disciplines

(Hyland & Tse, 2007, p. 243).

Research on academic writing has underlined the distinction between technical or

specialised (discipline-specific) terminology and general academic vocabulary, that is, those

“items which are reasonably frequent in a wide range of academic genres but are relatively

uncommon in other kinds of texts” (Hyland & Tse, 2007, p. 235; see also Hyland, 2002). The

latter in particular has been the focus of lists of words specific to academic discourse, such as

the Academic Word List (AWL; Coxhead, 2000), the Academic Vocabulary List (Gardner & Davies,

2013), and the Academic Formulas List (AFL; Simpson-Vlach & Ellis, 2010), which attempt to

capture the most important vocabulary that university students should master (Nesi, 2016).

The usefulness of such vocabulary lists has been debated over time and the question

remains “whether it is useful for learners to possess a general academic vocabulary […] because

it may involve considerable learning effort with little return” (Hyland & Tse, 2007, p. 236).

Different positions have been taken on this point, with some authors arguing for the importance

of making lists of general academic vocabulary accessible to meet university students’ needs

(e.g. Simpson-Vlach & Ellis, 2010; Gardner & Davies, 2013), while others claim that many

words or phrases have different meanings or phraseological patterns in different disciplines and

that it is more important for students to be made aware of those distinctive uses in their own target

disciplines (e.g. Hyland & Tse, 2007; Hyland, 2008).

Hyland and Tse (2007), for instance, question “the assumption that a single inventory can

represent the vocabulary of academic discourse and be so valuable to all students irrespective

of their field of study” (p. 238) on the basis that their investigation shows that the extent

to which lexical items in the AWL are used across disciplines varies considerably. They also

highlight the differences in meaning and collocational environments that particular items (e.g.

volume, attribute) exhibit in different disciplines. Hyland (2008) further adds that disciplines

show different preferences in their use of lexical bundles (or highly frequent collocations), with

less than half of the top 50 identified bundles being common to the four disciplines under

investigation (Biology, Electrical Engineering, Applied Linguistics, and Business Studies). In

support of a discipline-specific approach to EAP teaching, Hyland concludes that these findings

undermine “the widely held assumption that there is a single core vocabulary needed for

academic study” (p. 20). In contrast, Simpson-Vlach and Ellis (2010, p. 509) identify a

“common core of academic formulas that do transcend disciplinary boundaries” (the Academic

Formulas List) by using a ‘formula teaching worth’ measure that combines frequency and mutual information (MI)

statistics in the ranking of formulas. While acknowledging the need for further research on

disciplinary variation, they argue that differences in operationalisation in Hyland’s (2008) study

could explain their contrasting results. Simpson-Vlach and Ellis (2010) maintain that the AFL

is an “empirically derived, pedagogically useful list” of frequently recurrent formulaic

sequences across academic genres and disciplines that occur significantly more in (spoken and

written) academic discourse than in non-academic discourse, and hence could be considered

representative of an academic style of discourse. Other studies (Shutz, 2013; Römer & Wulff,

2012) lend further support to the relevance of a general approach to teaching academic

vocabulary by demonstrating that there are a number of verbs (related to the research activity,

e.g. reporting information, describing data and results) and nouns (mainly metadiscoursal and

related to methodology) that are highly frequent in and common to several disciplines.
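
The two ingredients of the ‘formula teaching worth’ measure mentioned above, raw frequency and MI, can be illustrated with a short sketch. The weighting that Simpson-Vlach and Ellis (2010) apply when combining them is not reproduced here, so the code below should not be read as their implementation. It assumes two-word sequences only, a pre-tokenised text, and pointwise mutual information computed as log2 of P(w1 w2) / (P(w1) P(w2)); the function name `bigram_mi` and the toy data are hypothetical.

```python
import math
from collections import Counter


def bigram_mi(tokens):
    """Return {bigram: (frequency, pointwise MI in bits)} for a token list."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_bigrams = n - 1  # number of adjacent token pairs
    scores = {}
    for (w1, w2), freq in bigrams.items():
        p_xy = freq / n_bigrams      # observed probability of the pair
        p_x = unigrams[w1] / n       # probability of the first word
        p_y = unigrams[w2] / n       # probability of the second word
        scores[(w1, w2)] = (freq, math.log2(p_xy / (p_x * p_y)))
    return scores


# Toy data for illustration only, not corpus evidence.
tokens = "on the other hand the results on the other hand show".split()
ranked = sorted(bigram_mi(tokens).items(), key=lambda kv: kv[1][1], reverse=True)
for bigram, (freq, mi) in ranked:
    print(" ".join(bigram), freq, round(mi, 2))
```

Frequency alone tends to favour sequences of very common words, whereas MI tends to favour rare but tightly associated pairs, which suggests why a ranking that draws on both may be more useful for teaching purposes than either statistic on its own.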

Based on the increasing literature on general versus discipline-specific EAP instruction,

it seems more accurate to say that one does not exclude the other; rather, a

combination would suit a larger number of students, depending on specific teaching contexts

and stages. In addition to the specific uses of vocabulary (both individual words and multiword

units) in different disciplines that students need to be exposed to and acquire, there is still an

important set of academic words or phrases that are shared across disciplines, which reflect the

activities and functions that are typical of ‘academic’ work.

2.2 Discourse Grammar: Cohesion in Written Discourse

A text can be defined as “the verbal record of a communicative act” (Brown & Yule, 1983,

p. 6), meaning that it is an instance of language in use rather than language as an abstract system

of meanings and grammatical relations. A stretch of language of any length can be identified as

a unit of meaning, or text, if it constitutes a unified whole, made up of certain resources that

create texture, within a specific communicative context (Halliday & Hasan, 1976). Cohesion is

what ties a text together, that is, the network of grammatical and lexical relations that connect

various parts of a text (Halliday & Hasan, 1976):

Cohesion occurs where the INTERPRETATION of some element in the

discourse is dependent on that of another. The one PRESUPPOSES the other, in

the sense that it cannot be effectively decoded except by recourse to it. (p. 4;

authors’ emphases).

These surface meaning relations enable the reader to understand and interpret the content of the

text. By analysing these relations or ties between dependent elements, the purpose and structure

of the text and its component parts can be more easily identified.

Cohesion also makes writing flow, or “mov[e] from one statement in a text to the next”

(Swales & Feak, 2012, p. 30), by creating and reinforcing connections at different structural

levels—sentence, paragraph and discourse. Within these, cohesion in writing can be established

through various language features, including information structure and thematic progression at

a macro-level, and lexicogrammatical cohesive devices, such as repetition, linking words, and

pronominal reference, at a more micro-level.

The notion of information structure typically concerns the development from ‘given’ (or

old) to ‘new’, which is the unmarked pattern of organisation of information in English. Given

information is that which is already known to the hearer/reader or otherwise “recoverable either

anaphorically or situationally” (Halliday, 1967, p. 211, as cited in Brown & Yule, 1983, p. 179),

whereas new information is not. In other words, given information represents shared knowledge

and provides a reference point to which new information can be related (Bloor & Bloor, 2004,

p. 66). This given-to-new pattern facilitates understanding of the information conveyed by the

speaker/writer and clearly indicates what they consider the most important information. The

early placement of given information (i.e. in the subject position) “establishes a content

connection backward and provides a forward content link that establishes the context” (Swales

& Feak, 2012, p. 31).

Information structure is closely related to the thematic structure established within a

clause by its constituents, ‘theme’ and ‘rheme’. In the context of writing, the former is what the

clause is about, or its “starting point”, and the latter what is said about the theme, that is, the

new element or piece of information being introduced (Paltridge, 2012, p. 129; Bloor & Bloor,

2004, p. 73). The theme then serves both as a point of orientation by connecting back to previous

stretches of text and as a point of departure by connecting forward and contributing to the

development of later stretches (Bloor & Bloor, 2004, p. 73). The relationship between theme

and rheme, for example the way a theme develops a topic introduced by a previous rheme, also

contributes to the texture of a text. Thematic progression, a “key way in which information flow

is created in a text” (Paltridge, 2012, p. 131; italics in original), allows the writer to maintain a

continuity of ideas and the reader to follow the ideas in a text.

Writing cohesively thus implies ensuring a smooth flow of information as well as a clear

connection between ideas and dependent elements within a text. The lexicogrammatical devices

that can be used to establish cohesive ties will be the focus of the following section.

2.2.1 Cohesive devices and reference patterns

In their classic model of cohesion in English, Halliday and Hasan (1976) identify five main

cohesive devices which contribute to the texture of a text: reference, substitution, ellipsis,

conjunction, and lexical cohesion. Of particular interest for this study, reference and ellipsis

require the reader to recover information at a certain point in the text, by retrieving it from or

relating it to a relevant part of the text. While ellipsis involves retrieval of information that can

be presupposed, reference creates textual cohesion by linking elements (referents) that enable

recovery of information from the immediately preceding or subsequent context. More

specifically, reference can be defined as the relationship of identity which enables the reader to

trace entities or events in a text. It comprises a set of grammatical and discoursal resources that

allow the writer to refer back (anaphorically) to something that appeared before in the text or

forward to something that is yet to be introduced (Halliday & Hasan, 1976):

[…] the specific nature of the information that is signalled for retrieval. In the

case of reference the information to be retrieved is the referential meaning, the

identity of the particular thing or class of things that is being referred to; and the

cohesion lies in the continuity of reference, whereby the same thing enters into

the discourse a second time. (p. 31)

Maintaining continuity of reference, or chains of reference, enables the reader to interpret and

follow the flow of information as intended by the writer. As “sequences of noun phrases all

referring to the same thing […] in a relation of co-reference” (Biber, Johansson, Leech, Conrad

& Finegan, 1999, p. 234; emphasis in original), chains of reference contribute to a great extent

to text cohesion. Co-reference can be expressed through different linguistic forms.

Halliday and Hasan (1976) distinguish between three types of reference: personal,

demonstrative, and comparative. Personal reference involves using personal pronouns,

possessive determiners and possessive pronouns to “refer to something by specifying its

function or role in the speech situation” (p. 44). Through demonstrative reference, the referent

is identified by means of adverbial and nominal demonstratives in terms of location (in space

or time) and proximity. Comparative reference is indirect and includes two types of comparison,

general (likeness or unlikeness) and particular (quantity or quality), which are expressed by

adjectives and adverbs.

These types of reference can refer to the context of the situation (exophorically) or to

entities mentioned within a text (endophorically). Endophoric reference is established in two

main ways, namely through the use of anaphoric and cataphoric expressions (Brown & Yule,

1983, p. 192; Bloor & Bloor, 2004, p. 96). Anaphora refers to the pattern of reference by which

a word or phrase is used to refer back to someone or something (antecedent) that has already

been mentioned earlier on in the text. In contrast, cataphora is the process by which some

linguistic items refer forward to someone or something coming later in the text. In this study,

only anaphoric reference is of interest as textual cataphoric cohesion is much less common

(Halliday & Hasan, 1976, p. 68).

The demonstratives (this, these, that and those) are an important means of establishing

cohesive and referential relations in discourse, as will be discussed in more detail below.

2.2.2 Demonstratives as determiners or pronouns

Described as forms of “verbal pointing” (Halliday & Hasan, 1976, p. 57), the demonstratives

this, these, that and those constitute a category of deictics or deictic expressions, that of spatial

deixis. Meaning “pointing” through language in ancient Greek, the term “deixis” generally

refers to those linguistic forms (e.g. this and that, here and now) that are tied to the situational

or textual context shared by the speaker/hearer or writer/reader, which means that context is

essential for their identification and interpretation. As important cohesive resources, deictic

expressions serve not only to point to something but also to establish anaphoric or cataphoric

reference to a preceding or following part of the discourse. Even though deictic expressions are

typically associated with spoken discourse, in writing the text itself can provide the context that

is needed to interpret these “pointing” expressions. In fact, deictics such as this are common

not only in conversation registers, but also in academic writing (Swales, 2005, p. 1; Biber et al.,

1999, p. 349; see also Biber, 2006, p. 15), their referential meaning being determined by the

textual context in which they occur: relative proximity (this, these) and relative remoteness

(that, those).

In establishing anaphoric reference between a referring expression and an antecedent,

demonstratives can be used either independently as pronouns or ‘Heads’ (e.g. This means

that…) or dependently as determiners or ‘Modifiers’ preceding a head noun (e.g. This

explanation…). Focusing specifically on what Halliday and Hasan (1976) call selective nominal

demonstratives (this, these, that, and those), these are distinguished in terms of number

(singular versus plural) and proximity (near versus distant), as shown in Table 1 below.

Table 1. Distinctive features of the demonstratives

            Singular    Plural
Near        this        these
Distant     that        those

According to Biber et al. (1999, p. 349), “proximity is insufficient to account for the

distribution of the demonstrative pronouns”, which varies depending on register: that is more

common in conversation, whereas this, these and those are more often employed in academic prose. The

high frequency of the demonstratives this and these as determiners and as pronouns in academic

writing can be explained by “their use in marking immediate textual reference” (Biber et al.,

1999, p. 349). One further distinction then between these forms of demonstratives relates to the

anaphoric distance that is normally associated with each form: the shorter anaphoric distance

of demonstrative pronouns contrasts with the larger anaphoric distance of demonstrative

determiners. This in turn is related to the “relationship between explicitness and anaphoric

distance” (Biber et al., 1999, p. 240) that further distinguishes the demonstrative forms when

used pronominally as Heads or followed by a noun as Modifiers: the greater the distance

between the referring expression and its antecedent, the greater the possibility of creating

ambiguity. Halliday and Hasan also highlight the notion that singular demonstrative Heads (this

and that) are associated with extended text reference (cf. ‘situation reference’ in Petch-Tyson,

2000) in addition to referring anaphorically to a single referential point, but “in either case the

effect is cohesive” (Halliday & Hasan, 1976, p. 67).

This study focuses on the ‘near’ demonstratives this and these, which are characteristic

of academic writing (Biber, 2006, p. 15), and in particular on their distinctive anaphoric uses

as determiners or as pronouns in establishing connections between parts of a text. More

specifically, this and these are often used in given-to-new patterns to refer back to a part or

entirety of the preceding sentence (or even several previous sentences), thus contributing to

textual cohesion. Phrases or clauses beginning with this/these followed by a noun (or ‘summary

word’) “summarize what has already been said and pick up where the previous sentence has

ended” (Swales & Feak, 2012, p. 43), as illustrated in example (1) below. Nesi and Gardner

(2012, p. 110) also draw attention to the summarising role of the anaphoric demonstrative this

followed by a lexical verb, such as This clearly shows that or This illustrates that, in essays

across disciplines in the BAWE corpus. The following examples illustrate how this can be used

to connect pieces of information and create text cohesion both as a determiner (1) and as a

pronoun (2).1

(1) When children reach school, they are required to have a metalinguistic awareness and

understanding of the language. This ability demands that they think about language, its

uses and the rules that govern it, in order to read and write. (LING_6045a)

(2) Without the vocabulary it has learnt during the acquisition of its first 50 words, a child

could not progress to this stage, and these words could not have been acquired without

the experimentation of sound production in the babbling phase. This shows that all stages

in the speech development process are important and should be viewed as equally

significant. (LING_6067b)

2.2.3 Attended and unattended this/these in academic writing

As previously noted, the anaphoric demonstratives this and these are important elements in

given-to-new information structuring of texts, offering a convenient way of cohesively “getting

out of a sentence and into another” (Swales, 2005, p. 6). They can, however, serve other

functions. As pronouns (or “unattended”), they can be an economical and effective means of

1 All examples are taken from the data used for this study and are identified by an abbreviation of the subcorpus (e.g. LING) followed by the Document ID number (e.g. 6045a) as referenced in the BAWE corpus documentation. Any typographical or grammatical errors in the original text were kept.

offering an explanation or conveying information in reference to preceding discourse of varying

lengths. When supported by a noun (or “attended”), they can pinpoint the focal point of a

proposition, while simultaneously allowing the writer to add clarity or interpretation of the

referent (Swales & Feak, 2012, pp. 43-48; Geisler et al., 1985, p. 151). These different functions

and the “long and unfinished story” (Swales, 2005, p. 4) of the attended and unattended forms

of the anaphoric demonstratives can be summarised in terms of trade-offs between economy

and clarity and rhetorical opportunities (Geisler et al., 1985):

Out of control, the unattended this points everywhere and nowhere; under

control, it is the lanuage’s [sic] routine for creating a topic out of a central

prediction [sic], pointing to it, bringing it in to focus, and discussing it; all done

in one stroke, gracefully, economically, and without names. (p. 153)

While Swales (2005) argues that “tacit sense of the tradeoff between economy and clarity […]

probably only comes with considerable writing experience” (p. 14), the choice between clear

and economical reference is particularly relevant for university students, who are expected to

learn how to write clear, unambiguous texts using an ‘academic’ style of discourse that is

typically compressed and inexplicit (Biber & Gray, 2010, p. 19).

Despite the potential effectiveness of both determiner and pronominal forms of the

demonstratives, academic style guides and textbooks often advise against or recommend the

careful use of unattended this (e.g. American Psychological Association, 2010; Glenn & Gray,

2013; Swales & Feak, 2012). The APA manual, for instance, refers to the pronominal forms

this, that, these, and those as “the most troublesome” and advises writers to “[e]liminate

ambiguity by writing, for example, this test, that trial, these participants, and those reports”

(2010, p. 68). In a section called “This and Summary Phrases”, Swales and Feak (2012) also

recommend as best practice that the demonstrative this be followed by a noun whenever “there

is a possibility your reader will not understand what this is referring to […] so that your meaning

is clear” (p. 43). They add, however, in a closing commentary that was not included in

previous editions of the textbook, that “there are occasions when “unattended” this (no

following noun) is perfectly reasonable” (p. 48). This additional commentary, as noted by the

authors, is based on the findings of a corpus-based study on advanced student academic writing

(Wulff et al., 2012).

Recent research on published academic prose (Gray, 2010; Gray & Cortes, 2011; Swales,

2005) corroborates these findings by reporting the common use of this and these as pronouns

in empirical research articles across academic disciplines. One possible explanation relates to

requirements on academics during the editorial and revision process to submit shorter texts, and

a “simple way of doing this is to ‘de-attend’ instances of this” (Swales, 2005, p. 14). It could

also be argued that this increasing awareness and use of the pronominal demonstratives (and in

particular of this) in academic writing might be related to a shift in academic English towards

a less explicit expression of meaning that has been found in recent studies (e.g. Biber & Gray,

2010). In contrast to the (mostly implicit) reference to elements in the situational context in

spoken discourse where pronouns (including deictics) abound, academic written discourse is

often claimed to be “maximally explicit in meaning” (Biber & Gray, 2010, p. 11). In writing,

deictic pronouns are used for textual reference, pointing to a specific part of the text (or

proposition). Failure to use referring expressions that can identify antecedents clearly could

create ambiguity and cause confusion, particularly for non-experts (Biber & Gray, 2010).

The demonstratives this and these have also been associated with a perceived increase of

informality in research articles, which is not as prevalent as generally thought (Hyland & Jiang,

2017; Biber & Gray, 2010, p. 17). From a list of ten features typically considered ‘informal’

and proscribed in academic writing, Hyland and Jiang (2017) found that unattended reference

was one of the three main features that influenced the overall results the most. Despite their

high frequency, anaphoric unattended pronouns (this, these, that, those, it) showed a declining

trend over time in the four disciplines chosen to represent the hard and soft sciences (Applied

Linguistics, Sociology, Engineering, and Biology). It is unclear, however, by how much

each of the anaphoric pronouns declined (or increased). Adopting the same list of ten categories

of informality, Lee, Bychkovska and Maxwell (2019) compared the use of informal language

in argumentative essays by native English and non-native undergraduate students using data

extracted from the MICUSP and COLTE corpora of student writing.2 They found that both

groups frequently use features of informality, in particular anaphoric unattended pronouns,

which represented over 47% and 55% of all informal features in the native and non-native

student corpora, respectively. Mixed patterns were found, however, concerning the individual

pronouns preferred by each group, with native students using unattended this significantly more

than non-native students, while the remaining pronouns (these, that, those, it) occurred

significantly more frequently in the non-native corpus. Lee et al. (2019) argue that the native

students use “a broader range of informal features, particularly those that have become

relatively legitimized in academic writing such as […] unattended this” (p. 152). It could be

added that not only first language but also level of study/expertise is an important explanatory

factor, since the native students, as senior undergraduates, are presumably more exposed to and

more aware of research writing practices than non-native students from first-year writing

courses. Further studies would help “to determine whether it is the L1, writing experience,

reader (i.e., subject-matter or composition instructor), writing task, or a combination of these

factors that affect undergraduate students’ stylistic choices” (Lee et al., 2019, p. 152).

2 The Michigan Corpus of Upper-Level Student Papers (MICUSP) contains 830 A-graded, upper-level papers of different genres from 16 academic disciplines. The Corpus of Ohio Learner and Teacher English (COLTE) is a large collection of English as a second language (ESL) student writing and teacher written feedback, compiled at Ohio University.

3 Material and Methodology

This chapter presents the material used for the study and describes the quantitative research

methods employed and the models of classification used in the qualitative analysis.

3.1 Data

The material for this study was extracted from the British Academic Written English (BAWE)

corpus, which comprises texts in a wide range of university genres (e.g. essays, research reports,

case studies) from four levels of study (first-year undergraduate to taught master’s level) across

30 main academic disciplines, totaling approximately 6.5 million words.3 The corpus includes

coursework assignments produced by native and non-native English speakers from four British

universities which were awarded ‘merit’ and ‘distinction’ grades, and hence the writing can be

considered to meet the standards set by subject tutors (Nesi & Gardner, 2012, p. 6).

The documentation provided with the BAWE corpus includes a spreadsheet with

metadata about its content (e.g. discipline, grade, first language of the author). By using the

metadata filters, the corpus data was restricted to texts by first-year native English speaker

students from four disciplines representing each of the four broad disciplinary groups—Arts

and Humanities, Social Sciences, Life Sciences, and Physical Sciences.4 The choice of

discipline was based on the largest amount of data available for the first-year level of study:

Linguistics (LING), Law (LAW), Biological Sciences (BIO) and Engineering (ENG). The

composition of the four subcorpora that were created is shown in Table 2.
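The filtering step described above can be illustrated with a minimal sketch. It assumes the metadata has been exported to CSV; the column names (`id`, `discipline`, `level`, `first_language`) and the sample rows are hypothetical and do not reproduce the actual headers or contents of the BAWE spreadsheet.

```python
import csv
import io

# Hypothetical excerpt of a metadata sheet. Column names and rows are
# illustrative assumptions, not the real BAWE documentation.
SAMPLE = """id,discipline,level,first_language
t1,Linguistics,1,English
t2,Law,1,English
t3,Law,3,English
t4,Biological Sciences,1,English
t5,Engineering,1,Chinese
t6,Engineering,1,English
t7,History,1,English
"""

TARGET_DISCIPLINES = {"Linguistics", "Law", "Biological Sciences", "Engineering"}


def select_texts(rows):
    """Keep first-year texts by native English speakers in the four disciplines."""
    return [
        row
        for row in rows
        if row["discipline"] in TARGET_DISCIPLINES
        and row["level"] == "1"
        and row["first_language"] == "English"
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
selected = select_texts(rows)
selected_ids = [row["id"] for row in selected]
print(selected_ids)
```

In this sketch the three criteria (discipline, level of study, first language) are applied together, which mirrors the combined use of the metadata filters.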

3 “The data in this study come from the British Academic Written English (BAWE) corpus, which was developed at the Universities of Warwick, Reading and Oxford Brookes under the directorship of Hilary Nesi and Sheena Gardner (formerly of the Centre for Applied Linguistics, Warwick), Paul Thompson (formerly of the Department of Applied Linguistics, Reading) and Paul Wickens (School of Education, Oxford Brookes), with funding from the ESRC (RES-000-23-0800).” https://warwick.ac.uk/fac/soc/al/research/collections/bawe/how_to_cite_bawe Additional information about the corpus is available at www.coventry.ac.uk/BAWE

4 Two main groups can be distinguished corresponding to the general distinction between ‘hard’ and ‘soft’ sciences: the natural and physical sciences (Biological Sciences and Engineering) versus the humanities and social sciences (Linguistics and Law).

Table 2. Overview of the subcorpora

Disciplinary group     Discipline             Number of words   Number of texts   Average text length
Arts and Humanities    Linguistics                     40,812                25                 1,632
Social Sciences        Law                             56,630                26                 2,178
Life Sciences          Biological Sciences             72,492                42                 1,726
Physical Sciences      Engineering                     58,857                36                 1,635
Total                                                 228,791               129

Note: No editing was made to the original texts; any footnotes or reference lists are included in the word count.
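The averages and totals in Table 2 follow directly from the word and text counts. The short sketch below recomputes them, assuming only that average text length is the word count divided by the number of texts, rounded to the nearest whole word.

```python
# Word and text counts as reported in Table 2.
subcorpora = {
    "Linguistics": (40_812, 25),
    "Law": (56_630, 26),
    "Biological Sciences": (72_492, 42),
    "Engineering": (58_857, 36),
}

# Average text length per subcorpus, rounded to the nearest word.
averages = {
    name: round(words / texts) for name, (words, texts) in subcorpora.items()
}
for name, average in averages.items():
    print(f"{name}: {average:,} words per text")

total_words = sum(words for words, _ in subcorpora.values())
total_texts = sum(texts for _, texts in subcorpora.values())
print(f"Total: {total_words:,} words in {total_texts} texts")
```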

As can be seen, there is a certain imbalance between the four subcorpora, with the

Biological Sciences subcorpus being larger than the rest. In addition, genre could not be

controlled for due to the wide variation of the assignments and the unequal distribution of data

available across disciplines and genres (see Gardner and Nesi, 2013, for details on the

classification of texts in the BAWE corpus into 13 ‘genre families’ according to their purpose

and generic structure). Each subcorpus comprises different types of texts, but predominantly:

essay in Linguistics; essay and critique in Law; methodology recount and explanation in the

Biological Sciences; and methodology recount in Engineering—closely matching the general

distribution of these genres across disciplines and levels of study in the BAWE corpus (Nesi &

Gardner, 2012, pp. 51-52). This means that the results of the present study could be mediated

not only by discipline but also genre. Despite these limitations, the selected data

can nevertheless be considered representative of proficient writing by native English-speaking students in

each of the selected disciplines. The generalizability of the

findings is, however, limited by the small size of the data sets.

3.2 Method of Analysis

Corpus linguistics has been used by researchers from different fields as a methodology to study

language in real-life contexts through corpora of authentic texts and the use of computer

software (e.g. concordance tools). A corpus can be defined as “an electronically stored

collection of samples of naturally occurring language” (Hunston, 2006, p. 234) designed for a

particular purpose. Several tools and methods, such as word lists, keywords and lists of

collocates, can be used to explore corpora and investigate specific linguistic features or

recurrent phraseology (Nesi, 2016; Hunston, 2002). While predominantly quantitative in nature

(based on frequency data), corpus-based analyses are often combined with a close examination

of concordance lines that show the context in which words co-occur, allowing researchers to

test hypotheses and identify patterns of language use in large amounts of data that could go

unnoticed if relying only on intuition or introspection.

This study employed quantitative corpus-based methods to determine the frequencies of

occurrence and the extent of variation or similarity in the use of the demonstratives this and

these in the data sets detailed in the previous section. A further analysis of the context in which

this and these occur as a pronoun or as a determiner was carried out to determine the main types

of verbs and nouns that follow the demonstratives.

As a first step, each of the four corpora was searched for the terms this and these using

the concordance tool AntConc (Anthony, 2019) and the total number of occurrences per corpus

was recorded (see Section 4.1). The output of each search was saved as a text file and imported

into an Excel spreadsheet for subsequent analysis and annotation of the data.
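To make the search step concrete, the following is a minimal keyword-in-context sketch using Python regular expressions. It is not AntConc and does not reproduce its output format; the function name, the context width and the sample sentence are assumptions made for illustration only.

```python
import re


def concordance(text, terms=("this", "these"), width=40):
    """Return (left context, keyword, right context) for each whole-word hit."""
    # \b restricts matches to whole words, so "thesis" is not counted as "this".
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    hits = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        hits.append((left, match.group(1), right))
    return hits


# Invented sample text, used only to exercise the function.
sample = (
    "This shows that all stages matter. These words are acquired early, "
    "and the thesis returns to this point later."
)
hits = concordance(sample)
for left, keyword, right in hits:
    print(f"{left:>40} | {keyword} | {right}")
print(len(hits), "occurrences")
```

Each hit keeps its left and right context, which is what the subsequent manual coding of pronoun versus determiner relies on.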

Next, the context of each concordance line was analysed to distinguish and code each this

and these as a pronoun or a determiner, after which their frequencies (raw and normalised) and

percentages were calculated and their distribution across disciplines determined (see Sections

4.1 and 4.4). Some instances were excluded from further analysis, including (3) below, where

the demonstrative this was used for cataphoric reference. Additionally, concordance lines where

this or these were part of quoted text or linguistic examples, such as (4) and (5), were also

excluded.

(3) Alan Freeman in "Truth and Mystification in Legal Scholarship" says this: […]

(LAW_0086d)

(4) Denning effectively overruled this, disregarding the doctrine of precedent, and stating

that "beneficial interests in the matrimonial, or furniture, belongs to one or the other

absolutely, or it is clear that they intended to hold it in definite shares. The court will give

effect to these intentions;" striking an accommodating approach. (LAW_0209a)

(5) In a phrasal exchange we get I got into this guy with a discussion for I got into a discussion

with this guy. Perhaps the speaker meant to say I got into this discussion with a guy.

(LING_6010c)
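The normalisation of raw frequencies mentioned above can be sketched as follows. The basis of 1,000 words is an assumption made for this illustration; the function name is likewise hypothetical. The input figures are the overall totals reported in Section 4.1 and Table 2.

```python
def normalised_frequency(raw_count, corpus_size, base=1_000):
    """Return the number of occurrences per `base` words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count / corpus_size * base


# 2,547 instances of this/these in 228,791 words (overall totals).
overall = normalised_frequency(2_547, 228_791)
print(f"this/these overall: {overall:.2f} per 1,000 words")
```

Normalising in this way allows subcorpora of different sizes, such as the larger Biological Sciences subcorpus, to be compared on the same scale.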

The subsequent step was to identify and classify the verbs and nouns that follow this/these as

pronouns or determiners, respectively, according to the models of classification described in

the next section. Figure 1 illustrates the coding of this in a sample of the data.

Figure 1. Example of coding of a data sample from the Engineering corpus

In the case of verbs, these were classified only when the pronoun served as the subject of the

verb, which means that for instance are in example (6) below was coded (verb: be; type of verb:

primary), whereas are in (7) was not analysed further since its subject is the noun implications.

(6) However, this patient speaks fluently with quite long utterances and few pauses. Again,

these are typical signs of Wernicke's aphasia (LING_6067f)

(7) It makes sense that this could work vice versa with words altering the meaning of pictures.

The implications of this are huge. Moving away from the sphere of music videos and the

potential of multi-modal texts is huge and important (LING_6018a)

With regard to nouns, the frequency counts and classification of types apply to head nouns only,

where the demonstrative was part of a noun phrase with pre- or post-modification, except for

compound nouns. For instance, in examples (8) and (9) below, only the head nouns module and

dilution were counted and coded. In the case of coordinated noun phrases with two head nouns,

only the first one was recorded. In addition, nouns that were misspelled were excluded from

annotation.

(8) If there was a fire in the house then this module will receive a high value in the heat

section and so call for the fire brigade. This software module would have different levels

of action. (ENG_0228f)

(9) This end point dilution gives the HA titre of the virus, which was 6400 HA units/ml and

there was 2.88 x 10^10 virus particles/ml. (BIO_0006b)

To make the coding of the noun types more manageable, for this part of the analysis only a

subset of the data was used, namely those instances of this/these in sentence-initial position,

which constitute approximately 40% of the total amount of data (see Table 3 in Section 4.1).

The specific challenges related to the classification of noun types will be further discussed in

the following section.
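Selecting the sentence-initial subset could be approximated automatically as sketched below. The rule used here (a capitalised This/These at the start of the text or after sentence-final punctuation and whitespace) is a simplifying assumption and not necessarily the criterion applied in the manual coding; the sample text is invented.

```python
import re

# Assumption: a demonstrative is sentence-initial if it opens the text or
# follows ".", "!" or "?" plus whitespace, and is capitalised.
SENTENCE_INITIAL = re.compile(r"(?:^|(?<=[.!?])\s+)(This|These)\b")


def sentence_initial_hits(text):
    """Return the sentence-initial demonstratives found in `text`, in order."""
    return [match.group(1) for match in SENTENCE_INITIAL.finditer(text)]


# Invented sample: two sentence-initial uses and one sentence-internal use.
sample = "This ability matters. We tested this twice. These are typical signs."
initial = sentence_initial_hits(sample)
print(initial)
```

A heuristic of this kind would miss cases such as demonstratives following a closing bracket or quotation mark, so its output would still need manual checking.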

Finally, after the data coding was completed, the frequencies of the nouns and verbs that

follow this/these and respective types were calculated for each corpus and for the total, after

which their usage across disciplines was compared (see Sections 4.2, 4.3 and 4.5).

3.2.1 Models of classification

For pronominal uses of this/these, the first verb following the demonstrative was classified as

lexical (e.g. give, explain), primary (be, have, and do), or modal (e.g. may, will), according to

the three main verb classes in the Longman Grammar of Spoken and Written English (Biber et

al., 1999, p. 358). This framework was employed to distinguish between the different types of

verbs found in the data and to determine which types are most used with unattended instances

of this/these overall and whether any disciplinary preferences can be found.
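The three-way verb classification can be expressed as a simple lookup, sketched below. It assumes that the word passed in has already been identified as the first verb following the demonstrative; the inflected forms listed for the primary verbs, the list of nine central modals, and the function name are supplied here for illustration.

```python
# Inflected forms of the three primary verbs (be, have, do).
PRIMARY_FORMS = {
    "be": {"be", "am", "is", "are", "was", "were", "been", "being"},
    "have": {"have", "has", "had", "having"},
    "do": {"do", "does", "did", "doing", "done"},
}

# The nine central modal verbs.
MODALS = {"can", "could", "may", "might", "shall", "should", "will", "would", "must"}


def classify_verb(form):
    """Classify a verb form as 'modal', 'primary' or 'lexical'.

    Assumes `form` is already known to be a verb; any verb that is neither
    a modal nor a form of be/have/do is treated as lexical.
    """
    word = form.lower()
    if word in MODALS:
        return "modal"
    if any(word in forms for forms in PRIMARY_FORMS.values()):
        return "primary"
    return "lexical"


for verb in ("are", "may", "shows"):
    print(verb, "->", classify_verb(verb))
```

Under this scheme, are in example (6) is classified as primary, while a verb such as shows in This shows that is classified as lexical.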

The noun taxonomy in Gray’s (2010) study was initially adopted to classify the nouns

modified by this/these as concrete, shell, abstract, and species/quantifier. To these, a category

used by Gray and Cortes (2011), deictic nouns, was later added to account for the prevalence

in the analysed data of nouns referring to the text itself or to parts of it (e.g. this essay, this

paper, this graph), as could be expected in university assignments where students often refer

to their own writing. In contrast, species/quantifier nouns were too infrequent in the data (less

than half a percent of the total occurrences) to represent a separate category and were thus

grouped under “other abstract nouns”. Table 3 below shows the four noun types adopted in this

study (concrete, deictic, shell, other abstract nouns), based on a combination of the categories

used by Gray (2010) and Gray and Cortes (2011).

Table 3. Definitions and examples of noun types, adapted from Gray (2010) and Gray and

Cortes (2011)

Noun type              Definition                                                                                                                 Examples
Concrete nouns         Nouns that represent physical entities or objects that can be touched, heard or seen.                                     e.g. apparatus, card, kit, student, specimen
Deictic nouns          Nouns that orient the reader by pointing to the overall text or a specific part of it, or to extralinguistic elements.    e.g. study, article, figure, section
Shell nouns            Abstract nouns that summarise or encompass preceding information and carry it into the next clause or sentence.           e.g. method, result, model, finding, issue, analysis
Other abstract nouns   Nouns that refer to concepts or ideas that cannot be measured or observed.                                                e.g. requirement, value, level, range, type, kind

The above definitions of each category are useful to a certain extent; however, it became

clear early on in the coding that there is no straightforward one-to-one correspondence between

form and meaning (see examples (16) to (18) in Section 4.5). Shell nouns in particular (also

referred to in the literature as ‘general nouns’, ‘anaphoric nouns’, ‘carrier nouns’ or ‘signalling

nouns’) could only be coded as such if they performed the function of “encapsulation of

meanings expressed in prior discourse” (Gray & Cortes, 2011, p. 34). Distinguishing between

shell and other abstract nouns thus proved to be the most time-consuming and complex part of

the classification, which mostly rested on the interpretation of the specific meaning of each

noun based on its use in context. This contrasts with Gray’s operationalisation of shell nouns

as those that had been identified in previous research, which could have had some influence on

the results reported (2010, p. 181). Given the overlap between these noun types and the degree

of subjectivity involved in their analysis, it remains to be determined whether such a distinction

can be objectively and effectively operationalised.

4 Results and Discussion

This section presents the main findings of this study as follows: in Section 4.1 the overall

frequencies of this/these as determiners and pronouns are analysed; the nouns, verbs, and

respective types that follow the demonstratives are then identified in Sections 4.2 and 4.3;

finally, Sections 4.4 and 4.5 compare the distribution across academic disciplines of pronominal

and determiner uses of this/these, as well as of the most frequent nouns, verbs and types of

nouns and verbs.

4.1 Overall Frequencies of Attended and Unattended this/these

A total of 2,547 instances of this (2,046) and these (501) were found in the corpora of student

writing analysed in this study, as can be seen in Table 4 below. The singular form is much more

frequent, as might be expected due to the predominant reference to abstract entities and

concepts in academic prose, occurring on average nearly nine times for every thousand words

in the data. This contrasts with the average of six times per thousand words reported for

published research articles (Swales, 2005, p. 1). The number of texts further reinforces this

difference between the singular and plural forms: this is present at least once in each of the texts

that comprise the corpus, whereas these can be found in fewer texts. In contrast, the two forms

are similar in terms of the position they take in the sentence, with both this and these occurring

more often in a medial or final position in the sentence than sentence-initially.

Table 4. Number of occurrences of this and these (percentages rounded to the nearest whole number)

Demonstrative   Raw frequency   Frequency per 1,000 words   Number of texts   Sentence-initial (%)
this            2,046           8.94                        129/129           870 (43)
these           501             2.19                        116/129           174 (35)
Total           2,547                                                         1,044 (41)
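The sentence-initial percentages in Table 4 follow directly from the raw counts. The short Python sketch below recomputes them for illustration; it is not part of the original analysis, and the names `TABLE_4` and `sentence_initial_share` are hypothetical.

```python
# Raw counts taken from Table 4: form -> (all occurrences, sentence-initial occurrences).
TABLE_4 = {
    "this": (2046, 870),
    "these": (501, 174),
}


def sentence_initial_share(total: int, initial: int) -> int:
    """Return the sentence-initial share as a whole-number percentage."""
    return round(100 * initial / total)


# Share for each demonstrative separately.
shares = {
    form: sentence_initial_share(total, initial)
    for form, (total, initial) in TABLE_4.items()
}

# Combined row: sum both columns first, then take the percentage.
combined_total = sum(total for total, _ in TABLE_4.values())
combined_initial = sum(initial for _, initial in TABLE_4.values())
shares["combined"] = sentence_initial_share(combined_total, combined_initial)

print(shares)
```

Summing the counts before dividing, rather than averaging the two percentages, keeps the combined figure weighted by how often each form occurs.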

A further distinction between this and these is found in their use as determiners or pronouns.

The figures in Table 5 show that this is employed fairly evenly as determiner and as pronoun,

which contrasts with the overwhelming use of these followed by a noun in the data.

Table 5. Frequencies of this/these as determiners and pronouns (percentages rounded to the nearest whole number)

Demonstrative   Determiner   %    Pronoun   %
this            1,063        52   970       47
these           390          78   108       21
Total           1,453             1,078

Similarly to other studies of student academic writing (Wulff et al., 2012; Petch-Tyson,

2000; Römer & Wulff, 2010), these results show that first-year undergraduates are not

observing the prescriptive guidelines that generally advise against leaving a demonstrative

unattended. They also indicate that student writers use pronominal this more frequently than

research article writers (e.g. Gray, 2010; Gray & Cortes, 2011). On the other hand, the results

for this differ somewhat from those reported previously by Wulff et al. (2012) and Petch-Tyson

(2000), even though no direct comparison can be made due to differences in operationalisation.

Wulff et al. (2012) identified more cases of attended than unattended this (57% versus 43%) in

papers by advanced-level students (MICUSP corpus), but they counted only those instances in

sentence-initial position rather than in all positions in the sentence. A much higher percentage

of demonstrative determiners (64%) than of pronouns (36%) was also reported by Petch-Tyson

(2000), who compared the use of both this/these and that/those by American university students

(a subset of the LOCNESS corpus) and English as a foreign language (EFL) students.5

It could be argued that the educational context (North American in the case of the

above-mentioned studies, or British in the case of the material used for the present study) might play

a role in the degree to which novice student writers use this/these as determiners or pronouns.

More exposure in North American contexts to style manuals, writing handbooks and textbooks

(e.g. American Psychological Association, 2010; Glenn & Gray, 2013; Swales & Feak, 2012)

that offer explicit guidance on what constitutes ‘good’ academic writing could then provide at

least part of the explanation for the differences observed in the frequencies of use of attended

versus unattended this/these in this study. The high percentage of unattended this could also be

the result of a more limited range of academic vocabulary or fewer lexical options at the

disposal of first-year undergraduates when compared to more advanced-level students, which

could potentially lead first-year students to resort to somewhat less complex structures when

constructing their arguments. This possible explanation might be further supported by analysing

the most frequent nominal and verbal structures following this and these, which will be reported

on in the next sections.

5 The Louvain Corpus of Native English Essays (LOCNESS) comprises general argumentative essays written by American and British students.

4.2 Most Frequent Nouns and Noun Types Following this/these

Overall, 1,423 head nouns (lemmas) were identified in the data (the complete list is provided

in Appendix 1), a substantial number of which occur only once (272 nouns).6 Table 6 shows

the most common nouns (ten or more instances) that follow this and these combined, which

account for approximately one third of the total number of occurrences. The majority are

methodology- or results-related nouns (e.g. method, process, result) and nouns that relate to

student activity or type of task (e.g. essay, report), but nouns with general meaning are also

found (e.g. case, point, way), which in context take on specific meanings. The latter include

nouns that occur in specific prepositional phrases with anaphoric function, such as “in this case”

and “in this way”, which can be considered semi-fixed expressions and occur relatively

frequently in the data (for example, there are 36 instances of in this case). It is also interesting

to note that as few as seven nouns occur 20 or more times: case, experiment, essay, point, way,

theory and value. The specificity of this short list confirms the main uses of the demonstratives

to refer not only to elements of the real world, but also to discourse-level elements.

6 The results reported on throughout Section 4 refer to lemmas for both nouns and verbs.

These findings are generally consistent with those of other studies on (un)attended this

both in student and professional academic writing. Just over a third of the nouns in Table 6

appear in the top 25 head nouns attending this in MICUSP papers, as reported by Römer and

Wulff (2010). A number of those are also part of the top 50 attendant nouns that Swales (2005)

found to be most frequent in a corpus of research articles representing ten different disciplines.

Both studies also point out the prominence in their data of metadiscoursal nouns, as well as

nouns referring to methodology, which seem to be not only frequent but also common to several

disciplines.

Table 6. The most frequent nouns following this/these (cut-off frequency of 10)

Noun            Frequency
case            56
experiment      47
essay           37
point           29
way             25
theory          24
value           20
method          17
model           17
process         17
laboratory      16
result          16
data            14
investigation   14
reaction        14
report          14
approach        13
organism        13
stage           13
area            12
idea            12
question        12
time            12
gene            11
offence         11
type            11
feature         10

In line with the disparate results for this and these reported on in Section 4.1, differences

can also be observed in terms of how they pattern with nouns individually. As can be seen in

Table 7, there is almost no overlap between the ten most frequent noun lemmas that co-occur

with the singular and plural forms, with only experiment (underlined) appearing in both lists,

albeit with different frequencies. In addition, the nouns that follow this are

predominantly abstract nouns and coincide to a large extent with the top half of the list of most

common nouns overall shown in Table 6 above, whereas these is most often used with concrete

nouns (e.g. cell, gene, organism, people, bacterium), in this case nouns specific to the

Biological Sciences corpus, which could be partly due to the larger size of that corpus.

Relatedly, the predominance of methodology recounts in the data (particularly in the

hard disciplines) is a possible explanation for result being the most frequent noun lemma

attending these, considering that the main identified purpose of those assignments is to

“demonstrate/develop familiarity with […] conventions for recording experimental findings”

(Gardner & Nesi, 2013, p. 39).

Table 7. The most frequent noun lemmas following this versus these

this                            these
Noun            Frequency       Noun            Frequency
case            53              result          14
experiment †    43              value           11
essay           37              model           9
point           27              cell            8
way             25              gene            7
theory          21              variable        7
laboratory      16              error           6
method          16              organism        5
process         16              people          5
investigation   14              bacterium       4
report          14              difference      4
                                experiment †    4
                                feature         4
                                number          4
                                question        4
                                stage           4
                                task            4
                                word            4
† Appears in both lists.

As indicated in Section 3.2, a subset of the data consisting of sentence-initial this/these

occurrences was used for the analysis of noun types. A total of 428 head nouns were classified

as (i) concrete, (ii) deictic, (iii) shell or (iv) other abstract nouns.7 The complete list of head

nouns ordered by type is provided in Appendix 2. As can be seen in Figure 2 below, while

deictic (e.g. essay, table, graph) and concrete nouns (e.g. plant, organism, device) constitute

approximately a third of the data, the majority of nouns fall into the other two categories: shell

nouns (38%) and other abstract nouns (31%).

The predominance of shell nouns (e.g. evidence, idea, process, approach) and other

abstract nouns (e.g. energy, evolution, density, equity) indicates that most of the occurrences of

sentence-initial this and these modify nouns that represent abstract entities and concepts

frequently employed in academic writing. These combinations of this/these + noun are cohesive

demonstrative structures used to point back to previously mentioned information or a referent

either by repeating the same noun or by using a more general noun that encapsulates the

preceding text.

7 Refer to Section 3.2 for a complete description of how the selection of head nouns was made; for example, specialised terminology such as “word spurt”, “gene transfer” and “mens rea” was excluded.

Figure 2. Percentage of types of nouns overall (percentages rounded to the nearest whole number)

Figure 2 data (%): shell 38; other abstract 31; deictic 17; concrete 15.

The fact that deictic nouns are slightly more common in the data than concrete nouns

could be explained by the type of texts included in the corpora, mostly comprising essays and

methodology reports, and thus the recurrent use of nouns related to the assignment itself is

perhaps not so surprising. Further investigation using different types of texts could be valuable

to help understand the role of genre as a variable in the use of anaphoric demonstratives.

Figure 3 displays the distribution of the noun types that follow this and these separately.

The patterns here are somewhat mixed when compared to the overall percentages in Figure 2.

As can be seen, while the trend line for this closely reflects the overall distribution, the use of

other abstract nouns and concrete nouns with these is more salient, in line with the listing of the

most common noun lemmas attending each form presented in Table 7.

Figure 3. Percentage of types of nouns following this and these separately (percentages rounded to the nearest whole number)

Figure 3 data (%): this: shell 41, other abstract 28, deictic 19, concrete 12; these: shell 29, other abstract 38, deictic 11, concrete 23.

Reporting on expert writers’ use of this/these in Applied Linguistics and Engineering,

Gray and Cortes (2011) identified a similar distribution of noun types (decreasing from shell to

concrete nouns) but with different proportions; however, their results for this and these are

combined and based on discipline, and also include a third abstract noun category. Similarly,

even though no direct comparison can be made, Gray (2010) found shell and abstract nouns to

be predominant in her corpora of research articles in two different disciplines (Education and

Sociology), with concrete nouns coming last. Both these studies’ findings and those of the

present study reflect the pervasive use of abstract nouns in academic writing that contributes to

the complexity and formality that are typically associated with an academic style of discourse

(Biber & Gray, 2010). They could further suggest that novice students are generally meeting

expectations in terms of the types of nouns characteristic of academic English.

4.3 Most Frequent Verbs and Verb Types Following this/these

Overall, 111 different verbs were identified in the data, the majority (67) occurring only once

and 14 with more than ten occurrences. The complete list of verbs is provided in Appendix 3.

As can be seen in Table 8, the verb lemma BE (i.e. including all its forms) most frequently

follows this and these in the student corpora, accounting for 373 instances of the total 865 that

were classified for verb type.8 The fact that copula be is one of the “specific verb categories that

are typical of academic prose” (Biber, 2006, p. 14) provides one explanation for the prominence

of BE in the data.

Table 8. The most frequent verbs following this/these.

Verb      Frequency   Type
BE        373         primary
would     48          modal
can       39          modal
MEAN      38          lexical
may       30          modal
HAVE      25          primary
could     21          modal
DO        21          primary
SUGGEST   20          lexical
SHOW      16          lexical
will      16          modal
Note: Capitals identify lemmas; the Type column distinguishes primary, modal and lexical verbs.

8 Refer to Section 3.2 for a complete description of how the selection of verbs was made.

From the analysis of the most common verbs in the data, some trends appear to emerge:

the striking frequency of the verb BE in all its forms; a wide variety of lexical verbs, with only

a small number of them occurring often; and a few recurrent patterns, such as this is because

(see example 15) and this is due to (see example 12). These trends seem to indicate that students

consider the pronominal this + BE (especially in sentence-initial position, which accounts for

118 out of 334 instances) a ‘prefabricated’, fixed structure that conveniently allows them to

explain or give a reason for a previous point without resorting to more specific verb or noun

choices with potentially added shades of meaning. In addition, the most frequent lexical verbs

(MEAN, SUGGEST and SHOW) all have in common that they can be readily used to explain or

interpret findings (e.g. this means that, this suggests that, this shows that), as part of

constructing an argument, as can be seen in examples (10) to (12) below.

These findings are aligned with those of Wulff et al.’s (2012) investigation in that they

also found MEAN and SUGGEST to be predominantly used in clusters with unattended this in the

sort of interpretive or explanatory activity that is common in student writing. Wulff et al. further

indicate that their “results point towards an ongoing delexicalization of this + verb clusters like

this is and this means into textual organization markers” (p. 129). Interestingly, formulaic

sequences containing this is also feature prominently in Simpson-Vlach and Ellis’ (2010) list

of core formulas (AFL) in academic speech and writing.

(10) The method followed is as described in the laboratory manual. In estimating the

concentration of the prepared ovalbumin 0.01ml of the preparation was taken, rather than

the suggested 0.1ml. This was because at the suggested concentration, the absorbance at

280nm was off the scale of the machine. (BIO_0032c)

(11) Bacterial DNA can be exchanged between different strains allowing new allele

combinations to be produced. This means that a 'wild type' gene can be passed to a strain

with a mutation in this gene and through recombination the gene activity can be restored.

(BIO_0007a)

(12) The production of polyclonal antibodies usually requires a large amount of pure protein,

whereas monoclonal antibodies require smaller amounts of impure protein. This is due

to the fact that polyclonal antibodies are produced by many different B lymphocytes, and

polyclonal preparations contain a mixture of antibodies that recognise different parts of

the protein. This means that more amounts of pure protein will be needed in order to

segregate the specific polyclonal antibody needed. (BIO_6119b)

A closer look at the concordance lines also reveals that pronominal this appears to be used

more frequently to refer anaphorically to longer antecedents, rather than single noun phrases,

as reported by other studies (e.g. Gray, 2010; Gray & Cortes, 2011; Wulff et al., 2012). This is

especially salient in the case of sentence-initial this, which often introduces paragraph-ending

sentences that summarise, explain, or give a reason for information presented in previous

sentences, such as evidence from experiments or descriptions of procedures (cf. Nesi &

Gardner, 2012, p. 110 on the role of this in the development of arguments). Wulff et al. (2012)

also found that this is clusters occur “relatively much more often in the middle and final sections

of paragraphs and texts” (p. 145).

Example (13) shows this is (in this is why) being used to provide a reason for what was described

previously in a paragraph of five sentences, while (14) illustrates the use of the this shows that

pattern in relating evidence to a claim. Both uses of this as determiner and as pronoun can be

seen in one single extract from Law in (15); while the noun definition refers anaphorically to a

delimited antecedent (a quote), this is because can be interpreted as referring to the entire

previous sentence.

(13) Looking at the two main choices, aluminium and copper, both create a stable oxide on

their surface once exposed to the elements. The oxides are stable at constant temperature.

However, copper oxides grow if the copper is heated. If the copper is heated for a lengthy

amount of time, then eventually the metal will harden and break. This is why it is

especially important to make sure the copper has a good connection, therefore higher

tolerances in the joining processes need to be observed. (ENG_0023a)

(14) Different kinds of units such as features, segments, morphemes, words, and even

phrases are involved in speech errors where they might be exchanged, anticipated and

produced earlier than intended, or persevered and repeated. This shows that we construct

syntactic phrases and sentences in a buffer memory, where error may occur, before

articulating them instead of retrieving one word from the mental lexicon and say it one at

a time without involving any planning (Fromkin, Rodman & Hyams, 2003; Garman,

1990). (LING_6055a)

(15) Patteson J, in Thomas v. Thomas stated: “...consideration means something which is of

some value in the eye of the law.” This definition was used as binding precedent for

many years, and it can be said that Williams does not contradict this, but does in fact

agree with it. This is because now, the 'eye of the law' believes the practical, not the legal

benefit or detriment to be of value, and this is therefore an argument against Williams

creating ambiguity. (LAW_0143b)

The distribution of the most common verbs presented in Table 8 above is reflected to

some extent in the overall frequency of the three verb types used in the classification. Figure 4

shows the percentage of use of the different types of verbs that follow this/these in the data.

Primary verbs (BE, HAVE and DO) account for nearly half of the occurrences (419), as could be

expected due to their functions as main and auxiliary verbs. Of the three primary verbs, BE is

by far the most frequent, representing 89% of primary verb occurrences. The second most common type comprises a

large variety of lexical verbs (289), most of which are infrequently used. While the modal

auxiliary verbs account for only 18% (157) of the total, five of them appear in the list of the ten

most frequently used verbs, as previously shown in Table 8.

Figure 4. Percentage of types of verbs (percentages rounded to the nearest whole number)

Figure 4 data (%): primary 48; modal 18; lexical 33.

4.4 Distribution of Attended and Unattended this/these across Disciplines

Table 9 shows the distribution of the demonstratives across the four disciplines represented in

the data. Neither individual nor combined figures for this and these indicate a clear distinction

between the so-called ‘hard’ sciences (Biological Sciences and Engineering) and ‘soft’ sciences

(Law and Linguistics) in terms of frequency of occurrence.9 There is, however, some

disciplinary variation in the preference for anaphoric demonstrative structures, which is more

evident when considering individual disciplines. While Engineering and Biological Sciences

show the highest combined frequencies per thousand words, they are followed very closely by

Linguistics. The most pronounced discrepancy is found between these three disciplines and

Law, which could suggest that the demonstratives this and these are less favoured by Law

students, who could be using other cohesive devices to establish anaphoric reference. Looking

at this and these separately, the same pattern is repeated for the plural form, despite a smaller

difference between Engineering and Biological Sciences, whereas this shows a slightly

different pattern, with Linguistics and Biological Sciences swapping positions for the second-

highest frequency, as well as a less marked discrepancy between Law and the other

disciplines.

9 Refer to Table 2 in Section 3.1 for details on the selected hard and soft disciplines according to their grouping in the BAWE corpus.

The figures in Table 9 indicate a highly significant association (at the p < 0.0001 level)

between individual disciplines and the use of anaphoric demonstrative structures.10

Table 9. Raw and normalised frequencies (per 1,000 words) of this and these per discipline

                      this [a]              these [a]             this+these
Discipline            Raw     Normalised    Raw     Normalised    Raw     Normalised
Linguistics           360     8.82          96      2.35          456     11.17
Law                   402     7.10          56      0.99          458     8.09
Biological Sciences   631     8.70          189     2.61          820     11.31
Engineering           640     10.87         157     2.67          797     13.54
Total                 2,033                 498                   2,531
[a] Chi-squared = 22.53 (df = 3); p-value = 0.00005063138 (p<0.0001); effect size measure: Cramér’s V = 0.0943
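For readers who wish to retrace the statistics attached to Table 9, the following Python sketch recomputes a Pearson chi-squared statistic, its p-value and Cramér’s V. It is an illustration only: the figures in this study were obtained with the UCREL Significance Test System, and the sketch assumes that the test was run on the raw this and these counts per discipline, treated as a 4 × 2 contingency table. The function names are hypothetical, and the closed-form p-value applies only when there are three degrees of freedom.

```python
import math

# Raw frequencies from Table 9: discipline -> (this, these).
OBSERVED = {
    "Linguistics": (360, 96),
    "Law": (402, 56),
    "Biological Sciences": (631, 189),
    "Engineering": (640, 157),
}


def chi_squared(table):
    """Return the Pearson chi-squared statistic and the grand total of a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            # Expected count under independence of discipline and demonstrative.
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected
    return stat, n


def p_value_df3(stat):
    """Upper-tail probability of a chi-squared variable with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(stat / 2)) + math.sqrt(2 * stat / math.pi) * math.exp(-stat / 2)


rows = list(OBSERVED.values())
stat, n = chi_squared(rows)
df = (len(rows) - 1) * (len(rows[0]) - 1)
# Cramér's V: the statistic scaled by sample size and the smaller table dimension minus one.
cramers_v = math.sqrt(stat / (n * (min(len(rows), len(rows[0])) - 1)))
p = p_value_df3(stat)

print(round(stat, 2), df, round(cramers_v, 4), p)
```

Under the stated assumption, the sketch yields values that agree with the note to Table 9 (a statistic of about 22.53 with df = 3 and a Cramér’s V of about 0.094). The small effect size indicates that, although the association is statistically significant, the differences between disciplines are modest.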

A comparison of the combined frequencies of attended and unattended this/these across

the four corpora also reveals some disciplinary variation. As can be seen in Figure 5 below, the

Biological Sciences show a clearer preference for using this/these as determiners rather than

pronouns. The difference is much less pronounced, however, for the other disciplines, whose

frequencies reflect the overall frequencies of occurrence that were presented in Section 4.1.

10 Statistical significance testing performed using the UCREL Significance Test System.

Figure 5. Percentage of this/these as determiners and pronouns per discipline (percentages

rounded to the nearest whole number)

When considered individually, again this and these exhibit different patterns when used

as determiners or pronouns, as can be seen in Table 10, which provides the relative normalised

frequencies of (un)attended this/these by discipline. The singular demonstrative is attended by

a noun or left unattended in similar proportions, except in Biological Sciences, as illustrated

above in Figure 5. It appears that biology students tend to make reference to an

antecedent more explicit than students of the other three disciplines. In contrast, these is used

on average three times more often as a determiner than as a pronoun regardless of discipline.

Looking more closely at the frequencies for this, it is noteworthy that in two seemingly

opposite disciplines, Linguistics and Engineering, pronominal this is actually slightly more

frequent than the determiner form. A more detailed, functional analysis of the extended

environments (including antecedents) that co-occur with (un)attended this in these disciplines

would be needed, however, to determine whether this similarity in frequency is also matched

in terms of discourse functions.

Figure 5 data (%): Linguistics: determiner 55, pronoun 45; Law: determiner 53, pronoun 47; Biological Sciences: determiner 64, pronoun 36; Engineering: determiner 55, pronoun 45.

Table 10. Normalised frequencies (per 1,000 words) of attended and unattended this and these per discipline, including results of statistical significance testing

                      this [a]                  these [b]                 this+these [c]
Discipline            Attended   Unattended     Attended   Unattended     Attended   Unattended
Linguistics           4.36       4.46           1.79       0.56           6.15       5.02
Law                   3.57       3.53           0.71       0.28           4.27       3.81
Biological Sciences   5.12       3.59           2.08       0.52           7.20       4.11
Engineering           5.30       5.57           2.14       0.53           7.44       6.10
[a] Chi-squared = 15.76 (df = 3); p-value = 0.001270433 (p<0.01); effect size measure: Cramér’s V = 0.0880
[b] Chi-squared = 2.48, p-value = 0.4787066 (p<0.5); effect size measure: Cramér’s V = 0.0706
[c] Chi-squared = 20.02, p-value = 0.0001684888 (p<0.001); effect size measure: Cramér’s V = 0.0889
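The determiner and pronoun percentages plotted in Figure 5 can be derived from the combined (this+these) columns of Table 10. The Python sketch below shows the arithmetic as an illustration; the names `COMBINED` and `determiner_pronoun_split` are hypothetical, and the sketch assumes that the figure was computed from these normalised frequencies.

```python
# Normalised frequencies (per 1,000 words) of this+these from Table 10:
# discipline -> (attended, i.e. determiner; unattended, i.e. pronoun).
COMBINED = {
    "Linguistics": (6.15, 5.02),
    "Law": (4.27, 3.81),
    "Biological Sciences": (7.20, 4.11),
    "Engineering": (7.44, 6.10),
}


def determiner_pronoun_split(attended: float, unattended: float) -> tuple[int, int]:
    """Return (determiner %, pronoun %) rounded to the nearest whole number."""
    total = attended + unattended
    return round(100 * attended / total), round(100 * unattended / total)


split = {
    discipline: determiner_pronoun_split(attended, unattended)
    for discipline, (attended, unattended) in COMBINED.items()
}

for discipline, (determiner, pronoun) in split.items():
    print(f"{discipline}: determiner {determiner}%, pronoun {pronoun}%")
```

Because both values in each pair are normalised against the same discipline corpus, their ratio equals the ratio of the underlying raw counts, apart from rounding in the published figures.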

In general, the above results of the distribution of attended and unattended this/these point

to some cross-disciplinary variation, which is similar to the findings of Wulff et al. (2012) who

found some degree of variation, albeit not consistent, across disciplines in the frequencies of

this in advanced student writing. Their results, however, apply to final-year undergraduates and

graduate students, hence level of study could be an important explanatory factor. Wulff et al.

(2012) also add that “DISCIPLINE and LEVEL interact in quite intricate ways” (p. 141;

emphasis in original).

4.5 Distribution of Nouns/Verbs and Types across Disciplines

The distribution across disciplines generally reflects the overall findings for nouns and noun

types. Some differences are evident concerning the most frequently used nouns and noun types.

Table 11 shows the most common nouns in each of the four disciplines (see Appendix 4 for a

complete list); in bold the nouns that appear with this/these in at least three disciplines, and

underlined those that occur in two.

Table 11. The most frequent noun lemmas following this/these, by discipline

Linguistics      Law               Biological Sciences   Engineering
essay (11) ‡     case (32) ‡       experiment (22) †     experiment (21) †
case (8) ‡       approach (13)     theory (17) †         value (18)
study (8)        offence (11)      essay (16) ‡          laboratory (16)
feature (7)      essay (9) ‡       organism (13)         model (16)
stage (7)        decision (7)      process (13)          data (13)
variable (7)     rule (6)          reaction (13)         point (13) ‡
question (6)     presumption (5)   way (12) ‡            report (13)
task (6)         act (4)           gene (11)             case (11) ‡
claim (5)        argument (4)      method (9) †          result (10)
idea (5)         requirement (4)   point (8) ‡           investigation (7)
point (5) ‡      right (4)         temperature (8)       method (6) †
way (5) ‡        theory (4) †      bacterium (7)         problem (6)
                 view (4)          cell (7)              error (5)
                                   enzyme (7)            industry (5)
                                   condition (6)         number (5)
                                   region (6)            way (5) ‡
Note: Based on the lists shown here, the nouns that appear in at least three disciplines are marked ‡; those in two disciplines are marked †.

As previously pointed out in Section 4.2, the type of task or assignment seems to play a

role in the use of nominal structures, since the deictic noun essay is one of the most common

nouns in the data regardless of discipline. The other ones are case, which has different meanings

in different disciplines as reflected in its classification as both shell and other abstract nouns,

and point and way, which are general nouns whose meaning depends on the specific context in which

they occur and which are mostly used as shell nouns for encapsulating information presented previously. The

following examples show typical uses of case in different disciplines; not surprisingly, in Law

case is predominantly used in the sense of ‘legal action’ as opposed to ‘an instance of a

particular situation’ in the examples from Linguistics and Engineering.

(16) His impact on contractual licensee's rights to occupation was finally negated by the

Court of Appeal in Ashburn Anstalt v. Arnold. This case necessitated a full review of the

status of contractual licences. (LAW_0069b)

(17) In line 26 the word reaction precedes the phrase “set in”. Standing alone the word

'reaction' can be said to have either a negative or a positive prosody however in this case

once again it appears to have a negative prosody. (LING_6183a)

(18) The IPOD Oximeter module consists of a peripheral probe, with a microprocessor unit.

In this case, the peripheral probe contains two IR sensors, and the trace is displayed on a

computer screen. (ENG_0347e)

The remaining nouns are more discipline-specific or related to methodology, such as feature,

variable, question and task in Linguistics; offence, decision, act, and right in Law; organism,

reaction, gene and cell in Biological Sciences; laboratory, model, data, and report in

Engineering. This corresponds to the predominance of metadiscoursal and methodology-related

nouns found in Römer and Wulff’s (2010) case study on attended and unattended this in

advanced student writing (MICUSP).

The distribution of noun types in Figure 6 below shows a similar cross-disciplinary

variation to some degree, particularly in terms of shell and concrete nouns, whose frequencies

differ the most between the four disciplines. As might be expected, the highest frequencies of

concrete nouns are found in Biological Sciences and Engineering due to the focus of these

disciplines in physical entities or real world phenomena (Becher & Trowler, 2001, p. 36), while

Linguistics and Law show little or no use of this type of noun. In contrast, Law students favour

shell nouns (60%) to a greater extent than students in the other disciplines, in particular

Biological Sciences (33%) and Engineering (29%), where shell nouns are used least

often. Deictic nouns are used least in Law (11%) and to a comparable degree in the

other disciplines, particularly in Linguistics (20%) and Engineering (21%).

The pervasive use of shell nouns in the data, in particular in the soft disciplines, and their

role for text cohesion would warrant a more detailed qualitative analysis in order to scrutinise

whether they are used for the same purposes in different disciplines and whether that use

develops over time.

Figure 6. Percentage of types of nouns per discipline (percentages rounded to the nearest whole

number)

The following examples illustrate typical uses of the most distinctive noun types in the

soft disciplines versus the hard disciplines, shell nouns and concrete nouns respectively. Two

of the most frequent shell nouns in Law and Linguistics, approach and evidence, are seen in

examples (19) and (20), where the noun takes its meaning from the antecedent; examples (21)

and (22) show the concrete nouns organism and sensors in Biological Sciences and

Engineering.

(19) In Hine v. Hine, Lord Denning M.R. argued that the courts would have discretionary

use of this section, in allocating shares of such assets, in cases where the acquisition of

property was achieved by joint efforts (which he argued would include non-financial

contributions such as child-care). This approach, typical of Denning, was firmly rejected

by the House of Lords in Pettitt v. Pettitt. (LAW_0069b)

[Figure 6: bar chart of noun types per discipline. Values (%) as shell / other abstract / deictic / concrete: Engineering 29 / 37 / 21 / 13; Biological Sciences 33 / 26 / 15 / 26; Law 60 / 29 / 11 / (none shown); Linguistics 48 / 26 / 20 / 6.]

(20) The chimpanzees also seem to possess the ability to understand new combinations of

signs that they have not come across before and to express themselves spontaneously i.e.

without waiting for a stimulus which provokes them to respond. This evidence suggests

that the chimps did indeed develop a grasp of simple language. (LING_6067b)

(21) The most mitochondrial-like bacterial genome is that of Rickettsia prowazekii, the cause

of louse borne typhus. This organism has a genome of more than a million bp in size and

contains 834 protein encoding genes. (BIO_0032a)

(22) An extensive system would have a lot of sensors placed around the home. These sensors

would give information to the processing unit where the situation of the elderly person

would be determined. (ENG_0228f)

Overall, these results show greater variation than the similar

percentages of noun types in Applied Linguistics and Engineering research articles reported by

Gray and Cortes (2011). It remains unclear to what degree this difference reflects the influence

of discipline or genre, since undergraduate student writing differs considerably from research

article writing in terms of purpose.

In terms of the most common verbs following this/these, no salient cross-disciplinary

differences are found. As can be seen in Table 12 (see Appendix 5 for a complete list), the most

frequent verb in the four corpora is BE, followed by other auxiliary verbs DO, may and would.

The only lexical verb that appears in all disciplines is MEAN, confirming its importance in

providing explanations or interpretations in student writing (cf. Wulff et al., 2012). There are

also a number of overlapping verbs that appear in most of the corpora, which points to a broad

similarity across the disciplines. In fact, only a few verbs rank among the most frequent with this/these in

a single discipline: LINK, PROVIDE, INCLUDE and PRODUCE. None of these, however, can be said to

be particularly characteristic of the discipline in which it occurs. Gray and Cortes (2011) also

found that only a few verbs in their data seem to be discipline-specific, further suggesting that

the majority of the verbs are instead genre-specific “in that they represent processes and states

typical in research articles but do not require prior knowledge of the discipline in order to be

comprehended” (p. 38). The results shown in Table 12, however, seem to indicate that those

are not only genre- but register-specific verbs since they appear quite frequently in academic

writing in general (cf. Schutz, 2013), including student writing, rather than being specific to

research articles.

Table 12. The most frequent verb lemmas following this/these, by discipline

Linguistics     Law             Biological Sciences   Engineering
BE* (75)        BE* (79)        BE* (115)             BE* (104)
SUGGEST (9)     HAVE (8)        can (14)              would* (25)
can (7)         MEAN* (7)       MEAN* (13)            can (16)
DO* (7)         would* (7)      would* (11)           MEAN* (14)
SHOW (6)        DO* (5)         may* (8)              may* (13)
may* (5)        SUGGEST (5)     OCCUR (7)             could (11)
would* (5)      may* (4)        PROVIDE† (5)          HAVE (11)
could (4)       could (3)       will (5)              CAUSE (8)
HAVE (4)        LEAD TO (3)     ALLOW (4)             will (8)
LINK† (4)       will (3)        CAUSE (4)             ALLOW (6)
MEAN* (4)                       DO* (4)               SHOW (6)
                                INCLUDE† (4)          DO* (5)
                                LEAD TO (4)           OCCUR (4)
                                SHOW (4)              PRODUCE† (4)
                                SUGGEST (4)

Note: Capitals identify lemmas (small caps in the original). Verbs that appear in all four disciplines are marked * (bold in the original); those that appear in only one discipline are marked † (underlined in the original).
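The overlap described in the note to Table 12 can be checked mechanically. The following is a minimal sketch, assuming the lemma lists are transcribed from Table 12 as printed; the names TOP_VERBS, shared_by_all and unique_to_one are illustrative and are not part of the original analysis, which was carried out with AntConc.

```python
# Most frequent verb lemmas per discipline, transcribed from Table 12
# (frequencies omitted; only list membership matters here).
TOP_VERBS = {
    "Linguistics": {"BE", "SUGGEST", "can", "DO", "SHOW", "may", "would",
                    "could", "HAVE", "LINK", "MEAN"},
    "Law": {"BE", "HAVE", "MEAN", "would", "DO", "SUGGEST", "may", "could",
            "LEAD TO", "will"},
    "Biological Sciences": {"BE", "can", "MEAN", "would", "may", "OCCUR",
                            "PROVIDE", "will", "ALLOW", "CAUSE", "DO",
                            "INCLUDE", "LEAD TO", "SHOW", "SUGGEST"},
    "Engineering": {"BE", "would", "can", "MEAN", "may", "could", "HAVE",
                    "CAUSE", "will", "ALLOW", "SHOW", "DO", "OCCUR",
                    "PRODUCE"},
}


def shared_by_all(lists):
    """Return the lemmas that occur in every discipline's list."""
    return set.intersection(*lists.values())


def unique_to_one(lists):
    """Map each discipline to the lemmas found in no other discipline's list."""
    unique = {}
    for discipline, verbs in lists.items():
        others = set().union(*(v for d, v in lists.items() if d != discipline))
        only_here = verbs - others
        if only_here:
            unique[discipline] = only_here
    return unique


if __name__ == "__main__":
    print(sorted(shared_by_all(TOP_VERBS)))
    for discipline, verbs in unique_to_one(TOP_VERBS).items():
        print(discipline, sorted(verbs))
```

Under these assumptions the shared set is BE, DO, MEAN, may and would, and the single-discipline verbs are LINK, PROVIDE, INCLUDE and PRODUCE, in line with the discussion above.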

The distribution of verb types displayed in Figure 7 also shows minor variation, again

mostly between the hard and soft disciplines, with the figures for Linguistics and Law matching

almost entirely and those for Engineering and Biological Sciences differing only slightly in the

proportion of primary versus modal verbs. This suggests that, in contrast to the results for nouns

and noun types reported on above, novice student writing does not appear to reflect as much

disciplinary specificity with regard to the verb types used in cohesive structures with this/these.

Figure 7. Percentage of types of verbs per discipline (percentages rounded to the nearest whole

number)

Looking specifically at how the most frequent primary verb (BE) and the two most

common lexical verbs overall (MEAN and SUGGEST) are distributed across the disciplines, some

interesting patterns emerge. A search for this + BE reveals that this structure is used slightly

more in the hard disciplines (59%) than in the soft disciplines (41%). In contrast, the structure

this + MEAN/SUGGEST + that, which performs the explanatory function discussed earlier, is realised

predominantly as this means that in Biological Sciences and Engineering texts (21

out of 24 instances), and mainly as this suggests that in Linguistics and Law (11 out of 14

instances). These frequencies seem to indicate distinct verb preferences between the hard and

soft sciences that could reflect the way knowledge is constructed differently, based on evidence

or tentative interpretation, respectively. It could be argued, however, that genre also plays an

57 5847

40

14 13

16 25

30 3036 35

L I N G U I S T I C S L A W B I O L O G I C A L S C I E N C E S E N G I N E E R I N G

PRIMARY MODAL LEXICAL

Page 53: Elisabete Ferreira MA Thesis - DiVA portal

52

important role in this distribution since a large proportion of texts in the Biological Sciences

and Engineering corpora belong to the Explanation genre family (Gardner & Nesi, 2013).
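The hard/soft split reported for BE can be approximated from the BE frequencies in Table 12. The sketch below is illustrative only: it assumes that the Table 12 counts, which cover this and these together, are an acceptable stand-in for the this + BE search, and the names BE_COUNTS, HARD and group_share are hypothetical.

```python
# Frequencies of BE following this/these, taken from Table 12.
# Assumption: these combined counts approximate the "this + BE" search.
BE_COUNTS = {
    "Linguistics": 75,
    "Law": 79,
    "Biological Sciences": 115,
    "Engineering": 104,
}

# Hard/soft grouping as used in the discussion above.
HARD = ("Biological Sciences", "Engineering")


def group_share(counts, group):
    """Return the percentage of all tokens contributed by the given disciplines."""
    total = sum(counts.values())
    return 100 * sum(counts[d] for d in group) / total


hard_share = group_share(BE_COUNTS, HARD)
soft_share = 100 - hard_share

print(round(hard_share), round(soft_share))  # prints: 59 41
```

With 219 of 373 tokens in the hard disciplines, the rounded shares agree with the 59% and 41% reported above.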

5 Conclusion

The present study has investigated the extent to which the anaphoric demonstratives this and

these are used as determiners (attended) or pronouns (unattended) by first-year undergraduate

students from four different academic disciplines, as well as analysed the nominal and verbal

structures that follow the demonstratives. Based on data extracted from the BAWE corpus, the

following five research questions were answered:

1) To what degree are attended and unattended this/these used in first-year undergraduate

student writing?

The demonstratives this and these are used to varying degrees as determiners and pronouns in

the data. While these occurs predominantly with an attendant noun, this is followed almost

equally often by a noun or a verb.

2) Which nouns and noun types (concrete, deictic, shell, other abstract nouns) most

frequently follow attended this/these?

Overall, this and these are predominantly attended by methodology- or results-related nouns

(e.g. experiment, method, result) and nouns that relate to student activity or type of task (e.g.

essay, report), as well as other more general nouns (e.g. case, way) that occur mostly in ‘semi-

fixed’ expressions with an anaphoric function such as “in this case” and “in this way”. When

considered individually, this and these mainly pattern with different nouns. With respect to the

types of nouns, shell and other abstract nouns occur predominantly in the data, followed by

deictic and concrete nouns, as might be expected given that academic discourse tends to be

abstract, and includes a great deal of metadiscoursal references.

3) Which verbs and verb types (lexical, primary, modal) most frequently follow unattended

this/these?

The verb lemma BE accounts for nearly half the occurrences of verbs in the data, and is widely

used in seemingly ‘prefabricated’, fixed structures such as this + BE to present a reason for a

previous point. A great variety of lexical verbs can be found, but only a small number of them

occur frequently, including MEAN, SUGGEST and SHOW, which seem to be used to provide an

explanation or interpretation of the findings, in constructing arguments. The distribution of verb

types reflects closely that of the most common verbs, with primary verbs occurring most

frequently (48%), followed by lexical verbs (33%) and modal verbs (18%).

4) To what extent, if any, do student writers in different disciplines differ with respect to

the frequency of use of attended and unattended this/these?

Some disciplinary variation can be found in the use of the demonstratives, with Law students

employing these cohesive devices to a smaller extent than students of the other three disciplines.

The percentages of (un)attended this/these also show some variation between the disciplines,

namely a clearer preference in Biological Sciences for using this/these with an attendant noun

rather than unattended. The individual distributions for this and these indicate different patterns

for the demonstratives across disciplines. The demonstrative this is used to a similar extent as

a determiner and pronoun in Law, Linguistics and Engineering, and more often as a determiner

in Biological Sciences, while these is predominantly used as a determiner in all disciplines.

5) How are the most frequently co-occurring nouns/verbs and respective types distributed

across academic disciplines?

Cross-disciplinary variation is more salient in the distribution of nouns/noun types than of

verbs/verb types. While three of the most frequently used nouns appear in all four disciplines,

the majority are either related to methodology or more discipline-specific. In terms of types of

nouns, there is a striking predominance of shell nouns in Law, which contrasts with their

sparser use in the Biological Sciences and Engineering. The results for verbs and types of verbs,

on the other hand, show minor disciplinary variation, with a large number of overlapping verbs.

The main difference is found in the distribution of verb types between ‘soft’ and ‘hard’ sciences.

The findings of this study stand in contrast to the prescriptivist views on unattended this

as an unwanted phenomenon that should be corrected in students’ writing. Native English first-

year undergraduates, as represented by the BAWE corpus data, do use this/these as pronouns

to a large extent in their assignments, although with varying degrees of frequency in different

disciplines. When compared to previous studies (e.g. Wulff et al., 2012; Petch-Tyson, 2000;

Römer & Wulff, 2010), some possible explanations for these results were proposed that relate

to the educational context and level of study in association with the range of lexical choices

available to novice student writers.

Some differences were also observed in the distribution of nouns, verbs and respective

types across disciplines, which may be explained by factors other than discipline. Given the

predominance of different genres in the different subcorpora (essay and critique in Linguistics

and Law; methodology recount and explanation in the Biological Sciences and Engineering),

the influence of genre in the results needs to be taken into account. Future studies that are able

to tease apart these interrelated variables could provide insightful findings.

The size of the data sets is another limitation of the present study, as it prevents

generalisable observations from being made. Further research using larger corpora could also extend this

study by comparing the use of the demonstratives between levels of study (early to advanced

undergraduates) or expertise (student versus professional academic writing), which could help

reveal possible patterns of development in cohesive writing using anaphoric demonstratives.

Likewise, an investigation of (un)attended this/these in academic writing by non-native English

speaking novice writers would complement the findings of the present study. The role of the

anaphoric demonstratives this/these in different communicative contexts, as well as a

comparison with their distal counterparts that/those, could also be examined in future studies.

By establishing what is done in actual learning contexts, better-informed EAP writing

instruction and reference materials can be developed in line with current practices (Hunston,

2002; Nesi, 2016). The results of this empirical study on anaphoric (un)attended this/these as

important cohesive devices in successful academic writing by first-year undergraduates could

be of value to both teachers and students. The fact that unattended this in particular seems to be

such a widespread phenomenon in novice student writing merits further attention by EAP

practitioners and researchers. It is important to draw attention to all the choices available to

students when selecting resources that can enhance text cohesion while balancing clarity and

economy of expression.

References

American Psychological Association. (2010). Publication manual of the American

Psychological Association (6th ed.). Washington, DC: American Psychological

Association.

Anthony, L. (2019). AntConc (Version 3.5.8) [Computer Software]. Tokyo, Japan: Waseda

University. Available from http://www.laurenceanthony.net/software

Becher, T., & Trowler, P. (2001). Academic Tribes and Territories: Intellectual Enquiry and

the Culture of Disciplines. London: The Society for Research into Higher Education &

Open University Press.

Bhatia, V. K. (2002). A Generic View of Academic Discourse. In J. Flowerdew (Ed.), Academic

Discourse (pp. 21-39). London: Longman.

Biber, D. (2006). University language: A corpus-based study of spoken and written registers.

Amsterdam: John Benjamins.

Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). The Longman grammar

of spoken and written English. Harlow: Pearson Education.

Biber, D., & Gray, B. (2010). Challenging stereotypes about academic writing: Complexity,

elaboration, explicitness. Journal of English for Academic Purposes, 9(1), 2–20.

Bloor, T., & Bloor, M. (2004). The Functional Analysis of English: A Hallidayan Approach.

London: Arnold.

Brown, G., & Yule, G. (1983). Discourse analysis. Cambridge: Cambridge University Press.

Charles, M. (2011). Adverbials of result: Phraseology and functions in the Problem–Solution

pattern. Journal of English for Academic Purposes, 10(1), 47–60.

Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34(2), 213–238.

Flowerdew, J. (2016). English for Specific Academic Purposes (ESAP) writing: Making the

case. Writing and Pedagogy, 8, 1–32.

Flowerdew, J. (2002). Introduction: Approaches to the Analysis of Academic Discourse in

English. In J. Flowerdew (Ed.), Academic Discourse (pp. 1-17). London: Longman.

Gardner, D., & Davies, M. (2013). A New Academic Vocabulary List. Applied Linguistics,

35(3), 305–327.

Gardner, S., & Nesi, H. (2013). A classification of genre families in university student writing.

Applied Linguistics, 34(1), 25–52.

Gardner, S., Nesi, H., & Biber, D. (2018). Discipline, level, genre: Integrating situational

perspectives in a new MD analysis of university student writing. Applied Linguistics.

(Advance article)

Geisler, C., Kaufer, D. S., & Steinberg, E. R. (1985). The unattended anaphoric “this”: When

should writers use it? Written Communication, 2(2), 129–155.

Glenn, C., & Gray, L. (2013). The Writer’s Harbrace Handbook. International Edition (5th ed.).

Boston, MA: Wadsworth Cengage Learning.

Granger, S., & Tyson, S. (1996). Connector Usage in the English Essay Writing of Native and

Non-Native EFL Speakers of English. World Englishes, 15(1), 17–27.

Gray, B. (2010). On the use of demonstrative pronouns and determiners as cohesive devices: A

focus on sentence-initial this/these in academic prose. Journal of English for Academic

Purposes, 9(3), 167–183.

Gray, B., & Cortes, V. (2011). Perception vs. evidence: An analysis of this and these in academic

prose. English for Specific Purposes, 30(1), 31–43.

Halliday, M. A. K., & Hasan, R. (1976). Cohesion in English. London: Longman.

Hinkel, E. (2001). Matters of cohesion in L2 academic texts. Applied Language Learning,

12(2), 111–132.

Hunston, S. (2006). Corpus Linguistics. In K. Brown (Ed.), Encyclopedia of Language &

Linguistics (2nd ed.). 234–248.

Hunston, S. (2002). Corpora in Applied Linguistics. Cambridge: Cambridge University Press.

Hyland, K. (2009). Academic discourse: English in a global context. London: Continuum.

Hyland, K. (2008). As can be seen: Lexical bundles and disciplinary variation. English for

Specific Purposes, 27(1), 4–21.

Hyland, K. (2006). Disciplinary differences: Language variation in academic discourses. In K.

Hyland & M. Bondi (Eds.), Academic discourse across disciplines (pp. 17-45). Frankfurt:

Peter Lang.

Hyland, K. (2004). Disciplinary discourses: social interactions in academic writing. Ann

Arbor, MI: University of Michigan Press.

Hyland, K. (2002). Specificity revisited: how far should we go now? English for Specific

Purposes, 21(4), 385–395.

Hyland, K., & Jiang, F. (2017). Is academic writing becoming more formal? English for Specific

Purposes, 45, 40–51.

Hyland, K., & Shaw, P. (2016). Introduction. In K. Hyland & P. Shaw (Eds.), The Routledge

Handbook of English for Academic Purposes (pp. 1–14). London/New York: Routledge.

Hyland, K., & Tse, P. (2007). Is there an “academic vocabulary”? TESOL Quarterly, 41, 235–

253.

Lee, J. J., Bychkovska, T., & Maxwell, J. D. (2019). Breaking the rules? A corpus-based

comparison of informal features in L1 and L2 undergraduate student writing. System,

80, 143–153.

McCarthy, M., Matthiessen, C., & Slade, D. (2010). Discourse Analysis. In N. Schmitt (Ed.),

An Introduction to Applied Linguistics (pp. 53-69). London: Arnold Publishers.

Moskovit, L. (1983). When is broad reference clear? College Composition and Communication,

34(4), 454–469.

Nesi, H. (2016). Corpus studies in EAP. In K. Hyland & P. Shaw (Eds.), The Routledge

Handbook of English for Academic Purposes (pp. 206-217). London/New York:

Routledge.

Nesi, H., & Gardner, S. (2012). Genres across the disciplines: Student writing in higher

education. Cambridge, UK: Cambridge University Press.

Paltridge, B. (2012). Discourse analysis: An introduction. London: Continuum.

Petch-Tyson, S. (2000). Demonstrative expressions in argumentative discourse: A computer

corpus-based comparison of non-native and native English. In S. Botley & A. M.

McEnery (Eds.), Corpus-based and computational approaches to discourse anaphora

(pp. 43-64). Amsterdam: John Benjamins.

Römer, U., & Wulff, S. (2010). Applying corpus methods to writing research: Explorations of

MICUSP. Journal of Writing Research, 2(2), 99–127.

Schutz, N. (2013). How Specific is English for Academic Purposes? A look at verbs in business,

linguistics and medical research articles. In G. Andersen & K. Bech (Eds.), English

Corpus Linguistics: Variation in Time, Space and Genre (pp. 237-257). Amsterdam:

Rodopi Publishers.

Simpson-Vlach, R., & Ellis, N. (2010). An academic formulas list: New methods in

phraseology research. Applied Linguistics, 31(4), 487–512.

Staples, S., Egbert, J., Biber, D., & Gray, B. (2016). Academic Writing Development at the

University Level: Phrasal and Clausal Complexity Across Level of Study, Discipline, and

Genre. Written Communication, 33(2), 149–183.

Swales, J. (2005). Attended and unattended “this” in academic writing: A long and unfinished

story. ESP Malaysia, 11(1), 1–15.

Swales, J. (1990). Genre analysis: English in academic and research settings. Cambridge:

Cambridge University Press.

Swales, J., & Feak, C. (2012). Academic Writing for Graduate Students: Essential Tasks and

Skills (3rd ed.). Ann Arbor, MI: University of Michigan Press.

Wulff, S., Römer, U., & Swales, J. (2012). Attended/unattended this in academic student

writing: Quantitative and qualitative perspectives. Corpus Linguistics & Linguistic

Theory, 8, 129–157.

Appendix 1

Complete list of head nouns (lemmas) following this/these in the data (480 types / 1423 tokens).

Rank   Frequency   Noun(s)
1      56          case
2      47          experiment
3      37          essay
4      29          point
5      25          way
6      24          theory
7      20          value
8      17          method, model, process
8      16          laboratory
12     16          result
13     14          data, investigation, reaction, report
17     13          approach, organism, stage
20     12          area, idea, question, time
24     11          gene, offence, type
27     10          feature
28     9           rule
29     8           cell, error, graph, market, problem, region, study, task, temperature, variable
39     7           bacterium, behaviour, decision, difference, enzyme, procedure, table, technique, view
48     6           claim, condition, evidence, knowledge, number, part, period, phrase, requirement
57     5           argument, code, discussion, figure, hypothesis, industry, issue, people, phase,
                   presumption, section, situation, stain, word
71     4           act, association, assumption, change, concept, conclusion, definition, device, fact,
                   form, group, information, level, lifecycle, module, organelle, page, phenomenon,
                   project, property, right, sequence, structure
94     3           ability, adaptation, antibiotic, application, arrangement, article, aspect, barrier,
                   boundary, calculation, characteristic, company, cost, criticism, design, disease,
                   effect, element, example, explanation, factor, finding, formula, fusion, image,
                   instance, line, load, material, molecule, notion, paper, pathogen, plant, practical,
                   reason, reptile, research, respect, sort, sound, strategy, technology, threat, transfer
140    2           beetle, DNA, action, amoeba, animal, assignment, bird, bond, book, bottle, car, chain,
                   charge, chimpanzee, choice, compartment, compound, consideration, context, demand,
                   development, diagram, distinction, equation, exercise, field, fraction, frame,
                   framework, function, genome, implant, increase, interaction, interest, intron, kind,
                   lab, lion, machine, measurement, mechanism, motor, need, optimum, piece, plate,
                   position, product, protein, provision, rate, relationship, sample, scenario, sense,
                   sentence, set, sign, signal, software, song, source, state, statement, subject,
                   system, thickness, undulipodia, vehicle, year
209    1           acquisition, addition, advancement, advert, advertisement, aerobe, agreement, aim,
                   alteration, alternative, amount, analogy, analysis, anticipation, apparatus, appendix,
                   approximation, array, attachment, attempt, background, balance, basis, benefit, block,
                   blockage, breakthrough, building, capsule, carrier, cassette, catalysis,
                   categorisation, channel, children, circumstance, clumping, coil, collection, colony,
                   combination, commonplace, comparison, competency, component, composition,
                   conformation, constant, construction, containment, convention, conviction,
                   cooperation, corpus, cover, craftmanship, creature, crime, crista, crossing, cursor,
                   cycle, danger, death, defect, defence, degradation, density, description, detail,
                   dialect, difficulty, diffusion, digression, dilution, direction, disc, discovery,
                   disorder, distortion, division, doctrine, dome, drink, drum, duty, electron, end,
                   energy, enhancement, equity, era, establishment, event, evolution, examination,
                   excitement, experience, expression, extension, fear, female, firm, fluid, foetus,
                   force, format, foundation, freedom, gesture, gland, graphology, grounding, grouping,
                   growth, habit, histone, honour, hormone, hunk, imagery, imbalance, imposition,
                   imprint, inaccuracy, incident, inconsistency, individual, infection, informant,
                   infusion, interview, investment, item, jurisdiction, justification, law, licence,
                   linguist, link, locality, male, manipulation, manufacturer, measure, member,
                   membrane, methodology, mist, mixture, mode, moment, movement, multiplicity, muscle,
                   mutation, nap, nature, negation, nest, objection, opinion, outbreak, overview,
                   oxaloacetate, pH, pair, papule, paradox, parasite, particle, patch, patient, pattern,
                   percentage, perspective, pesticide, photograph, picture, pilus, plot, polymer,
                   polynomial, polypeptide, positioning, possibility, precaution, precautions,
                   preference, prevention, pride, principle, promoter, proposal, protection, quanta,
                   quantity, quote, radius, ratio, rationalisation, reasoning, receiver, receptor,
                   recycling, rejection, reliance, repressor, residency, resistance, respirator,
                   response, responsibility, restriction, revelation, reversion, risk, robot, route,
                   routine, scale, scene, search, sensitivity, sensor, separateness, setting, shape,
                   shift, size, skill, society, solution, species, speculation, speech, spike, split,
                   stance, standard, step, stock, story, strain, strength, style, substance, subunit,
                   success, sugar, suggestion, summary, symbol, terminal, thesis, thylakoid, titre,
                   tradition, transition, trend, tube, unit, use, utterance, variation, version,
                   vocalisation, voltage, waiting, weight, work

Appendix 2

Complete list of head nouns classified as shell, other abstract, deictic, and concrete.

SHELL: ability, action, adaptation, advancement, alteration, analysis, approach, area, arrangement, aspect, association, assumption, behaviour, benefit, boundary, calculation, case, categorisation, change, characteristic, claim, combination, comparison, composition, condition, conformation, construction, cooperation, criticism, decision, definition, demand, development, diffusion, discovery, division, doctrine, enhancement, error, establishment, evidence, example, experience, explanation, fact, feature, finding, fraction, habit, honour, hypothesis, idea, imbalance, inconsistency, information, issue, justification, line, measure, method, model, need, offence, pattern, phenomenon, point, precaution, problem, procedure, property, question, rationalisation, reaction, reasoning, region, relationship, requirement, responsibility, restriction, result, revelation, separateness, sequence, situation, speech, stage, story, strategy, strength, task, technique, theory, time, tradition, transfer, use, value, view, way

ABSTRACT: ability, act, approach, area, barrier, blockage, bond, breakthrough, case, chain, choice, clumping, code, company, consideration, conviction, cursor, data, death, density, design, difference, DNA, energy, equity, error, evolution, feature, form, formula, graph, graphology, idea, inaccuracy, industry, infusion, investment, jurisdiction, knowledge, law, licence, lifecycle, load, mechanism, model, module, mutation, negation, notion, number, number, objection, page, percentage, phrase, polynomial, presumption, quanta, quantity, question, radius, rate, reaction, recycling, region, relationship, repressor, residency, scale, sensitivity, sequence, set, shape, size, song, strategy, structure, study, system, theory, transfer, type, value, vocalisation, voltage, weight

DEICTIC: data, diagram, essay, experiment, figure, graph, image, interview, investigation, lab, laboratory, line, overview, page, paper, photograph, phrase, picture, point, practical, project, question, quote, report, result, section, sentence, stain, study, table, task

CONCRETE: antibiotic, apparatus, bacterium, beetle, capsule, cassette, cell, colony, compartment, compound, crista, device, dilution, disc, dome, drum, enzyme, female, fluid, frame, gene, hormone, implant, implant, individual, male, manufacturer, material, member, membrane, mist, mixture, molecule, motor, muscle, nest, organism, oxaloacetate, papule, parasite, plant, protein, reptile, sensor, species, spike, stain, sugar, terminal, thylakoid, undulipodia

Appendix 3

Complete list of verbs (lemmas) following this/these in the data (111 types / 865 tokens).

Rank  Verb  Frequency
1  be  373
2  would  48
3  can  39
4  mean  38
5  may  30
6  have  25
7  could  21
7  do  21
9  suggest  20
10  show  16
10  will  16
12  cause  14
13  allow  11
13  occur  11
15  lead to  9
16  include  8
17  provide  7
18  involve  6
18  produce  6
18  result  6
21  create  5
21  increase  5
21  indicate  5
21  seem  5
25  demonstrate  4
25  give  4
25  highlight  4
25  link  4
25  reflect  4
30  affect  3
30  bring  3
30  contain  3
30  prevent  3
30  use  3
35  add  2
35  apply  2
35  go  2
35  help  2
35  might  2
35  need  2
35  prove  2
35  stop  2
35  work  2
44  address  1
44  affirm  1
44  assign  1
44  assist  1
44  assume  1
44  attribute  1
44  build  1
44  bypass  1
44  change  1
44  combine  1
44  concentrate  1
44  contend  1
44  cover  1
44  decrease  1
44  depend  1
44  deter  1
44  dilute  1
44  disadvantage  1
44  display  1
44  drive  1
44  eliminate  1
44  enable  1
44  end up  1
44  enforce  1
44  equate  1
44  exclude  1
44  exist  1
44  explain  1
44  extend  1
44  fail  1
44  fall  1
44  forego  1
44  generate  1
44  happen  1
44  hold  1
44  illustrate  1
44  induce  1
44  lack  1
44  limit  1
44  look  1
44  lower  1
44  make  1
44  mix  1
44  must  1
44  offer  1
44  open  1
44  place  1
44  play  1
44  pose  1
44  preclude  1
44  present  1
44  refer  1
44  remove  1
44  render  1
44  revise  1
44  rouse  1
44  run  1
44  save  1
44  secrete  1
44  send  1
44  set  1
44  simplify  1
44  start  1
44  strip  1
44  tend  1
44  undermine  1
44  verge  1

Appendix 4

Alphabetically ordered list of noun lemmas per discipline (types/tokens).

Biological Sciences (221/501)
ability, acquisition, action, adaptation, addition, aerobe, amoeba, animal, antibiotic, area, arrangement, array, aspect, association, assumption, attachment, bacterium, balance, behaviour, benefit, bird, block, blockage, bond, capsule, carrier, case, catalysis, cell, chain, change, characteristic, charge, claim, clumping, collection, colony, compartment, compound, concept, conclusion, condition, conformation, constant, containment, cooperation, cover, creature, crista, cycle, danger, data, death, dilution, discovery, discussion, disease, division, dna, electron, element, end, energy, enhancement, enzyme, error, essay, event, evidence, evolution, exercise, experiment, extension, fact, factor, feature, female, figure, finding, foetus, form, fraction, fusion, gene, genome, gland, graph, group, growth, histone, hormone, hypothesis, idea, imbalance, imprint, increase, individual, infection, information, interaction, intron, investigation, lifecycle, lion, male, material, mechanism, membrane, method, model, molecule, moment, movement, mutation, need, nest, number, opinion, optimum, organelle, organism, outbreak, oxaloacetate, page, paper, papule, parasite, part, particle, patch, pathogen, pattern, period, pesticide, ph, phase, phenomenon, photograph, pilus, plant, plate, plot, point, polymer, polypeptide, position, possibility, practical, precautions, prevention, pride, problem, procedure, process, product, promoter, property, protein, quanta, quantity, rate, reaction, reason, receptor, recycling, region, relationship, reliance, report, repressor, reptile, residency, respect, result, reversion, route, sample, scenario, section, sensitivity, separateness, sequence, size, solution, song, source, species, speculation, spike, split, stage, stain, step, stock, strategy, structure, substance, subunit, sugar, suggestion, system, table, technique, technology, temperature, theory, thickness, thylakoid, time, titre, tradition, transfer, trend, tube, type, undulipodia, value, version, view, vocalisation, way

Engineering (187/434)
ability, adaptation, advancement, alteration, analogy, apparatus, appendix, application, approximation, area, argument, arrangement, assignment, assumption, barrier, beetle, behaviour, book, bottle, boundary, breakthrough, building, calculation, car, case, cassette, cell, change, code, coil, company, comparison, competency, component, conclusion, cost, craftmanship, crossing, cursor, data, defect, demand, density, design, detail, development, device, diagram, difference, diffusion, direction, disc, dome, drum, effect, element, equation, era, error, essay, exercise, experience, experiment, explanation, expression, fear, feature, figure, firm, fluid, force, formula, frame, freedom, graph, grounding, group, grouping, habit, honour, idea, image, implant, inaccuracy, industry, information, infusion, instance, investigation, investment, issue, kind, knowledge, lab, laboratory, law, level, load, locality, machine, manufacturer, market, material, measurement, member, method, methodology, mist, mixture, model, module, motor, muscle, nap, number, objection, overview, page, pair, part, people, percentage, period, phenomenon, picture, point, polynomial, positioning, problem, procedure, process, product, project, property, question, quote, radius, rate, ratio, reaction, reason, region, relationship, report, requirement, resistance, respect, respirator, response, result, risk, routine, rule, section, sensor, set, setting, shape, signal, situation, software, sort, source, stage, state, strain, strategy, strength, structure, summary, system, table, task, technology, terminal, theory, time, type, value, variable, vehicle, view, voltage, way, weight, word, year

Law (113/240)
act, agreement, aim, alternative, analysis, approach, area, argument, article, assumption, attempt, basis, case, categorisation, charge, choice, circumstance, commonplace, concept, conclusion, consideration, construction, context, convention, conviction, cost, crime, criticism, decision, defence, definition, degradation, discussion, distinction, doctrine, duty, effect, equity, essay, establishment, examination, example, fact, foundation, frame, idea, imposition, incident, inconsistency, increase, instance, interest, investigation, issue, jurisdiction, justification, knowledge, licence, line, measure, method, nature, negation, notion, offence, page, period, perspective, phrase, piece, point, position, precaution, preference, presumption, principle, procedure, proposal, protection, provision, question, rationalisation, reason, reasoning, rejection, requirement, respect, responsibility, restriction, revelation, right, rule, scenario, section, sense, sentence, shift, situation, society, stance, standard, statement, subject, theory, thesis, threat, time, transition, type, use, view, way, word

Linguistics (130/251)
ability, action, advert, advertisement, amount, anticipation, area, article, assignment, association, background, behaviour, boundary, case, change, channel, children, chimpanzee, choice, claim, code, combination, composition, concept, context, corpus, definition, demand, description, development, dialect, difference, difficulty, digression, discussion, disorder, distortion, drink, error, essay, evidence, example, excitement, experiment, explanation, fact, factor, feature, field, finding, form, format, framework, function, gesture, graphology, hunk, idea, image, imagery, informant, instance, interview, investigation, issue, item, knowledge, level, line, linguist, link, manipulation, method, mode, multiplicity, need, notion, paper, paradox, part, patient, people, period, phase, phenomenon, phrase, piece, point, process, question, receiver, research, result, robot, rule, scale, scene, search, sense, sentence, sequence, set, sign, skill, sort, sound, speech, stage, story, structure, study, style, subject, success, symbol, task, technique, theory, time, two, type, unit, utterance, variable, variation, view, waiting, way, word, work

Appendix 5

Alphabetically ordered list of verb lemmas per discipline (types/tokens).

Biological Sciences (49/256)
affect, allow, be, bring, bypass, can, cause, change, contain, could, create, demonstrate, deter, dilute, disadvantage, do, drive, end, exist, explain, generate, have, highlight, include, increase, indicate, induce, involve, lead to, may, mean, occur, play, pose, prevent, produce, provide, refer, reflect, result, secrete, show, start, stop, suggest, up, verge, will, would

Engineering (60/298)
add, allow, assign, assist, assume, be, can, cause, combine, concentrate, could, cover, create, decrease, demonstrate, depend, display, do, eliminate, enforce, equate, exclude, give, happen, have, help, highlight, include, increase, indicate, involve, look, lower, make, may, mean, might, mix, need, occur, offer, open, place, produce, prove, provide, render, result, save, seem, send, set, show, simplify, strip, suggest, use, will, work, would

Law (40/160)
add, address, affirm, allow, apply, attribute, be, bring, build, can, cause, contend, could, create, do, extend, fail, fall, give, go, have, hold, increase, indicate, involve, lead to, may, mean, must, preclude, present, reflect, remove, revise, rouse, seem, suggest, undermine, will, would

Linguistics (29/152)
affect, be, can, could, demonstrate, do, enable, forego, have, help, highlight, illustrate, include, increase, indicate, lack, lead to, limit, link, may, mean, need, reflect, run, seem, show, suggest, tend, would