example of an introduction

Upload: marisa-hoffmann

Post on 07-Apr-2018


  • 8/6/2019 Example of an Introduction

    1. Introduction

    1.1 Introduction

    Although learner corpora are only a recent development within the area of Second Language Acquisition (SLA henceforward) (Granger, Hung and Petch-Tyson, 2002: 5), a fair number of different learner corpora are already available (Nesselhauf, 2004: 129).

    Learner corpora came about as a result of interest expressed by those within the field of second language teaching as well as by researchers interested in studying SLA. Through access to learner corpora, it became possible to carry out systematic research into several aspects of language learning, such as establishing the types of errors frequently committed by language learners (Barlow, 1996; similarly Conrad, 2004, as cited in Campoy-Cubillo et al., 2010). This type of information is especially helpful for foreign language teachers, for example, who are thereby made aware of the difficulties that their students may be having with the foreign language.

    Many of the learner corpora currently available for research purposes are arguably still in the initial stages of analysis, given that some are yet to be annotated so that the data can be described in more linguistic detail. An example is the use of part-of-speech (POS henceforward) annotation, considered to enrich a learner corpus (McEnery and Wilson, 2001: 32, as cited in van Rooy and Schäfer, 2002: 325) because it provides additional information to the researcher that is otherwise not explicit on the surface (Leech, 1997). In addition to POS annotation and, more importantly in terms of the aims of this investigation, many learner corpora remain to be annotated for learner errors. Where learner corpora have been error annotated, various error annotation schemes have been applied (see 2.4), and each of these schemes focuses primarily on the grammatical and lexical errors found in learner language.

    What has been loosely termed unnatural language is yet to be investigated in depth within learner corpora. However, one study of particular relevance to this area has been conducted using a learner corpus (NICT JLE) in which three differing degrees of error gravity were established to cater for both varying degrees of erroneous language and cases of unnatural language (Izumi, Uchimoto and Isahara, 2005). The degree of error gravity was decided by asking a native English speaker to assign the units of language to one of two categories, each corresponding to a higher or lower degree of error gravity. An additional third category was also offered to the native speaker, and was classed as unnaturalness, thereby including elements of language that sounded unnatural even though they were grammatically correct.

    Similarly to Izumi, Uchimoto and Isahara's (2005) study, although in the area of written as opposed to spoken language, this dissertation considers the issue of unnatural language use. Instances of erroneous language are first uncovered, however, in order to highlight instances of unnatural language based on a distinction between the two so-called error types. To date, two aspects are pending with regard to the study of learner language. Firstly, categories to describe cases of unnatural language are yet to be identified and, furthermore, an annotation scheme to recover such cases remains to be created for learner corpora.

    This chapter sets out the objectives of this dissertation (1.2), describes the sample used in this investigation (1.3) and, most importantly, explains the need for a new taxonomy to classify cases of unnatural language (i.e. language that would in all likelihood not be used by a native speaker) in a learner corpus (1.4).

    1.2 Objectives

    The objectives of this dissertation can be divided into general and specific as follows:

    The general objectives are to:

    (i) Carry out an analysis of a set of academic texts written by Spanish university students of English as a second language, from the NOCE (NOn-native Corpus of English) learner corpus.

    (ii) Investigate the ideas put forward to date regarding what constitutes an error according to the literature, with reference to the concepts of grammaticality, acceptability and naturalness.

    (iii) Distinguish between cases of erroneous language and unnaturalness in units of language taken from a learner corpus.

    The specific objectives are to:

    (i) Use the NOCE corpus to establish cases where a student has undoubtedly committed an error, whether comprehension is impeded or not.

    (ii) Apply the concepts of grammaticality, acceptability and unnaturalness as discussed in the literature to the examples extracted from the selection of texts used from the NOCE corpus.

    (iii) Create a taxonomy that complements the existing error annotation scheme used for the NOCE corpus, allowing for patterns of unnatural language in English to also be brought to the forefront.

    (iv) Identify areas where further research needs to be done in the field of unnatural language.

    With regard to the typographical conventions used in this dissertation, italics are used for emphasis or where reference is made to learner language corpora (e.g. NOCE). Bold type highlights a particular element within an example (e.g. I washed my teeth). Finally, the APA guidelines were followed to ensure consistency in the general format of this dissertation.

    1.3 Materials and methods

    1.3.1 The corpus

    This dissertation is based on the analysis of a section of the NOCE learner corpus, which was compiled over a period of six academic years (2003-2009). The corpus comprises written texts by students of English as a second language at the University of Jaén and the University of Granada, in Spain. The corpus was collected as part of the research activities involved in the research project PO7-HUM-03028 (Junta de Andalucía) and to date comprises a total of 1051 texts and a word count of over 300,000 words.

    Written texts were collected at three different stages of the academic year (October, February and June) and this procedure was repeated with each new intake of students at both universities. At each stage of the academic year, students were required to write a composition about either one of three argumentative topics or a subject of their choice, classified as free writing. The range of topics offered to students was selected based on what was thought to be of interest to the students at that particular time. The topics at each point of data collection are summarised as follows:

    Sample 1 (collected in October)

    1) The importance of foreign languages nowadays.

    2) Getting economically independent: Pros and cons.

    3) The internet in society: Progress or regress?

    10) Free writing.

    Sample 2 (collected in February/March)

    4) A destination for a one week summer holiday: Santo Domingo Seaside Resort or Bohemian and Monumental Paris. Give reasons for your choice.

    5) Justify your position in favour or against the following: It's not just that we're slaves of mobile phones, but on top of that, they may be harmful to health!

    6) Express your opinion on the topic of current food diversification: genetically modified food, organic food, convenient products (fast food, frozen food, etc.).

    10) Free writing.

    Sample 3 (collected in June)

    7) Terrorism in our society.

    8) Do you feel influenced by other cultures' stereotypes? Are they always true? How should we deal with them?

    9) Does your country offer enough job opportunities for a future career? Would you rather move abroad? If so, what are the advantages and disadvantages?

    10) Free writing.

    The samples were collected during lecture time at university and students were given approximately one hour to write a composition of 250 to 300 words on one of the proposed topics (1-9) or about a topic of their choice (10). As shown above, each of the argumentative topics was assigned a number between one and nine. Any instances of free writing were assigned the number ten, given that each sample contained this option and coding it three times seemed less logical than simply assigning one code.

    Following collection, the written texts were transcribed so that the data could be stored in electronic form. Furthermore, each of the texts has a code comprising five elements to distinguish the following:

    (i) Which of the two universities the student is at:
        University of Jaén (JA)
        University of Granada (GR)

    (ii) Which of the three samples the text corresponds to:
        October - A
        March - B
        June - C

    (iii) Which of the academic years the text corresponds to:
        Year 1 - 2003-2004
        Year 2 - 2004-2005
        Year 3 - 2007-2008
        Year 4 - 2008-2009

    (iv) A three-digit code to represent the participant (e.g. 001).

    (v) Which of the topics the student chose to write about (1, 2, 3, etc.).

    To give an example of a text used for this dissertation, a first-year undergraduate studying at the University of Granada in the academic year 2003-2004 has the following filename assigned to their composition: GR-A-1-026-10. This filename indicates the following:

    (i) The student studies at the University of Granada (GR).

    (ii) The student has written the first of the three samples (A).

    (iii) The student has written a text in the first year in which the corpus was collected (1).

    (iv) The student has been assigned a three-digit code for reasons of anonymity (026).

    (v) The student has chosen to write a composition on a topic of their choice (10).
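    The five-element code described above can be decoded mechanically. What follows is only a minimal sketch in Python, not part of the NOCE tooling: the names TextCode and parse_text_code are hypothetical, and the validation rules (exactly five hyphen-separated elements, topics numbered 1 to 10) are assumptions drawn from the description in this section.

```python
from dataclasses import dataclass

# Lookup tables transcribed from the coding scheme described in the text.
UNIVERSITIES = {"JA": "University of Jaén", "GR": "University of Granada"}
SAMPLES = {"A": "October", "B": "March", "C": "June"}
ACADEMIC_YEARS = {1: "2003-2004", 2: "2004-2005", 3: "2007-2008", 4: "2008-2009"}


@dataclass(frozen=True)
class TextCode:
    """Decoded form of a NOCE-style text code (hypothetical helper type)."""
    university: str
    sample: str
    academic_year: str
    participant: str  # kept as a string so leading zeros survive
    topic: int        # 1-9 are argumentative topics, 10 is free writing


def parse_text_code(code: str) -> TextCode:
    """Split a code such as 'GR-A-1-026-10' into its five elements."""
    parts = code.strip().split("-")
    if len(parts) != 5:
        raise ValueError(f"expected five elements, got {len(parts)}: {code!r}")
    university, sample, year, participant, topic = parts
    if university not in UNIVERSITIES:
        raise ValueError(f"unknown university code: {university!r}")
    if sample not in SAMPLES:
        raise ValueError(f"unknown sample code: {sample!r}")
    if not year.isdigit() or int(year) not in ACADEMIC_YEARS:
        raise ValueError(f"unknown academic year code: {year!r}")
    if len(participant) != 3 or not participant.isdigit():
        raise ValueError(f"participant code must be three digits: {participant!r}")
    if not topic.isdigit() or not 1 <= int(topic) <= 10:
        raise ValueError(f"topic must be between 1 and 10: {topic!r}")
    return TextCode(
        university=UNIVERSITIES[university],
        sample=SAMPLES[sample],
        academic_year=ACADEMIC_YEARS[int(year)],
        participant=participant,
        topic=int(topic),
    )
```

    Under these assumptions, parse_text_code("GR-A-1-026-10") describes a text from the University of Granada, October sample, academic year 2003-2004, participant 026, topic 10 (free writing).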

    Within the electronic texts there are also a number of editorial codes that together comprise the EYES (Explicitly Encoded Surface Modifications) annotation scheme, based on preset guidelines (see Sperberg-McQueen and Burnard, 2002), and have been included to indicate instances of the following nature:

    (i) word to evidence where students have crossed out a word. (JA-B-3-266-10).

    (ii) word to evidence where students have inserted a word late, indicated by writing in above the line. (JA-A-3-290-3).

    (iii) word to evidence instances where students have used correction tape. (JA-B-3-357-10).

    (iv) word to evidence instances where students have chosen to relocate the position of a word. (JA-B-3-403-10).

    (v) word to evidence the new location of a relocated word. (JA-B-3-403-10).

    To date, the NOCE corpus has been tagged with three different taggers (Brill tagger, Stanford tagger and TnT tagger). In addition, the first year of the corpus compiled at the University of Granada in 2003-2004 has been annotated for errors using the tagger EARS (Error Annotation Retrieval System) (Díaz-Negrillo, 2009). The annotation consists of a hierarchical system that describes six different levels (clause grammar, phrase grammar, word grammar, lexis, spelling and punctuation) and as such researchers are able to retrieve errors of each of these types. A brief description of each level follows:

    (i) Clause grammar: for errors in constituents and syntactic processes at clause level.

    (ii) Phrase grammar: for errors in constituents and syntactic processes at phrase level.

    (iii) Word grammar: for errors in grammatical functions of word classes.

    (iv) Lexis: for errors in the use and derivation of lexical units.

    (v) Spelling: for errors in word lettering, capitalization and word boundary.

    (vi) Punctuation: for punctuation errors.

    The number of error tags within this annotation scheme ranges from the 6 listed above, in their most general description, to 612 when they provide the maximum amount of detail regarding an error.

    For the purposes of this investigation, the sample chosen included two of the academic years from the NOCE corpus: the first year (2003-2004) of the corpus involving students from the University of Granada and the first year in which collection for the corpus began at the University of Jaén (2007-2008). The sample selected contains 399 texts, which amounts to 116,392 words. These two years were selected to ensure that a representative sample of the corpus was used. Moreover, the first year of the corpus (Granada 2003-2004) was primarily employed because it is annotated for grammatical errors. Therefore, when the manual search for examples of unnatural language began, the process of finding such examples was assisted by the fact that annotated words and sentences could be discarded from the search. Whilst this was not the case with the other year of the corpus selected (Jaén 2007-2008), the use of similar language patterns by students made it easier to distinguish instances of unnatural language from those where an error had been committed.

    1.4 Data processing

    Given that the entire corpus had been transcribed, after the initial decision to study two of the academic years of the corpus (Granada 2003-2004 and Jaén 2007-2008), it was necessary to create six files, each containing all of the texts from a given sample (October, March and June) and a given academic year. Having merged all of the texts from one sample into a Word document, the next step was a manual search for all the examples of language that, from the point of view of a native speaker, sounded unnatural.
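    The grouping into six files follows directly from the text codes: two university-year combinations multiplied by three samples. The following is an illustrative sketch only, since the dissertation describes merging the texts manually into Word documents. The function name group_study_texts and the STUDY_YEARS constant are hypothetical, and the sketch assumes that every code follows the five-element pattern set out in 1.3.1.

```python
from collections import defaultdict

# The two university/year combinations studied, using the codes from 1.3.1:
# Granada year 1 (2003-2004) and Jaén year 3 (2007-2008).
STUDY_YEARS = {("GR", "1"), ("JA", "3")}


def group_study_texts(codes):
    """Group text codes by (university, year, sample), keeping only the
    two academic years selected for the study."""
    groups = defaultdict(list)
    for code in codes:
        university, sample, year, _participant, _topic = code.split("-")
        if (university, year) in STUDY_YEARS:
            groups[(university, year, sample)].append(code)
    return dict(groups)
```

    With a full list of codes from the two selected years, the result would contain at most six groups, one per merged file, keyed for example by ("GR", "1", "A") for the Granada October 2003-2004 sample.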

    The initial search for cases of unnatural language was carried out by a native speaker of British origin. The individual, also the author of this dissertation, is female, aged 26 and born in the North West of England, although she lived in the South of England from the age of two until beginning her university studies at the age of twenty. In addition to identification of these examples, a second source was employed in order to confirm or refute the intuitions of the main native researcher. This source is male, aged 55 and also born in the North West. Unlike the previous case though, he spent his entire childhood and young adult years in the North West.

    It is important to mention this information because of the issue of dialect, i.e. the language used in a particular area of a country may differ from that of another area in terms of the grammar and the words used by the local people. In addition, the issue of idiolect, i.e. the way in which an individual person uses the language, regardless of geographical factors, also needs to be considered.

    Both of these factors may impact on the outcome of this kind of study, i.e. at the time of identifying examples of unnaturalness. This investigation therefore recognises that, at the time of retrieving examples of unnatural language from the NOCE corpus, differences may well have occurred in the examples selected as a result of the dialectal and idiolectal background of both the main and assistant researcher. However, to help counteract this issue, and further support the intuitions of both English natives involved in this investigation, the British National Corpus (BNC henceforward) as well as Google BBC were accessed. These sources were used with the intention of confirming or refuting the frequency of certain language items as used by native speakers.

    Once the examples were retrieved from the two academic years, each containing three samples, they were classified into appropriate categories. Initially, five categories were determined: individual lexical selection; phrasing; imaging; word order; negation. However, it soon became evident that not all the examples could in fact be classed as cases of unnatural language based on the categories that had been identified. To explain, the first category, labelled as individual lexical selection, contained two subgroups that were labelled as free and bound cases. Those classified as free involved language items that represent what James (1998) refers to as cases of acceptability, i.e. when a word or sequence is used in a context in which it should not be, thereby producing a piece of erroneous language.

    This original taxonomy was therefore considered inadequate, at least as a starting point for the classification of unnatural language. However, by establishing the common elements among those examples that were in fact errors, a distinction could be made between what constitutes erroneous language and what would thereby constitute unnatural language, the latter referring to instances of words or structures that are grammatically correct but sound unnatural to the native speaker.

    Once a distinction between an error and a case of unnatural language had been identified, cases of unnatural language were grouped into one of the following five categories: frequency, register, word order, negation and miscellaneous. Using these categories, a new taxonomy was created which would now cater specifically for examples of unnatural language (see chapter 3).

    In spite of the successful creation of a taxonomy that would now cater for retrieving examples of unnatural language, there were still a number of problematic cases. These cases were problematic because the sub-categories identified to explain the origin of this new so-called error type (unnatural language) did not cater sufficiently for a limited number of examples (see appendices section). In some examples it was difficult to pinpoint an individual element of the sentence as being unnatural. Rather, there was commonly a problem with the way in which the chosen combination had been used as a sequence. It was therefore difficult to allocate such examples to one of the categories identified as unnatural language (frequency, word order, negation and register) because none of the classifications could accommodate this type of deviation from the English language. In order to deal with these problematic cases, a category labelled as miscellaneous was added. These cases were also included in the count, and these data can be found in chapter 3.