The lexico-grammar of stance: an exploratory analysis of scientific texts

Stefania Degaetano & Elke Teich

Universität des Saarlandes

Saarbrücken

Linguistische Profile interdisziplinärer Register (2006-2009)

Register im Kontakt (2011-2014)

Overview

• Background & Motivation

• Corpus

• Methodology

• Analysis & Results

• Discussion & Future Work

23.02.2011 2

Background & Motivation

Growing interest in meaning-oriented analysis of texts

• Descriptive linguistics/corpus linguistics

(Halliday 1985, Biber et al. 1999, Martin & White 2005, Reis 1999, Hunston & Thompson 2003)

• Computational linguistics

(Pang & Lee 2008, Liu 2010, Taboada et al.

forthcoming)


Background & Motivation

Meaning potential is associated with functions (metafunctions)

• ideational

– expression of experience, including processes within and beyond the self and phenomena of the external world and of consciousness

• interpersonal

– personal participation; it expresses the speaker’s role in the speech situation, his personal commitment and his interaction with others

• textual

– concerned with the creation of text; it expresses the structure of information, and the relation of each part of the discourse to the whole and to the setting

(Halliday 1973: 351)


Background & Motivation

Understanding of interpersonal meaning is still fragmentary because

it is realized in a variety of forms

• phrasal/clausal, e.g., it is important that, it is obligatory to

• lexical, e.g., modal verbs, modal adverbs

• many ambiguous lexemes (connotations!)

it is extremely context-dependent (register)

• phrasal/clausal, e.g., You should write an outline. (British National Corpus) vs. It is obligatory to write an outline.

• many ambiguous lexemes (connotations!)


Background & Motivation

• Contribute to a better understanding of how interpersonal meaning is expressed

• Registers of scientific writing: commonalities/differences across different scientific disciplines in expressing interpersonal meaning

“The notion of register is typically described as

functional variation” (Quirk et al. 1985:15),

i.e., variation according to type of situational context.

Corpus

Darmstadt Scientific Text Corpus (DaSciTex)

• full English journal articles (early 2000s)

• approx. 17 million words

• tokenized, pos-tagged, lemmatized

• currently being extended diachronically (1960s/70s)

(Teich & Holtz 2009, Teich & Fankhauser 2010)

Methodology

Stance

• one kind of interpersonal meaning

• refers to how writers convey personal feelings and assessments in addition to propositional content

• three kinds of meaning associated with stance

− epistemic (e.g. probably, it is possible to)

− attitudinal (e.g. surprisingly, it is important to)

− style (e.g. honestly, briefly)

(cf. Conrad & Biber 2003)

(this kind of interpersonal meaning is also known under other labels: ‘evaluation’ (Hunston & Thompson 2003), ‘appraisal’ (Martin 2003), ‘hedging’ (Hyland 1996))

Methodology

Stance realized by lexico-grammatical patterns

“[…] if a combination of words occurs relatively frequently, if it is dependent on a particular word choice, and if there is a clear meaning associated with it […]”

(Hunston & Francis 2000: 37)

Methodology

Examples

it is ADJ to-INF

It is, however, possible to call this result into question. (C1-Linguistics)

it is ADJ that

It is clear that in some cases nesting is correlated with […]. (C2-Biology)

this v-link ADJ for/to/if

This is difficult to do for the algorithm. (A-ComputerScience)

make it ADJ

[…] two facts make it possible to classify the genes. (C2-Biology)

dt most ADJ n

[…] since they have the most important optimization potential […]. (B3-CAD)

evaluative noun of

Its main drawback lies in the difficulty of obtaining a large set […] (B1-CompLing)

(cf. Degaetano 2010)

Methodology

Extraction of pattern instances

Corpus Query Processor (Evert 2005)

• searches by means of regular expressions

• one very common pattern is the it is ADJ to-INF pattern, e.g. it is easy to

"it|It" [pos="VB.*"][]{0,3}[pos="J.*"] "to";

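The logic of the query above can be approximated outside CQP. The following is a minimal Python sketch, not the authors' extraction code: it assumes the corpus is available as a list of (word, tag) pairs with Penn-Treebank-style tags (so that "is" is VBZ and adjectives are JJ), and the function name `find_it_is_adj_to` is hypothetical. Unlike CQP, it keeps only the shortest match per starting position.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (word form, part-of-speech tag); tagset is an assumption


def find_it_is_adj_to(tokens: List[Token], max_gap: int = 3) -> List[List[Token]]:
    """Find spans of the form: it|It, VB.* token, 0..max_gap tokens, J.* token, 'to'."""
    hits = []
    for i, (word, _) in enumerate(tokens):
        if word not in ("it", "It"):
            continue
        # The token directly after "it" must carry a verb tag (VB.*).
        if i + 1 >= len(tokens) or not re.fullmatch(r"VB.*", tokens[i + 1][1]):
            continue
        # Allow 0..max_gap arbitrary tokens before the adjective (J.*).
        for gap in range(max_gap + 1):
            adj = i + 2 + gap
            if adj + 1 >= len(tokens):
                break
            if re.fullmatch(r"J.*", tokens[adj][1]) and tokens[adj + 1][0] == "to":
                hits.append(tokens[i : adj + 2])
                break  # simplification: keep the shortest match only
    return hits


# Hand-tagged example sentence from the slides (tags are illustrative).
sentence = [
    ("It", "PRP"), ("is", "VBZ"), (",", ","), ("however", "RB"), (",", ","),
    ("possible", "JJ"), ("to", "TO"), ("call", "VB"), ("this", "DT"),
    ("result", "NN"), ("into", "IN"), ("question", "NN"), (".", "."),
]
matches = find_it_is_adj_to(sentence)
print([" ".join(w for w, _ in m) for m in matches])
```

The gap of up to three tokens is what lets the pattern survive interruptions such as ", however," between the verb and the adjective.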
Methodology

Examples

[Figure not preserved in the transcript]

Methodology

Stance & meaning groups

EPISTEMIC stance

POSSIBILITY

• possible, feasible

• impossible, unfeasible

ATTITUDINAL stance

IMPORTANCE

• important, necessary, relevant, vital, essential, significant

• trivial, unimportant, unnecessary

COMPLEXITY

• difficult, hard

• easy, simple

other

• interesting, intriguing

• worthwhile, worth

• natural, common, customary

• reasonable, plausible

• useful, instructive, advantageous, helpful

• sufficient, enough

• desirable

(classified according to WordNet)
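The grouping above can be held in a simple lookup table. This is only a sketch: the slides state that the adjectives were classified according to WordNet, whereas the dictionary below merely hard-codes the adjectives listed on the slide; the names `MEANING_GROUPS` and `classify` are hypothetical.

```python
from typing import Optional, Tuple

# Adjectives exactly as listed on the slide; the structure itself is an assumption.
MEANING_GROUPS = {
    "POSSIBILITY": ("epistemic", ["possible", "feasible", "impossible", "unfeasible"]),
    "IMPORTANCE": ("attitudinal", [
        "important", "necessary", "relevant", "vital", "essential", "significant",
        "trivial", "unimportant", "unnecessary",
    ]),
    "COMPLEXITY": ("attitudinal", ["difficult", "hard", "easy", "simple"]),
    "OTHER": ("attitudinal", [
        "interesting", "intriguing", "worthwhile", "worth", "natural", "common",
        "customary", "reasonable", "plausible", "useful", "instructive",
        "advantageous", "helpful", "sufficient", "enough", "desirable",
    ]),
}


def classify(adjective: str) -> Optional[Tuple[str, str]]:
    """Return (stance type, meaning group) for a listed adjective, else None."""
    adjective = adjective.lower()
    for group, (stance, adjectives) in MEANING_GROUPS.items():
        if adjective in adjectives:
            return stance, group
    return None


print(classify("feasible"))
print(classify("Hard"))
```

Adjectives that are not on the slide return `None` rather than being guessed into a group.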

Analysis

Stance as expressed by the it is ADJ to-INF pattern

• differences / commonalities across different registers of DaSciTex in terms of stance

• Do the ‘mixed disciplines’ show differences in comparison to computer science and their ‘pure disciplines’?

Analysis 1 – Results: Epistemic vs. attitudinal stance

subcorpus   epistemic F   epistemic %   attitudinal F   attitudinal %
A           186           32.75         382             67.25
B1          72            29.51         172             70.49
B2          144           33.64         284             66.36
B3          133           28.60         332             71.40
B4          164           38.86         258             61.14
C1          129           32.74         265             67.26
C2          75            35.38         137             64.62
C3          153           36.17         270             63.83
C4          205           35.59         371             64.41

A: computer science

Mixed disciplines: B1: computational linguistics, B2: bioinformatics, B3: computer aided design, B4: microelectronics

Pure disciplines: C1: linguistics, C2: biology, C3: mechanical engineering, C4: electrical engineering
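The percentages in the table follow directly from the raw frequencies: each share is the frequency divided by the sum of epistemic and attitudinal instances in that subcorpus. A minimal sketch that recomputes them (the names `COUNTS` and `stance_shares` are hypothetical):

```python
from typing import Tuple

# (epistemic F, attitudinal F) per subcorpus, copied from the table.
COUNTS = {
    "A": (186, 382), "B1": (72, 172), "B2": (144, 284), "B3": (133, 332),
    "B4": (164, 258), "C1": (129, 265), "C2": (75, 137), "C3": (153, 270),
    "C4": (205, 371),
}


def stance_shares(epistemic: int, attitudinal: int) -> Tuple[float, float]:
    """Return the epistemic and attitudinal shares in percent, rounded to 2 places."""
    total = epistemic + attitudinal
    return round(100 * epistemic / total, 2), round(100 * attitudinal / total, 2)


for subcorpus, (epi, att) in COUNTS.items():
    epi_pct, att_pct = stance_shares(epi, att)
    print(f"{subcorpus:<3} {epi_pct:6.2f} {att_pct:6.2f}")
```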

Analysis 1 – Results: Summary

Within the it is ADJ to-INF pattern

– more attitudinal stance expressed by the IMPORTANCE and COMPLEXITY-group

– less epistemic stance expressed by the POSSIBILITY-group

Analysis 2 – Results: Meaning groups

[Figure: stacked bar chart, scale 0%–100%, showing the shares of the POSSIBILITY, IMPORTANCE, COMPLEXITY and OTHERS groups]

Analysis 2 – Results: IMPORTANCE-group

[Figure: bar chart (%, axis 0–45) for mixed disciplines and pure disciplines. Bar labels as extracted, assignment to subcorpora not recoverable: 28.28, 28.27, 40, 35.55, 27.66, 28.3, 36.41, 25.87, 12.5]

Analysis 2 – Results: COMPLEXITY-group

[Figure: bar chart (%, axis 0–40) for mixed disciplines and pure disciplines. Bar labels as extracted, assignment to subcorpora not recoverable: 35.39, 31.15, 24.07, 22.80, 18.72, 22.59, 25.00, 18.20, 25.17]

Analysis 2 – Results: Significant differences in DaSciTex

corpora    p-value      signif.   direction
B1 – A     3.099e-07    s         - + - -
B2 – A     5.979e-10    s         + - -
B3 – A     < 2.2e-16    s         + - -
B4 – A     < 2.2e-16    s         + - -
B1 – C1    0.0385       s         - + -
B2 – C2    0.8106       ns
B3 – C3    0.07039      ns
B4 – C4    5.099e-05    s         + - -

direction: +/- signs per meaning group (POSSIBILITY, IMPORTANCE, COMPLEXITY, OTHERS); where fewer than four signs are listed, their assignment to groups is not recoverable from the transcript

A: computer science

Mixed disciplines: B1: computational linguistics, B2: bioinformatics, B3: computer aided design, B4: microelectronics

Pure disciplines: C1: linguistics, C2: biology, C3: mechanical engineering, C4: electrical engineering
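The slides do not name the significance test behind these p-values, and the per-group frequencies are not in the transcript. As an illustration of how two subcorpora can be compared at all, the sketch below runs a Pearson chi-square test (an assumption, without continuity correction) on the 2×2 epistemic/attitudinal frequencies of B4 and A from Analysis 1. It does not reproduce the values in the table above; the function name `chi_square_2x2` is hypothetical.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With one degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: B4 and A; columns: epistemic F, attitudinal F (from the Analysis 1 table).
chi2, p = chi_square_2x2(164, 258, 186, 382)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

A comparison across all four meaning groups would need a 2×4 table and three degrees of freedom, for which a statistics library is the more practical choice.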

Analysis 2 – Results: Summary

Mixed disciplines

1. make more use of the IMPORTANCE-group than computer science (A)

2. bioinformatics (B2) and computer aided design (B3) similar to their pure disciplines

3. very pronounced distinctness of microelectronics (B4) (differs in the same way from A and C4)

4. less pronounced difference of computational linguistics (B1)


Analysis 3 Thing evaluated

Examples

1. It is important to evaluate the winglets [...] (C3)

2. Thus, it is important to model the functionality (C4)

3. It is important to note that the shape [...] (C3)

4. At this point, however, it is important to highlight the following [...] (C4)


Analysis 3 – Results: important + cognitive verb

lexical verb   F     %
note           152   58.91
understand     18    6.98
consider       17    6.59
observe        14    5.43
realize        10    3.88
notice         9     3.49
recognize      6     2.33
remark         5     1.94
remember       5     1.94

→ important + note: different functional status

→ formulaic expression with stylistic meaning?!
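A frequency table of this kind can be produced by counting the lexical verb that follows "to" in each extracted instance. The sketch below assumes the verbs have already been extracted into a list; the sample data and the name `verb_table` are hypothetical. Percentages are computed against all verbs supplied, which matches the table above only if every verb in the sample is passed in, including those not shown on the slide.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def verb_table(verbs: Iterable[str]) -> List[Tuple[str, int, float]]:
    """Return (verb, frequency, percentage) rows, most frequent verb first."""
    counts = Counter(verbs)
    total = sum(counts.values())
    return [
        (verb, freq, round(100 * freq / total, 2))
        for verb, freq in counts.most_common()
    ]


# Invented toy sample of verbs following "it is important to".
sample = ["note", "note", "understand", "note"]
for verb, freq, pct in verb_table(sample):
    print(f"{verb:<12} {freq:>4} {pct:6.2f}")
```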

Analysis 3 – Results: ADJ + note within the it is ADJ to-INF pattern

ADJ           Frequency   %
important     152         51.70
interesting   109         37.07
worthwhile    10          3.40
worth         6           2.04
worthy        3           1.02
instructive   2           0.68
easy          2           0.68
significant   2           0.68
others        8           2.72

others: pertinent, surprising, critical, essential, useful, possible, crucial, sufficient (each occurring once)

Analysis 3 – Results: Occurrences of note in DaSciTex

Type of occurrence                   Frequency   %
it is ADJ to note                    294         64.33
note not within the pattern          163         35.67
note (base form) total in DaSciTex   457

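Analysis 3 – Results: it is ADJ to note within DaSciTex

[Figure: bar chart (%, axis 0–25) for mixed disciplines and pure disciplines. Bar labels as extracted, extraction duplicates removed, assignment to subcorpora not recoverable: 20.75, 13.95, 11.9, 10.2, 10.2, 9.52, 7.82, 7.82, 7.82]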

Analysis 3 – Results: Summary

it is ADJ to note

• most often used with important and interesting

• base form of note in DaSciTex more often used within the it is ADJ to-INF pattern than outside it

→ seems to be used as a formulaic expression

• relatively frequently used by microelectronics (B4)

Discussion & Future Work

• investigate additional verbs occurring within the it is ADJ to-INF pattern (process types: mental, material, verbal, relational)

• investigate additional patterns to find more evidence of the tendencies of cross-disciplinary variation

• explore the constraints between evaluative category and thing evaluated for

– potentially discriminatory effects between scientific disciplines

– automatic attribution of the value of the evaluative category to the thing evaluated

• explore automated approaches for annotation of interpersonal expressions and probabilistic methods for corpus comparison

Discussion & Future Work

• knowledge of how evaluative patterns are constructed brings a better understanding of interpersonal meaning, e.g. stance

• the pattern approach allows a fairly easy identification of particular stance expressions in large corpora

• this knowledge may be used to improve existing systems in sentiment analysis

– i.e., the classification approach and its extraction pattern learning algorithms (Wiebe & Riloff 2003)

– the evaluative category and the thing evaluated could be automatically identified

• interpersonal meaning is context-dependent (register)

Thank you for your attention!

Any questions?

References

Douglas Biber, Stig Johansson, and Geoffrey Leech. Longman Grammar of Spoken and Written English. Longman, Harlow, 1999.

Susan Conrad and Douglas Biber. Adverbial marking of stance in speech and writing. In Susan Hunston and Geoff Thompson, editors, Evaluation in Text: Authorial Stance and the Construction of Discourse, pages 56–73. Oxford University Press Inc., New York, 2003.

Stefania Degaetano. Evaluation in Academic Research Articles across Scientific Disciplines. Master’s thesis, Technische Universität Darmstadt, 2010.

Stefan Evert. The CQP Query Language Tutorial. IMS Stuttgart, 2005. CWB version 2.2.b90.

M.A.K. Halliday. Explorations in the Functions of Language. Arnold, London, 1973.

M.A.K. Halliday. An Introduction to Functional Grammar. Arnold, London, 1985.

Ken Hyland. Academic Discourse. Continuum, London, 2009.

Susan Hunston and Gill Francis. Pattern Grammar: A Corpus-driven Approach to the Lexical Grammar of English. Studies in Corpus Linguistics. John Benjamins Publishing, Amsterdam/Philadelphia, 2000.

Susan Hunston and Geoff Thompson, editors. Evaluation in Text: Authorial Stance and the Construction of Discourse. Oxford University Press, Oxford, 2003.

Bing Liu. Sentiment Analysis and Subjectivity. In Nitin Indurkhya and Fred J. Damerau, editors, Handbook of Natural Language Processing. CRC Press, Goshen, Connecticut, USA, 2nd edition, 2010.

Jim R. Martin. Beyond Exchange: APPRAISAL Systems in English. In Susan Hunston and Geoff Thompson, editors, Evaluation in Text: Authorial Stance and the Construction of Discourse, pages 142–175. Oxford University Press Inc., New York, 2003.

Jim R. Martin and Peter R.R. White. The Language of Evaluation: Appraisal in English. Palgrave Macmillan, London & New York, 2005.

Bo Pang and Lillian Lee. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1–2):1–135, 2008.

Randolph Quirk, Sidney Greenbaum, Jan Svartvik, and Geoffrey Leech. A Comprehensive Grammar of the English Language. Longman, 1985.

Marga Reis. On sentence types in German: An enquiry into the relationship between grammar and pragmatics. Interdisciplinary Journal for Germanic Linguistics and Semiotics Analysis, 4:195–236, 1999.

Maite Taboada, Julian Brooke, Milan Tofiloski, Kimberly Voll, and Manfred Stede. Lexicon-based methods for sentiment analysis. Computational Linguistics, forthcoming.

Elke Teich and Mônica Holtz. Scientific registers in contact: An exploration of the lexico-grammatical properties of interdisciplinary discourses. International Journal of Corpus Linguistics, 14(4):524–548, 2009.

Elke Teich and Peter Fankhauser. Exploring a corpus of scientific texts using data mining. In S. Gries, S. Wulff, and M. Davies, editors, Corpus-linguistic applications: Current studies, new directions, pages 233–247. Rodopi, Amsterdam and New York, 2010.

Janyce Wiebe and Ellen Riloff. Learning extraction patterns for subjective expressions. In Conference on Empirical Methods in Natural Language Processing (EMNLP), Sapporo, Japan, July 2003.