Stylistics – case study: see last slide for websites used to get numerical information from texts

Upload: lisandro-simonds

Post on 14-Dec-2015


Stylistics – case study

see last slide for websites used to get numerical information from texts


Stylistic analysis

• Literary vs linguistic stylistics
  – Literary criticism focuses on the effect on the reader, intended or otherwise, so it is largely intuitive and subjective
  – Linguistic stylistics looks for characterisations of style (including literary style) in terms of linguistic phenomena at the various levels of linguistic description


Stylistic analysis

• Inventory of linguistic devices and their effect
  – usually in a contrastive way:
    – in contrast with other texts of a similar genre
    – in contrast with other genres
• Linguistic devices described in terms of the usual linguistic levels of description: phonology, morphology, lexis, grammar, etc.


Example

• Newspaper reporting of a similar story
• Sun vs Independent
  – readership by social class
  – Sun: widely read (c. 5m), mostly by lower class and lower middle class
  – Independent: circulation 0.25m, educated middle class
• How would you expect this different readership to be reflected in the styles?


Sun vs Independent

• Targeted readership largely dictates subject matter and the angle of coverage
• From a purely linguistic point of view we might expect differences in …
  – vocabulary
  – complexity of sentence structure
• Other differences might include literary
• But (compared to other texts) features of the genre (newspaper story) may be shared

Independent: http://www.independent.co.uk/news/world/asia/hawker-family-make-new-plea-799964.html

Sun: http://www.thesun.co.uk/sol/homepage/news/justice/article952630.ece


Sun vs Independent

Sun: Family bid for Lindsay's killer
Independent: Hawker family make new plea

Sun: THE family of an English teacher murdered in Japan today appealed for her killer to be found – a year after her death.
Independent: The family of a young British teacher murdered in Japan were yesterday flying to Tokyo to launch a fresh appeal on the first anniversary of her death.

Sun: Lindsay Ann Hawker, 22, was found dead in a sand-filled bath on a balcony of a flat belonging to one of her students.
Independent: Lindsay Ann Hawker, 22, was found dead in a bath filled with sand on the balcony of a flat in Ichikawa, east of Tokyo, on March 27 last year.

Sun: Parents Bill and Julia and their daughters Lisa, 26, and Louise, 23 have flown to capital Tokyo to “get justice for Lindsay”.
Independent: Miss Hawker's parents, Bill and Julia, and her two sisters, Lisa and Louise, will leave London's Heathrow Airport this morning to travel to the Japanese capital to renew their appeal to find her killer.

Sun: A poster campaign aims to help catch suspect Tatsuya Ichihashi, 29, who fled from cops.
Independent: Detectives are still hunting 29-year-old suspect Tatsuya Ichihashi, who lived at the flat and fled when approached by officers for questioning.

Sun: More than 20,000 people joined a tribute page on website Facebook
Independent: A webpage set up on social networking site Facebook, called "Don't forget Lindsay Hawker, Please remember this Face", now has more than 20,000 members.


Some differences

• Differences of detail
  – [Some are due to slightly different publication time, before or after the press conference]
  – What elements are of interest?
• Differences of vocabulary
  – cops vs officers, dad vs father, year after vs anniversary
• Differences of explication
  – capital of Japan, Facebook
• Differences of syntax
  – surprisingly few
  – but a possible stylistic trademark of the redtop is the internal structure of noun phrases …
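Vocabulary differences of this kind can be surfaced mechanically by comparing the word sets of two passages. The sketch below is only an illustration: it uses one sentence from each story as quoted in the comparison, and the helper name `vocabulary` and its tokenising pattern are assumptions, not part of the original analysis.

```python
import re

# One sentence from each paper, as quoted in the side-by-side comparison.
SUN = ("A poster campaign aims to help catch suspect Tatsuya Ichihashi, 29, "
       "who fled from cops.")
INDY = ("Detectives are still hunting 29-year-old suspect Tatsuya Ichihashi, "
        "who lived at the flat and fled when approached by officers for questioning.")


def vocabulary(text):
    """Return the set of lower-cased word forms in text (hypothetical helper).

    Hyphenated forms and contractions are kept as single words.
    """
    return set(re.findall(r"[a-z0-9]+(?:[-'][a-z0-9]+)*", text.lower()))


sun_only = vocabulary(SUN) - vocabulary(INDY)    # words only the Sun uses
indy_only = vocabulary(INDY) - vocabulary(SUN)   # words only the Independent uses
shared = vocabulary(SUN) & vocabulary(INDY)      # words common to both

print("Sun only:", sorted(sun_only))
print("Independent only:", sorted(indy_only))
print("Shared:", sorted(shared))
```

A list like this only proposes candidates; deciding that cops and officers are stylistic variants of the same referent still needs a human reader.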


Appositive noun phrases

Sun:
• a sand-filled bath
• Parents Bill and Julia
• capital Tokyo
• suspect Tatsuya Ichihashi, 29
• website Facebook

Independent:
• a bath filled with sand
• Miss Hawker's parents, Bill and Julia
• Tokyo; the Japanese capital
• 29-year-old suspect Tatsuya Ichihashi
• [the] social networking site Facebook


Numerical comparison

• Thanks to computers it is now (relatively) easy to count things
• What should we count?
  – easy to count number of paragraphs, sentences, words, letters
  – may give a measure of complexity
    • average sentence length (words/sentence)
    • average word length
    • percentage of long words
  – type:token ratio (vocabulary richness)
    • number of types = number of different words
    • number of tokens = total number of words
    • hapax legomena = number of words that occur only once
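The type and token counts can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `lexical_counts`, the tokenising pattern (lower-cased, hyphenated forms and contractions kept whole) and the sample sentence are all made up for the example, and a different tokeniser would give different figures.

```python
import re
from collections import Counter


def lexical_counts(text):
    """Count tokens, types, type-token ratio and hapax legomena in text."""
    # Assumed tokenisation: lower-case, keep "sand-filled" and "don't" whole.
    tokens = re.findall(r"[a-z0-9]+(?:[-'][a-z0-9]+)*", text.lower())
    freq = Counter(tokens)
    n_tokens = len(tokens)           # total number of words
    n_types = len(freq)              # number of different words
    n_hapax = sum(1 for c in freq.values() if c == 1)  # words seen exactly once
    ttr = n_types / n_tokens if n_tokens else 0.0
    return {"tokens": n_tokens, "types": n_types, "ttr": ttr, "hapax": n_hapax}


# Made-up sample sentence, for illustration only.
sample = "the family of the teacher appealed for the killer to be found"
counts = lexical_counts(sample)
print(counts)
```

In the sample only "the" is repeated, so every other type is also a hapax legomenon.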


Normalization and significance

• Always important to compare like with like
  – It is usual when counting things to “normalize” over the length of the text
  – If one text is longer than the other, of course you would expect higher frequencies of everything
• Issue of statistical significance
  – Small differences may not really tell you anything
  – Various measures can confirm whether a difference is statistically significant or due to random fluctuation
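Both ideas can be sketched briefly. The example below normalises a raw count to a rate per 1,000 words and applies a two-proportion z-test, which is one of several possible significance measures and is chosen here only for illustration; the slides do not name a specific test. The function names are hypothetical. The figures are the complex-word counts from the deck's numerical comparison (19 of 262 words in the Sun, 36 of 257 in the Independent). The test treats every word as an independent trial, a strong assumption for running text, so the result is indicative at best.

```python
import math


def per_thousand(count, total_words):
    """Normalise a raw count to a frequency per 1,000 words."""
    return 1000.0 * count / total_words


def two_proportion_z(count_a, total_a, count_b, total_b):
    """Two-sided z-test for a difference between two proportions.

    Assumes each word is an independent trial (a simplification).
    Returns (z, p_value).
    """
    p_a = count_a / total_a
    p_b = count_b / total_b
    pooled = (count_a + count_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


sun_rate = per_thousand(19, 262)
indy_rate = per_thousand(36, 257)
z, p_value = two_proportion_z(19, 262, 36, 257)
print(f"complex words per 1,000: Sun {sun_rate:.1f}, Indy {indy_rate:.1f}")
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

With texts this short, a single borderline word decision (for example how a hyphenated form is counted) can move the result noticeably.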


How to count

• How to recognize paragraph breaks?
• How to recognize sentence breaks?
  – Headlines don’t end in a full stop
  – Not all sentences end in a full stop
  – Not all full stops are sentence-ending (abbreviations)
• How to count words?
  – Hyphenated words, contractions, e.g. don’t
• How to measure word length/complexity?
  – length only roughly corresponds to complexity
  – number of characters vs number of syllables
  – cf. through vs idea
  – counting syllables implies either a dictionary or an algorithm
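The pitfalls listed here show up as soon as the counting is written down. The following is a deliberately naive sketch, not a recommended tokeniser: the three function names are hypothetical, the sentence splitter breaks after every full stop, and the syllable counter simply counts groups of vowel letters. The "Mr. Smith" sentence is an invented example.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_words(text):
    """Treat hyphenated words and contractions as single words (one choice of many)."""
    return re.findall(r"[A-Za-z0-9]+(?:[-'’][A-Za-z0-9]+)*", text)


def count_syllables(word):
    """Heuristic: one syllable per run of vowel letters, at least one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


# The abbreviation is wrongly treated as the end of a sentence.
print(split_sentences("Mr. Smith arrived. He left."))

# Hyphenated words and contractions each count as one word here.
print(split_words("don't fill the sand-filled bath"))

# "through" (7 letters) gets 1 syllable; "idea" (4 letters, 3 syllables) gets only 2.
print(count_syllables("through"), count_syllables("idea"))
```

The undercount for idea is the reason the slide says a dictionary or a more careful algorithm is needed; the vowel-run heuristic is cheap but systematically wrong on adjacent vowels that belong to different syllables.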


Numerical comparison

                                  Sun          Indy
sentences                         13           10
words                             262          257
letters/numbers                   1166         1213
complex words                     19 (7%)      36 (14%)
syllables                         356          378
average word length (characters)  4.45         4.72
average word length (syllables)   1.36         1.47
average sentence length (words)   20.15        25.7
short sentences                   6 (42%)      4 (40%)
long sentences                    2 (14%)      1 (10%)
types                             156          165
type-token ratio                  0.60         0.62
hapax legomena                    110          128

• texts are roughly the same length

• Hard to know if any differences are statistically significant with such a small amount of data, but …

• Indy does have more complex words …

• and higher average word length (AWL) and average sentence length (ASL) …

• and higher ratio of short:long sentences …

• and richer vocabulary
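The average word and sentence lengths follow directly from the raw counts. As a small check, the sketch below recomputes them from the sentence, word, character and syllable totals in the table; the dictionary layout and the function name `averages` are illustrative choices.

```python
# Raw counts copied from the numerical comparison table.
COUNTS = {
    "Sun":  {"sentences": 13, "words": 262, "characters": 1166, "syllables": 356},
    "Indy": {"sentences": 10, "words": 257, "characters": 1213, "syllables": 378},
}


def averages(c):
    """Derive average word length (AWL) and average sentence length (ASL)."""
    return {
        "awl_chars": round(c["characters"] / c["words"], 2),
        "awl_syllables": round(c["syllables"] / c["words"], 2),
        "asl_words": round(c["words"] / c["sentences"], 2),
    }


for paper, c in COUNTS.items():
    print(paper, averages(c))
```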


Word length

• Comparison of distribution of words by length only tells us that the two texts are very similar

• correlation ρ = 0.977

[Figure: total number of words (0 to 60) by word length (1 to 11), Indy vs Sun]
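A comparison of this kind can be sketched as follows. The slide does not say which correlation coefficient ρ denotes, so Pearson's is assumed here; the function names are hypothetical, words longer than `max_len` are lumped into the last bin (also an assumption), and the two inputs are just the opening sentences of each story, so the value will not reproduce the 0.977 reported for the full texts.

```python
import math
import re
from collections import Counter


def length_distribution(text, max_len=11):
    """Number of words of each length 1..max_len (longer words go in the last bin)."""
    words = re.findall(r"[A-Za-z]+", text)
    counts = Counter(min(len(w), max_len) for w in words)
    return [counts.get(n, 0) for n in range(1, max_len + 1)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


sun = ("THE family of an English teacher murdered in Japan today appealed "
       "for her killer to be found")
indy = ("The family of a young British teacher murdered in Japan were yesterday "
        "flying to Tokyo to launch a fresh appeal on the first anniversary of her death.")

rho = pearson(length_distribution(sun), length_distribution(indy))
print(f"correlation of word-length distributions: {rho:.3f}")
```

Word-length distributions of English texts tend to look alike whatever the style, which is consistent with the slide's point that this measure says little about the difference between the two papers.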


Syntactic information

                          Sun          Indy
questions                 0            0
passives                  8 (57%)      6 (60%)
longest sentence          33 words     43 words
shortest sentence         4 words      7 words
use of verb to be         8            8
use of auxiliary          1            3
conjunctions              3 (8%)       4 (10%)
pronouns                  7 (19%)      4 (11%)
prepositions              13 (34%)     14 (38%)
nominalizations           0            1 (2%)
Sentence beginnings:
  pronouns                6            1
  article                 1            4
  conjunction             0            0
  preposition             0            0

• Again, hard to know if differences are significant

• This kind of measure more useful to distinguish different genres


Readability

• Big interest from teachers, publishers and researchers in quantifying the appropriate reading age for a text
  – i.e. what level of education do you need to understand this text? (reader-oriented view)
  – or: for what age of readership is this text appropriate? (text-oriented view)
• Most measures are based on a combination of average word length (measured in characters or syllables) and average sentence length
  – some additionally take into account the proportion of long/short words


Readability indexes

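$$
\begin{aligned}
\text{Kincaid} &= 11.8 \times \frac{\text{syllables}}{\text{words}} + 0.39 \times \frac{\text{words}}{\text{sentences}} - 15.59 \\
\text{ARI} &= 4.71 \times \frac{\text{characters}}{\text{words}} + 0.5 \times \frac{\text{words}}{\text{sentences}} - 21.43 \\
\text{CLI} &= 5.89 \times \frac{\text{characters}}{\text{words}} - 0.3 \times \frac{\text{sentences}}{100\ \text{words}} - 15.8 \\
\text{Flesch} &= 206.835 - 84.6 \times \frac{\text{syllables}}{\text{words}} - 1.015 \times \frac{\text{words}}{\text{sentences}} \\
\text{FOG} &= 0.4 \times \left( \frac{\text{words}}{\text{sentences}} + 100 \times \frac{\text{words} \geq 3\ \text{syllables}}{\text{words}} \right) \\
\text{Lix} &= \frac{\text{words}}{\text{sentences}} + 100 \times \frac{\text{words} \geq 6\ \text{chars}}{\text{words}} \\
\text{SMOG} &= 1.0430 \times \sqrt{30 \times \frac{\text{words} \geq 3\ \text{syllables}}{\text{sentences}}} + 3.1291
\end{aligned}
$$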
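The seven indexes are simple enough to compute once the raw counts are available. The sketch below is a hedged illustration, not the code behind any of the websites used: the function names are hypothetical, the coefficients are the ones given on this slide, the Coleman-Liau sentence term is read as "sentences per 100 words", and the callers are expected to supply their own counts of complex words (three or more syllables) and long words (taken here as six or more characters, which is an assumption; some definitions of Lix use more than six).

```python
import math


def kincaid(words, sentences, syllables):
    """Kincaid grade level."""
    return 11.8 * syllables / words + 0.39 * words / sentences - 15.59


def ari(words, sentences, characters):
    """Automated Readability Index (grade level)."""
    return 4.71 * characters / words + 0.5 * words / sentences - 21.43


def coleman_liau(words, sentences, characters):
    """Coleman-Liau index; sentence term read as sentences per 100 words."""
    return 5.89 * characters / words - 0.3 * (100 * sentences / words) - 15.8


def flesch(words, sentences, syllables):
    """Flesch raw score: higher means easier to read."""
    return 206.835 - 84.6 * syllables / words - 1.015 * words / sentences


def fog(words, sentences, complex_words):
    """Gunning FOG index; complex_words = words of 3 or more syllables."""
    return 0.4 * (words / sentences + 100 * complex_words / words)


def lix(words, sentences, long_words):
    """Lix raw score; long_words counted by the caller (threshold is an assumption)."""
    return words / sentences + 100 * long_words / words


def smog(sentences, complex_words):
    """SMOG grade; complex_words = words of 3 or more syllables."""
    return 1.0430 * math.sqrt(30 * complex_words / sentences) + 3.1291


# A text made of one-word, one-syllable sentences gives the maximum Flesch score.
print(round(flesch(words=1, sentences=1, syllables=1), 2))
```

Because every tool counts sentences, words and syllables slightly differently, scores computed this way should not be expected to match a given website's figures exactly.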


Readability indexes

• Most give a (US) school grade:
  – Kincaid – best for technical material; short sentences, e.g. in dialogues, will lower the score; gives a grade level
  – ARI (Automated Readability Index)
  – Coleman-Liau – counts characters rather than syllables, so easier to implement
  – SMOG (Simple Measure of Gobbledygook) (McLaughlin 1969) – can be estimated by sampling, e.g. three 10-sentence segments; said to give the best correlation with its criterion. See http://www.harrymclaughlin.com/SMOG.htm
  – FOG (Gunning 1952) – gives a school grade. Score >12 means “too hard to read”!
• A few give a raw score:
  – Flesch-Kincaid (the Flesch reading-ease score) – widely used, simple calculation; the higher the score, the easier it is to read. Highest possible score is 121 (text made up of one-word, one-syllable sentences). Score around 100 means OK for an 11-year-old. Time magazine ~52, Harvard Law Review low 30s.
  – Lix (Björnsson) – originally developed for Swedish; raw score <24 suitable for children, >55 very hard.


Readability

                  Sun      Indy
Kincaid           9        11.1
ARI               10.4     13.3
Coleman-Liau      9        11.1
Flesch-Kincaid    69.7     62.8
Gunning FOG       11.6     13.8
Lix               38.9     46.1
SMOG              10.4     13.1

http://www.editcentral.com/gwt/com.editcentral.EC/EC.html also suggests where improvements can be made!

Also used (give slightly different figures, probably depending on how they count things):
http://www.readability.info/
http://www.online-utility.org/english/readability_test_and_improve.jsp

Conversion: add 1 to the US grade to give the British school year, e.g. 11th grade = year 12

Note: with Flesch-Kincaid, a lower score means harder to read