understanding corpus linguistics researchpragmatics, language variation vocabulary analysis, second...
TRANSCRIPT
Leveraging the Potential of Corpora: Corpora and corpus tools in action
Laurence Anthony Center for English Language Education in Science and Engineering (CELESE) Faculty of Science and Engineering, Waseda University, Japan [email protected] www.laurenceanthony.net/ @antlabjp
Harnessing the latest corpus-based approaches for research, The Open University of Hong Kong, Hong Kong, China, June 29-30, 2017
Overview � Understanding corpus linguistics
� useful definitions � a brief history of corpora and corpus tools � insights from corpus data
� Doing corpus linguistics � applications in morphology, syntax, semantics,
pragmatics, language variation � vocabulary analysis, second language learning,
literature studies, conversational analysis, and discourse analysis
� Traveling to the future in corpus linguistics � eye-tracking research � data-mining and network analysis
2
Understanding corpus linguistics research: useful definitions
4
Understanding corpus linguistics research: What is corpus linguistics?
� It is an empirical (experimental) approach � An analysis of actual patterns of use in target texts
� It uses a corpus of natural texts as the basis for analysis � Corpus = a representative sample of target language stored as an
electronic database (plural = "corpora")
� It relies on computer software for analysis � Results are generated using automatic and interactive techniques
� It depends on both quantitative and qualitative analytical techniques � Observations are counted and results are interpreted
Biber, Conrad, and Reppen (1998)
5
Understanding corpus linguistics research: What is corpus linguistics?
"A corpus is a collection of machine readable, authentic texts, which is sampled to be representative of a particular language or language variety." (McEnery et al., 2006: 5)
6
Understanding corpus linguistics research: What are the main limitations?
� If a word or phrase does not appear in a corpus, we cannot obtain any information about it � Corpus studies are based on what we observe
� The larger the corpus, the more reliable it will be about revealing information on language features � Bigger corpora are (usually) better
� But... however large a corpus is, it can never represent all the variation in a language (except in special cases) � A corpus provides only an approximation to reality � It can suggest trends but not “facts” about language use � We need to use statistics to determine significant, important patterns
Understanding corpus linguistics research: a brief history of corpora and corpus tools
8
Understanding corpus linguistics research: A brief history of corpora and corpus tools
� 1960s � Noam Chomsky puts grammar, computers, (and intuition) into the
spotlight � “Syntactic Structures” (1957) � “Aspects of the Theory of Syntax” (1965)
� Intuition and experience dictate much of dictionary and textbook content
� Growth in computer technologies leads to the development of… � authentic language databases (corpora)
� Brown Corpus (Henry Kucera and W. Nelson Francis, 1961)
9
Understanding corpus linguistics research: A brief history of corpora and corpus tools
Brown Corpus Sample A01 0010 The Fulton County Grand Jury said Friday an investigation A01 0020 of Atlanta's recent primary election produced "no evidence" that A01 0030 any irregularities took place. The jury further said in term-end A01 0040 presentments that the City Executive Committee, which had over-all A01 0050 charge of the election, "deserves the praise and thanks of the A01 0060 City of Atlanta" for the manner in which the election was conducted. A01 0070 The September-October term jury had been charged by Fulton A01 0080 Superior Court Judge Durwood Pye to investigate reports of possible A01 0090 "irregularities" in the hard-fought primary which was won by A01 0100 Mayor-nominate Ivan Allen Jr&. "Only a relative handful
Henry Kucera W. Nelson Francis
http://icame.uib.no/icame16/kucera.francis.jpg 10
Understanding corpus linguistics research: A brief history of corpora and corpus tools
� 1960s � Noam Chomsky puts grammar, computers, (and intuition) into the
spotlight � “Syntactic Structures” (1957) � “Aspects of the Theory of Syntax” (1965)
� Intuition and experience dictate much of dictionary and textbook content
� Growth in computer technologies leads to the development of… � authentic language databases (corpora)
� Brown Corpus (Henry Kucera and W. Nelson Francis, 1961) � (1st generation) software to analyze corpora
� A Concordance Generator (Smith, 1966) � Discon (Clark, 1966) � Drexel Concordance Program (Price, 1966) � Concordance (Dearing, 1966) � CLOC (Reed, 1978)
Computers and the Humanities (1966, Vol. 1, Issue 2, p. 39)
11
Understanding corpus linguistics research: A brief history of corpora and corpus tools
� Discon (Clark, 1966)
12
Understanding corpus linguistics research: A brief history of corpora and corpus tools
Discon (Clark, 1966)
Understanding corpus linguistics research: A brief history of corpora and corpus tools
� 1970s-1980s � creation of more corpora
� The Lancaster/Oslo-Bergen Corpus (LOB) (1978) � Collins Birmingham University International Language Database
(COBUILD) (1980) � rapid growth in grammar/vocabulary studies based on corpora
� Collins COBUILD English Language Dictionary (1987) � development of easier to use (2nd generation) software tools
� e.g. MicroConcord (Scott & Johns, 1993)
13 14
Understanding corpus linguistics research: A brief history of corpora and corpus tools
MicroConcord (Scott & Johns, 1993)
Understanding corpus linguistics research: A brief history of corpora and corpus tools
� 1990s � further growth in corpus-building
� Bank of English (1991) � British National Corpus (BNC) (1995)
� development of tagging and annotation tools � CLAWS Tagger Ver. 4 (1994)
� development of (3rd generation) corpus-tools � WordSmith (1996-2013)
15 16
Understanding corpus linguistics research: A brief history of corpora and corpus tools
Brown Corpus Sample A01 0010 The Fulton County Grand Jury said Friday an investigation A01 0020 of Atlanta's recent primary election produced "no evidence" that A01 0030 any irregularities took place. The jury further said in term-end A01 0040 presentments that the City Executive Committee, which had over-all A01 0050 charge of the election, "deserves the praise and thanks of the A01 0060 City of Atlanta" for the manner in which the election was conducted. A01 0070 The September-October term jury had been charged by Fulton A01 0080 Superior Court Judge Durwood Pye to investigate reports of possible A01 0090 "irregularities" in the hard-fought primary which was won by A01 0100 Mayor-nominate Ivan Allen Jr&. "Only a relative handful
Henry Kucera W. Nelson Francis
http://icame.uib.no/icame16/kucera.francis.jpg
17
Understanding corpus linguistics research: A brief history of corpora and corpus tools
Brown Corpus Sample A01_FO 0010_MC The_AT Fulton_NP1 County_NN1 Grand_JJ Jury_NN1 said_VVD Friday_NPD1 an_AT1 investigation_NN1 A01_FO 0020_MC of_IO Atlanta_93 's_03 recent_JJ primary_JJ election_NN1 produced_VVD "_" no_AT evidence_NN1 "_" that_CST A01_FO 0030_MC any_DD irregularities_NN2 took_VVD place_NN1 ._. The_AT jury_NN1 further_RRR said_VVD in_II term-end_NN1 A01_FO 0040_MC presentments_NN2 that_CST the_AT City_NN1 Executive_NN1 Committee_NN1 ,_, which_DDQ had_VHD over-all_RR A01_FO 0050_MC charge_NN1 of_IO the_AT election_NN1 ,_, "_" deserves_VVZ the_AT praise_NN1 and_CC thanks_NN2 of_IO the_AT A01_FO 0060_MC City_NN1 of_IO Atlanta_NP1 "_" for_IF the_AT manner_NN1 in_II which_DDQ the_AT election_NN1 was_VBDZ conducted_VVN ._. DZ conducted_VVN ._.
Henry Kucera W. Nelson Francis
http://icame.uib.no/icame16/kucera.francis.jpg 18
Understanding corpus linguistics research: A brief history of corpora and corpus tools
British National Corpus (BNC) <bncDoc xml:id="A00"> <teiHeader> <fileDesc> <titleStmt> <title> [ACET factsheets & newsletters]. Sample containing about 6688 words of miscellanea (domain: social science) </title> <respStmt> <resp> Data capture and transcription </resp> <name> Oxford University Press </name> </respStmt> </titleStmt> <editionStmt> <edition>BNC XML Edition, December2006</edition> </editionStmt> <extent> 6688 tokens; 6708 w-units; 423 s-units </extent>
19
Understanding corpus linguistics research: A brief history of corpora and corpus tools
WordSmith Tools (Scott, M. , 2012)
Understanding corpus linguistics research: A brief history of corpora and corpus tools
� 1990s � creation of corpus-based dictionaries, textbooks, and grammar books � “1995: The Year of the Dictionaries” (Allen, 1996)
� Collins COBUILD Dictionary - 2nd Edition � Longman Dictionary of Contemporary English - 3rd Edition � Oxford Advanced Learner's Dictionary – 5th Edition � Cambridge International Dictionary of English -1st Edition � (Macmillan English Dictionary for Advanced Learners - 2002)
20
Understanding corpus linguistics research: A brief history of corpora and corpus tools
� 2000s to today � Continued development of easy-to-use 3rd-generation software tools
� e.g. WordSmith Tools, AntConc � Creation of server-based (4th-generation) software tools
� e.g. COCA, Sketch Engine, CQPWeb � Introduction of Data-Driven Learning (DDL) in the classroom
� e.g. Tim Johns, Alex Boulton � Creation of learner corpora
� e.g. Ute Romer, Yukio Tono, Ruiying Yang � Research on corpus-based vocabulary learning
� e.g. Paul Nation
21
Understanding corpus linguistics research: A brief history of corpora and corpus tools
22
Corpus of Contemporary American English (COCA) http://corpus.byu.edu/coca/
Understanding corpus linguistics research: A brief history of corpora and corpus tools
23
SketchEngine https://www.sketchengine.co.uk/
Understanding corpus linguistics research: A brief history of corpora and corpus tools
24
WordSmith Tools Scott, M. (2015)
AntConc Anthony, L. (2017)
Understanding corpus linguistics research: From 1 m word corpora to 1 t word corpora
0
500,000,000
1,000,000,000
1,500,000,000
2,000,000,000
2,500,000,000
Num
ber o
f Wor
ds
25
Understanding corpus linguistics research: From principled corpora to opportunistic corpora
26
0
500,000,000
1,000,000,000
1,500,000,000
2,000,000,000
2,500,000,000
Num
ber o
f Wor
ds
Understanding corpus linguistics research: From corpora in files to corpora in 'black boxes'
27
0
500,000,000
1,000,000,000
1,500,000,000
2,000,000,000
2,500,000,000
Num
ber o
f Wor
ds Understanding corpus linguistics research:
insights from corpus data
Insights from corpus data: Writing conference abstracts
29
Insights from corpus data: Writing conference abstracts
30
min (words)
max (words)
ave (words) std tokens types
titles 4 14 9 2.44 923 432
summaries 12 54 47 5.72 4698 1346
abstracts 78 261 207 45.25 20477 3146
Total presentations: 99
Active reading with CALL
Enhancing the CALL experience through Data-Driven-Learning: From words to texts and beyond
Shortest?
Longest?
Insights from corpus data: Writing conference abstracts
31
Total presentations: 99
Insights from corpus data: Writing conference abstracts
32
Abstracts (frequency curve)
Insights from corpus data: Writing conference abstracts
33
Abstracts (frequency bands)
Insights from corpus data: Writing conference abstracts
34
Abstracts (frequency 'heat map')
Insights from corpus data: Writing conference abstracts
35
Abstracts (frequency 'heat map')
Insights from corpus data: Writing conference abstracts
36
Most frequent words rank titles summaries abstracts
1 and and the
2 learning the and
3 the of of
4 of to to
5 for in in
6 to a a
7 in will students
8 with students for
9 a this will
10 english for learning
Insights from corpus data: Writing conference abstracts
37
Highest ranked keywords rank titles summaries abstracts
1 learning students students
2 efl presentation learning
3 digital learning language
4 online will english
5 active efl presentation
6 english english learners
7 moodle learners online
8 learners online will
9 using language presenter
10 language presenter teachers Stat: LL; p< 0.05 + Bonferroni correction; ref. corpus = AmE06 (Potts and Baker, 2012)
Insights from corpus data: Writing conference abstracts
38
Most frequent n-grams (n=2) rank titles summaries abstracts
1 active learning will be of the
2 in a this presentation in the
3 learning in in the will be
4 study abroad presentation will this presentation
5 in the in a the students
6 blended learning of the students to
7 extensive reading will demonstrate on the
8 language learning the presenter can be
9 effectiveness of of a the presenter
10 an online can be language learning
Insights from corpus data: Writing conference abstracts
39
Highest ranked key n-grams (n=2) rank titles summaries abstracts
1 active learning this presentation this presentation
2 extensive reading presentation will the presenter
3 language learning the presenter language learning
4 blended learning will demonstrate the students
5 study abroad will be students to
6 learning in active learning presentation will
7 effectiveness of presenter will the classroom
8 google docs japanese university english language
9 efl learners will introduce active learning
10 learning together language learning in japan Stat: LL; p< 0.05 + Bonferroni correction; ref. corpus = AmE06 (Potts and Baker, 2012)
Insights from corpus data: Writing conference abstracts
40
Graph of abstracts -> top 20 key-n-grams (n=2)
Insights from corpus data: Writing conference abstracts
41
Graph of abstracts -> abstracts sharing top 20 key-n-grams (n=2)
Doing corpus linguistics: applications in morphology (vocabulary analysis)
Applications in morphology: What words do you need to know to read any text in English?
43
Applications in morphology: What words do you need to know to read any text in English?
44
0
2000
4000
6000
8000
10000
12000
14000
1 13 25 37 49 61 73 85 97 109
121
133
145
157
169
181
193
205
217
229
241
Freq
uenc
y
Word Ranks in Harry Potter (Book 5)
Applications in morphology: What words do you need to know to read any text in English?
45
0
20
40
60
80
100
120
140
1 13 25 37 49 61 73 85 97 109
121
133
145
157
169
181
193
205
217
229
241
Freq
uenc
y
Word Ranks in Research Paper
Applications in morphology: What words do you need to know to read any text in English?
46
0
10000
20000
30000
40000
50000
60000
70000
80000
1 13 25 37 49 61 73 85 97 109
121
133
145
157
169
181
193
205
217
229
241
Freq
uenc
y
Word Ranks in the Brown Corpus
Applications in morphology: What words do you need to know to read any text in English?
48
Word Ranks in Research Paper
Applications in morphology: What words do you need to know to read any text in English?
49
Word Ranks in Research Paper
Applications in morphology: What words do you need to know to read any text in English?
50
"It is important that learners have access to lists of high-frequency and academic words
and are able to obtain frequency information from dictionaries." (p. 219)
P. Nation (2001)
"Priority should be given to high-frequency words and to words that clearly fulfill
language use needs. (p. 303)
Doing corpus linguistics: applications in syntax (second-language learning)
Applications in syntax: Do scientists always write in the passive voice?
52
"It was shown that …"
"We show that …"
Applications in syntax: Do scientists always write in the passive voice?
53
Active Form Freq Location Passive Form Freq Location
we found 29 Dis was/were found 26 Res/Dis
we show 20 Dis is/are shown 26 Met/Res
we have 18 Dis is/are had 0 -
we thank 10 Ack is/are thanked 0 -
we observed 15 Dis was/were observed 45 Res/Dis
we used 14 Intro/Meth /Res/Dis
was/were used 78 Met
we investigated 11 Abs/Int/ Res/Dis
was/were investigated 6 Abs/Int/ Res/Dis
Dermatology Corpus (52 texts, 89246 tokens)
Doing corpus linguistics : applications in semantics (literature studies)
Applications in syntax: What does 'blood' mean in Shakespeare's plays?
� What are the themes in Macbeth? � murder, death, witches, blood
� How is 'blood' used by Shakespeare? � How often is it used? � Which characters speak about it? � What imagery does it form?
55
Applications in syntax: What does 'blood' mean in Shakespeare's plays?
56
Example quotes Play
Her blood is settled, and her joints are stiff. Romeo & Juliet
My blood for your rude brawls doth lie a-bleeding. Romeo & Juliet
To kiss these lips, I will appear in blood. Antony & Cleopatra
When the blood burns, how prodigal the soul. Hamlet
Make thick my blood. Macbeth
And on thy blade and dudgeon gouts of blood. Macbeth
There's blood upon thy face. Macbeth
I do confess the vices of my blood. Othello
Shakespeare Tragedies (5 plays, 109,999 tokens)
Doing corpus linguistics: applications in pragmatics (conversational analysis)
Applications in pragmatics: How do you say what you "want" to an operator?
� How do people make requests during a telephone call? � Who asks most of the questions? � How do you interrupt someone? � How do you finish a conversation? � …
58
Applications in pragmatics: How do you say what you "want" to an operator?
59
Examples of politeness (customer) Examples of politeness (operator)
ok I want to leave on the … ... did you want me to verify it …
… basically what I want to know is … … did you want to leave Chicago …
I don't know if I want to … … how soon do you want to …
I have a couple of things I want to ask … … do you want to go …
… it's not giving me the fare I want. … if you want to go …
Travel Operator Corpus (50 interactions, 43050 tokens)
Doing corpus linguistics: applications in language variation
(discourse analysis)
Applications in language variation: Do all scientists talk like the guys in The Big Bang Theory?
61
Do all scientists talk like the guys in The Big Bang Theory?
62
Highest Ranked Keywords in Discipline-Specific Spoken Corpora Stat: LL; p< 0.05 + Bonferroni correction; Ref. corpus = AmE06 (Potts and Baker, 2012)
Rank Big Bang Theory Physics Biology Chemistry Math
1 2 3 4 5 6 7 8 9
10
Rank Big Bang Theory Physics Biology Chemistry Math
1 you 2 i 3 oh 4 m 5 sheldon 6 re 7 okay 8 t 9 yeah
10 leonard
Rank Big Bang Theory Physics Biology Chemistry Math
1 you field 2 i is 3 oh zero 4 m magnetic 5 sheldon current 6 re charge 7 okay electric 8 t this 9 yeah direction
10 leonard here
Rank Big Bang Theory Physics Biology Chemistry Math
1 you field fs 2 i is cells 3 oh zero cell 4 m magnetic dna 5 sheldon current gene 6 re charge ft 7 okay electric fre 8 t this here 9 yeah direction protein
10 leonard here genome
Rank Big Bang Theory Physics Biology Chemistry Math
1 you field fs is 2 i is cells energy 3 oh zero cell electron 4 m magnetic dna molecule 5 sheldon current gene electrons 6 re charge ft here 7 okay electric fre orbital 8 t this here this 9 yeah direction protein molecules
10 leonard here genome orbitals
Rank Big Bang Theory Physics Biology Chemistry Math
1 you field fs is is 2 i is cells energy zero 3 oh zero cell electron equation 4 m magnetic dna molecule x 5 sheldon current gene electrons solution 6 re charge ft here y 7 okay electric fre orbital function 8 t this here this this 9 yeah direction protein molecules omega
10 leonard here genome orbitals matrix
Traveling to the future in corpus linguistics: the rise of big data with new tools and applications
Traveling to the future in corpus linguistics: A growing interest in "big data"
64
http://www.science20.com/ (Dec 17, 2014) The Guardian Newspaper (Jan 3, 2016) The Telegraph Newspaper (Oct 13, 2015)
Chartered Financial Analyst (CFA) Digest, 43:4, 2013
Traveling to the future in corpus linguistics: A growing interest in "digital" humanities
65
Growth of publications on or citing the topic Digital Humanities or Humanities computing (Scharnhorst, 2014)
2014 over 140
1968 less than 5
Traveling to the future in corpus linguistics: A growing interest in (social) data analytics
66 Tableau
https://www.tableau.com
Traveling to the future in corpus linguistics: A growing interest in (social) data analytics
67 DataWrapper
https://www.datawrapper.de
Traveling to the future in corpus linguistics: A growing interest in (social) data analytics
68 DataSift
http://datasift.com/products/stream-for-social-data/
Traveling to the future in corpus linguistics: A growing interest in (social) data analytics
69 GNIP
https://gnip.com/
Traveling to the future in corpus linguistics: A growing interest in language data creation/translation
70 2017/4/7
http://www.nikkei.com/article/DGXLASDZ07IKR_X00C17A4TJ1000/
Microsoft Corp., Automatic translation for Japanese Microsoft announced on July 7 that the service for automatically translating speech in different languages in real time was made available in Japanese. Using free smartphone (smartphone) applications, you can interact while listening to the translated voice, and watch the translated subtitles in handy smartphone. It corresponds to English, Chinese, Spanish etc. We applied artificial intelligence (AI) related technology and increased accuracy.
71
Traveling to the future in corpus linguistics: The Scientific Research Process
Choose a topic to investigate
Review the literature
Identify a gap Collect/Analyze data
Design an experiment
Discuss the results
Traveling to the future in corpus linguistics: The Scientific Research Process
The "Corpus Approach"
Review the literature
and identify a gap
Find a ready-built corpus in the target area
Conduct an empirical
investigation
Choose a target area
Design your own custom
corpus
Decide a sampling
procedure
Collect, clean, tag, and/or
annotate the data
72 73
Traveling to the future in corpus linguistics: The Scientific Research Process
� Step 1: Choose a topic to investigate � word/phrase usage, coverage, difficulty, … � article/verb/noun/adjective usage, … � tense, modality, hedging, … � text structure, genre, (critical) discourse analysis � …
� Step 2: Review the literature � Why the topic is important? � What does everybody already know?(general knowledge) � What have other researchers discovered (previous literature)
� Step 3: Identify a gap (problem) � What gaps/problems remain in the field? � What gaps/problems remain in others’ research?
74
Traveling to the future in corpus linguistics: The Scientific Research Process
� Step 4: Design an experiment � Materials (corpora)
� BNC, MICASE, MICUSP, BROWN, LOB, … � scanned/OCR textbooks, PDF research papers, …
� Materials (software tools) � AntConc, WordSmith Tools, COCA, BNC Web, … � TagAnt, Stanford POS Tagger, Stanford Parser, …
� Methods � concordancing, concordance plotting � clusters/n-gram analysis � collocation analysis � word/keyword analysis � statistical analysis/interpretation of the resu
� Step 5: Collect/analyze the data � Step 6: Discuss the results
Traveling to the future in corpus linguistics: Eye-Tracking Research
Eye-Tracking Research Detecting eye-fixations with EyeChat
76 EyeChat
Anthony, L. & Michel, M. (2017)
Eye-Tracking Research Detecting eye-fixations with EyeChat
� NS1 - fixated words (based on EyeChat) � 'Going overseas for university study is an exciting prospect
for many people. But while it may offer some advantages, it is probably better to stay home because of the difficulties a student inevitably encounters living and studying in a different culture.' To what extent do you agree or disagree with this statement? Give reasons for your answer and include any relevant examples from your knowledge or experience. Discuss with your partner about this topic.
oin rse for ver tudeop it
is ett to taynta
ficuvita
a
xte do aso any
eva mp rom wle or
cus ou bou
e.
for
77 EyeChat
Anthony, L. & Michel, M. (2017)
Eye-Tracking Research Detecting eye-fixations with EyeChat
� NNS1 - fixated words (according to EyeChat) � 'Going overseas for university study is an exciting prospect
for many people. But while it may offer some advantages, it is probably better to stay home because of the difficulties a student inevitably encounters living and studying in a different culture.' To what extent do you agree or disagree with this statement? Give reasons for your answer and include any relevant examples from your knowledge or experience. Discuss with your partner about this topic.
veroin erseop
tud is citi ospBut
an van
is obawhil may offe
fere
ficu aude
ett to om ecavinoun
gre or withag hisToou nd ncluem aso nsw
ow eri
vita
78 EyeChat
Anthony, L. & Michel, M. (2017)
Eye-Tracking Research Interactive text-enhancement with EyeChat
79
Text fixations highlighted and 'enhanced'
Chat fixations highlighted and 'enhanced'
EyeChat Anthony, L. & Michel, M. (2017)
Traveling to the future in corpus linguistics: Data-Mining and Network Analysis
Data-Mining and Network Analysis: FireAnt: Filter, Identify, Report, and Export Analysis Tool
81 FireAnt
Anthony, L. & Hardaker, C. (2017)
Data-Mining and Network Analysis: FireAnt: Filter, Identify, Report, and Export Analysis Tool
FireAnt Twitter collector in action
FireAnt Twitter authentication window
82 FireAnt
Anthony, L. & Hardaker, C. (2017)
Data-Mining and Network Analysis: FireAnt: Filter, Identify, Report, and Export Analysis Tool
FireAnt corpus analysis tools
83 FireAnt
Anthony, L. & Hardaker, C. (2017)
Data-Mining and Network Analysis: FireAnt: Filter, Identify, Report, and Export Analysis Tool
84 FireAnt
Anthony, L. & Hardaker, C. (2017)
Data-Mining and Network Analysis: FireAnt: Filter, Identify, Report, and Export Analysis Tool
FireAnt visualization output (exported to Gephi)
85 FireAnt
Anthony, L. & Hardaker, C. (2017)
Data-Mining and Network Analysis: FireAnt: Filter, Identify, Report, and Export Analysis Tool
Anthony, L. (2016). Looking from the past to the future in ESP through a corpus-based analysis of English for Specific Purposes journal titles. English Teaching and Learning Journal. 86
Analysis of research article titles in English for Specific Purposes
Data-Mining and Network Analysis: FireAnt: Filter, Identify, Report, and Export Analysis Tool
Measuring gains in an EMP course and the perspectives of language and medical educators as assessors Tracing the development of an emergent part-genre: The author summary Conceptual blending in legal writing: Linking definitions to facts Discourse structure and variation in manuscript reviews: Implications for genre categorization Evaluative language and interactive discourse in journal article highlights Developing rapport in inter-professional communication: Insights for international medical graduates Talking to learn: The hidden curriculum of a fifth-grade science class English as a lingua franca communication between domestic helpers and employers in Hong Kong: A study of pragmatic strategies Open-access writing: An investigation into the online drafting and revision of a research article in pure mathematics Metaphors in financial analysis reports: How are emotions expressed? To what extent is the Academic Vocabulary List relevant to university student writing? A response to “To what extent is the Academic Vocabulary List relevant to university student writing?” Analysing options in pedagogical business case reports: Genre, process and language Nominalization and grammatical metaphor: Elaborating the theory
87 Anthony, L. (2016). Looking from the past to the future in ESP through a corpus-based
analysis of English for Specific Purposes journal titles. English Teaching and Learning Journal.
Analysis of research article titles in English for Specific Purposes
Data-Mining and Network Analysis: FireAnt: Filter, Identify, Report, and Export Analysis Tool
88 Anthony, L. (2016). Looking from the past to the future in ESP through a corpus-based
analysis of English for Specific Purposes journal titles. English Teaching and Learning Journal.
Data-Mining and Network Analysis: Politics as seen through social media
http://www.worldatlas.com/articles/most-popular-social-media-networks-in-the-world.html 89
Data-Mining and Network Analysis: Politics as seen through social media
1st US Presidential Debate 2016
90
Data-Mining and Network Analysis: Politics as seen through social media
91
The anatomy of a tweet
Data-Mining and Network Analysis: Politics as seen through social media
� Tweet histories of Clinton and Trump � November 2015-May 2016
Trump tweets: 3205 Clinton tweets: 3211
92
Data-Mining and Network Analysis: Politics as seen through social media
93
Trump Tweet history (June 2016-June 2017) Total tweets: 3016 (no retweets)
Data-Mining and Network Analysis: Politics as seen through social media
Clinton Positive-Negative Network (1000 tweets 2016/05/18)
Data Analysis: FireAnt; User mentions > 0 (628 'friends') Rendering: Gephi 0.9.1; Force Atlas Layout (Default settings + 10000 repulsion) 94
Data-Mining and Network Analysis: Politics as seen through social media
Trump Positive-Negative Network (1000 tweets 2016/05/18)
Data Analysis: FireAnt; User mentions > 0 (360 'friends') Rendering: Gephi 0.9.1; Force Atlas Layout (Default settings + 10000 repulsion) 95
Summary and Questions: What next?
Summary and Questions � Understanding corpus linguistics
� What is corpus linguistics? � How did it advance between the 1960s and today? � What can it tell us about language and people and society?
� Doing corpus linguistics � How can we use corpus linguistics to inform us about…
� morphology, syntax, semantics, pragmatics, language variation � vocabulary analysis, second language learning, literature studies,
conversational analysis, and discourse analysis
� Traveling to the future in corpus linguistics � What new areas of corpus linguistics research are emerging? � What corpus data sources are being used? � What new corpus tools are being developed to help us?
97