Visualising Linguistic Evolution in Academic Discourse
Verena Lyding, Ekaterina Lapshinova, Stefania Degaetano, Henrik Dittmann, Chris Culy
Joint Workshop of LINGVIS & UNCLH, EACL-2012
Avignon, France
V.Lyding, E.Lapshinova (EURAC, Saarland U) Visualising Linguistic Evolution 23 April 2012 1 / 32
Overview
1 Introduction
2 Data to Analyse
   Lexico-grammatical Features
   Resources & Feature Extraction
3 Structured Parallel Coordinates
   SPC Visualisation
   Customisation and Interactive Features
   Visual Analysis of Registers with SPC
4 Interpreting Visualisation Results
   Case Study I - changes in variable TENOR
   Case Study II - changes in variable FIELD
5 Conclusion and Future Work
REGICO: Registers in Contact
Ekaterina Lapshinova
Stefania Degaetano
Elke Teich
FR 4.6 Applied Linguistics, Interpreting and Translation Studies
Saarland University, Saarbrücken
LinfoVis
Verena Lyding
Henrik Dittmann
Chris Culy
Institute for Specialised Communication and Multilingualism
EURAC, Bozen-Bolzano
Introduction
Aims
- create procedures to visualise diachronic language changes in academic discourse with the help of SPC, cf. (Culy et al., 2011)
⇒ to facilitate analysis and interpretation of complex data
Motivation
- study diachronic changes with a focus on contact registers
- changes are reflected by linguistic features
- we determine and describe tendencies of features, which might become rarer, more frequent or cluster in new ways
⇒ the amount and complexity of the interrelated data call for visualisation
Data to Analyse Lexico-grammatical Features
Register Analysis
Registers are patterns of language according to use in context, cf. (Halliday&Hasan, 1989)
Linguistic variation according to contexts of use, with variables
- field
- tenor
- mode
cf. Systemic Functional Linguistics (SFL) and register theory, e.g., (Quirk, 1985), (Halliday&Hasan, 1989) and (Biber, 1995)
Particular settings of these variables are associated with certain lexico-grammatical features
⇒ co-occurrences indicate distinctive registers (e.g., the language of linguistics in academic discourse).
Recent Language Change
changes in contexts of use (variables) and language use (features)
⇒ features become rarer or more frequent, and cluster in novel ways
⇒ existing registers become obsolete, new ones evolve
cf. (Mair, 2006): changes in preferences of lexico-grammatical selection in English in the 1960s vs. the 1990s.
Our focus: new registers that evolve through contact between disciplines (e.g. the language of bioinformatics, a contact register to biology and computer science)
Case Study I - changes in variable tenor
TENOR: modality
modal verbs grouped according to (Biber, 1999): obligation, permission and volition

categories of meaning (feature)    realisation
obligation/necessity               must, should, etc.
permission/possibility/ability     can, could, may, etc.
volition/prediction                will, would, shall, etc.
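A minimal sketch of how these categories could be applied to a token stream. It assumes Biber's grouping, in which must/should express obligation and can/could/may express permission. The names `MODAL_MEANINGS`, `classify_modal` and `count_modal_meanings` are hypothetical, and the verb lists are only the abbreviated ones given on this slide.

```python
# Abbreviated verb lists ("etc." on the slide); all names here are illustrative.
MODAL_MEANINGS = {
    "obligation": {"must", "should"},
    "permission": {"can", "could", "may"},
    "volition": {"will", "would", "shall"},
}


def classify_modal(token):
    """Return the meaning category of a modal verb, or None if it is not listed."""
    word = token.lower()
    for meaning, verbs in MODAL_MEANINGS.items():
        if word in verbs:
            return meaning
    return None


def count_modal_meanings(tokens):
    """Count how often each meaning category occurs in a list of tokens."""
    counts = {meaning: 0 for meaning in MODAL_MEANINGS}
    for token in tokens:
        meaning = classify_modal(token)
        if meaning is not None:
            counts[meaning] += 1
    return counts
```

In the study itself the category is read from the corpus annotation rather than looked up per word; the lookup only illustrates the grouping.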
Case Study II - changes in variable field
FIELD: verb valency patterns
Competing grammatical variants, e.g. valency patterns, show the trends in the development of grammatical features, cf. (Mair, 2006)

valency patterns (feature)    example
VERB+inf                      help do sth.
VERB+obj+inf                  help sb. do sth.
VERB+to-inf                   help to do sth.
Data to Analyse Resources & Feature Extraction
SciTex
DaSciTex: from the early 2000s
- approx. 17 million words
SaSciTex: from the 1970s/early 1980s
- approx. 17 million words
[Figure: SciTex corpus design. Computer Science (A); contact registers Computational Linguistics (B1), Bioinformatics (B2), Digital Construction (B3), Microelectronics (B4); corresponding disciplines Linguistics (C1), Biology (C2), Mechanical Engineering (C3), Electrical Engineering (C4)]
cf. (Degaetano et al., 2012) and (Teich&Fankhauser, 2010)
Corpus Annotations
automatic         token, lemma, part-of-speech, chunk; text register, text year, division, etc. (metadata)
semi-automatic    cohesive devices, evaluative patterns
manual            transitivity
Extractions From Corpora
with the Corpus Query Processor (CQP), cf. (Evert 2005)
Positional Attributes: word, pos, lemma
Structural Attributes: s, text, text_title, text_author, text_year, text_ad, cohesion, cohesion_device, modal, modal_meaning, evaluation, evaluation_pattern
Examples of Extraction
Case I Extraction: Modal Meanings
Query building blocks              comments                 sentences extracted from SciTex
                                   context                  Each edge
[_._modal_meaning="permission"]    category of permission   can
[pos="V.*"]                        verb                     transmit
                                   context                  a single packet in each time step

                                   context                  S
[_._modal_meaning="obligation"]    category of obligation   must
[pos="V.*"]                        verb                     remove
                                   context                  at least bj jobs

                                   context                  We
[_._modal_meaning="volition"]      category of volition     shall
[pos="V.*"]                        verb                     use
                                   context                  s adversary trees
Examples of Extraction
Case II Extraction: Valency Patterns
Query building blocks              comments
[pos="V.*"&lemma="help"]           verb help
[pos="TO"]?                        optional to
(                                  object start
[pos="DT|PP|PDT"]?                 zero or one determiner
[pos="RB.*|JJ.*|VVN|N.*"]{0,3}     up to 3 modifiers
[pos="POS"]?                       zero or one possessive
[pos="N.*|PP"]?                    noun or pronoun
)                                  object end
[pos="V(V|B|H)"]                   infinitive

Sentences extracted from SciTex ⇒ valency patterns
It also helps organise routine review of recordings ⇒ VERB+inf
The power available with the system helps the programmer refrain from resisting changes ⇒ VERB+obj+inf
Lemma 1 helps to set the inductive basis for k ⇒ VERB+to-inf
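The query logic above can be approximated outside CQP. The following is a rough sketch, not the authors' pipeline: it assumes tokens are available as (word, pos, lemma) triples with TreeTagger-style tags, rebuilds the query as a regular expression over the tag sequence after HELP, and labels a match by whether "to" or an object was found. `PATTERN` and `valency_pattern` are hypothetical names.

```python
import re

# Regex over the space-joined POS tags that follow the HELP verb.
# Each alternative consumes one whole tag plus its trailing space.
PATTERN = re.compile(
    r"(?P<to>TO )?"                           # optional "to"
    r"(?P<obj>"                               # object start
    r"(?:(?:DT|PP|PDT) )?"                    # zero or one determiner
    r"(?:(?:RB\S*|JJ\S*|VVN|N\S*) ){0,3}"     # up to 3 modifiers
    r"(?:POS )?"                              # zero or one possessive
    r"(?:(?:N\S*|PP) )?"                      # noun or pronoun
    r")"                                      # object end
    r"V[VBH](?: |$)"                          # infinitive
)


def valency_pattern(tagged):
    """Label the first matching HELP construction in a tagged sentence.

    tagged: list of (word, pos, lemma) triples. Returns one of the three
    pattern labels, or None if no HELP + infinitive construction is found.
    A match containing "to" is labelled VERB+to-inf even if an object follows.
    """
    for i, (_, pos, lemma) in enumerate(tagged):
        if lemma != "help" or not pos.startswith("V"):
            continue
        tags = " ".join(p for _, p, _ in tagged[i + 1:])
        match = PATTERN.match(tags)
        if match is None:
            continue
        if match.group("to"):
            return "VERB+to-inf"
        if match.group("obj"):
            return "VERB+obj+inf"
        return "VERB+inf"
    return None


example = [
    ("helps", "VVZ", "help"), ("the", "DT", "the"),
    ("programmer", "NN", "programmer"), ("refrain", "VV", "refrain"),
]
print(valency_pattern(example))
```

Working on the tag string keeps the sketch close to the CQP building blocks, so each line of the regular expression corresponds to one row of the query table.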
Extraction Output
Preparation for Analysis
- extracted material is sorted according to registers
- data is transformed into JSON format for input to SPC
Analysis Aims
- register analysis of A-B-C triples:
  ⇒ whether B disciplines are more similar to A or C or distinct from both
- diachronic analysis:
  ⇒ two time periods in SciTex (70/80s vs. 2000s)
- a more fine-grained diachronic analysis: publication year
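The preparation step might look roughly like the sketch below. The JSON layout that SPC actually expects is not given on the slides, so the record fields (`register`, `period`, `feature`, `frequency`) and the function name `to_spc_records` are assumptions made for illustration only.

```python
import json
from collections import Counter


def to_spc_records(hits):
    """Aggregate extracted matches into one record per register/period/feature.

    hits: iterable of (register, period, feature) tuples, one per match.
    The record layout is hypothetical, not the documented SPC input format.
    """
    counts = Counter(hits)
    return [
        {"register": register, "period": period, "feature": feature, "frequency": n}
        for (register, period, feature), n in sorted(counts.items())
    ]


# Made-up matches, only to show the shape of the output.
hits = [
    ("A", "2000s", "obligation"),
    ("A", "2000s", "obligation"),
    ("B1", "1970/80s", "volition"),
]
print(json.dumps(to_spc_records(hits), indent=2))
```

Sorting the aggregated keys keeps the records grouped by register, which mirrors the "sorted according to registers" step.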
Structured Parallel Coordinates SPC Visualisation
Structured Parallel Coordinates (SPC)
SPC (Culy et al. 2011) are a specialisation of the Parallel Coordinates visualisation (d’Ocagne 1885; Inselberg 1985, 2009)
The Parallel Coordinates visualisation provides:
- two-dimensional representation of multidimensional data
- data dimensions on vertical axes, lined up horizontally
- related data points are connected by colored lines between axes
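The geometry behind these three points can be sketched in a few lines. This is an illustration of plain Parallel Coordinates, not of the SPC implementation: each record becomes a polyline with one vertex per axis, x is the axis position and y is the value scaled to [0, 1] within that axis. The function name `polylines` and the sample values are made up.

```python
def polylines(records, dimensions):
    """Return one polyline per record as a list of (x, y) vertices.

    records: list of dicts with a numeric value for every dimension.
    dimensions: axis names in left-to-right order.
    """
    lows = {d: min(r[d] for r in records) for d in dimensions}
    highs = {d: max(r[d] for r in records) for d in dimensions}
    lines = []
    for record in records:
        line = []
        for x, d in enumerate(dimensions):
            span = highs[d] - lows[d]
            # An axis with a single distinct value is drawn at mid-height.
            y = (record[d] - lows[d]) / span if span else 0.5
            line.append((x, y))
        lines.append(line)
    return lines


cars = [{"mpg": 18, "cylinders": 8}, {"mpg": 36, "cylinders": 4}]
print(polylines(cars, ["mpg", "cylinders"]))
```

Two records with opposite rankings on neighbouring axes yield crossing lines, which is the visual cue for a negative relationship between the two dimensions.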
Parallel Coordinates
Example visualising car features
Taken from protovis page: http://mbostock.github.com/protovis/ex/cars.html
SPC for language data
Adaptation of Parallel Coordinates to accommodate language data, e.g. as derived from corpora
- customised for representing ordered characteristics within and across dimensions
  - e.g. in the n-grams with frequencies application of SPC, ordered axes represent the linear ordering of words in text
  - e.g. visual separation of ordered and unordered axes
- refined modes of interaction
  - e.g. non-contiguous selection of values
N-grams with Frequencies application
Pronouns used with happy and sad
⇒ It is sad > It’s sad > One was sad > It was sad > We were sad
Structured Parallel Coordinates Customisation and Interactive Features
SPC for analysing language change
In SPC data dimensions are placed on different axes
- subcorpus characteristics,
- lexico-grammatical features, and
- their frequencies.
Numerical axes are ordered according to time/register of the subcorpus
- i.e. corpus from the 1970/80s → corpus from the 2000s
- i.e. computer science → mixed-discipline (e.g. bioinformatics) → specialised discipline (e.g. biology)
Subcorpus Comparisons - adjustments
For analysing language change:
- changes in linguistic features over time
- changes in linguistic features across registers
SPC subcorpus comparison application
- based on n-grams with frequencies application
- ordered numerical axes follow (unordered) categorical axes
- discrete line coloring for the distinction of categorical variables
- switching between comparable and individual numerical scales
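The difference between comparable and individual numerical scales can be illustrated as follows. This is a sketch under the assumption that every axis runs from 0 to a maximum: with comparable scales all axes share the global maximum, with individual scales each axis uses its own. The function name `scale_axes` and the frequencies are hypothetical.

```python
def scale_axes(columns, comparable=True):
    """Scale the values of each numerical axis to the range [0, 1].

    columns: dict mapping an axis name to its list of frequencies.
    comparable=True  -> one shared maximum, heights are comparable across axes.
    comparable=False -> each axis is scaled to its own maximum.
    """
    if comparable:
        top = max(max(values) for values in columns.values()) or 1
        return {axis: [v / top for v in values] for axis, values in columns.items()}
    return {
        axis: [v / (max(values) or 1) for v in values]
        for axis, values in columns.items()
    }


# Made-up frequencies for two time periods.
frequencies = {"1970/80s": [10, 20], "2000s": [40, 80]}
print(scale_axes(frequencies, comparable=True))
print(scale_axes(frequencies, comparable=False))
```

Comparable scales show that the second period has higher frequencies overall; individual scales hide that difference but make the pattern within each axis easier to read.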
SPC subcorpus comparison application
Visualising linguistic features by register and time
Visualisation of HELP plus complements by register
Visual analysis of registers
The SPC visualisation allows for:
- display of multidimensional data
- dynamic interaction with the data
  - comparable vs. individual numerical scales
  - discrete vs. scaled coloring of lines
  → OVERVIEW
  - selection of data points for dynamic filtering
  - line coloring according to axis in focus
  → FOCUS
  - highlighting of axes on mouseover
  - written summary of the record
  → DETAILS
⇒ support for the detection of patterns, tendencies and outliers
Interpreting Visualisation Results Case Study I - changes in variable TENOR
Analysing modal meanings
Investigation of changes in usage: obligation, permission and volition
⇒ DEMO:
Modal Meanings by Time
- selection of permission, focus on registers:
  markedly smaller increase for some data sets → Electrical Engineering domain (C4)
- selection of single disciplines, focus on registers:
  - (A-B1-C1, Linguistics): B is closer to C than to A for all modal meanings in the 1970/80s
  - (A-B2-C2, Biology): no remarkable differences in tendency for volition; stronger decrease in C than in A and B for obligation; for permission, the increase in B lies between the increases for C and A
Link: www.eurac.edu/linfovis > LInfoVis programs and resources > Structured Parallel Coordinates > (sub)corpus comparisons
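Statements such as "B is closer to C than to A" are read off the visualisation, but the underlying comparison can be sketched numerically. The snippet assumes one relative frequency per register for a given feature and period. The function name `closer_to` and all numbers are invented for illustration and are not results from SciTex.

```python
def closer_to(freq_a, freq_b, freq_c):
    """Say whether the contact register B lies nearer to A or to C."""
    distance_a = abs(freq_b - freq_a)
    distance_c = abs(freq_b - freq_c)
    if distance_a < distance_c:
        return "A"
    if distance_c < distance_a:
        return "C"
    return "equidistant"


# Invented relative frequencies for one modal meaning in one period.
print(closer_to(freq_a=10.0, freq_b=4.0, freq_c=3.0))
```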
Modal Meanings by Register
- focus on time: modal meanings behave similarly over time
  - detailed analysis with selection of single modal meanings:
    obligation: strongest decrease for B3 to C3; strongest increase for B1-C1 and B2-C2
    permission: strongest decrease for B1 to C1; strongest increase for B3-C3
    volition: strongest decrease for B2 to C2
  - detailed analysis with selection of Biology:
    e.g. focus on obligation: contrary tendencies for B and C over time
- focus on registers, normalised values:
  C values remained stable, B values decreased over time
Interpreting Visualisation Results Case Study II - changes in variable FIELD
Analysing verb complements
Investigation of changes in usage: HELP plus complements
⇒ DEMO:
HELP plus Complements by Time
- focus on verbs:
  frequency ordering of verb constructions for all registers: HELP+To+Inf > HELP+Inf (1970/80s) ≥ HELP+Obj+Inf (catching up in 2000s)
- selection of HELP+To+Inf, focus on disciplines:
  increase over time for B3 (Mechanical), decrease in A and B4 (Electrical), moderate changes in other disciplines
- selection of HELP+Inf and HELP+Obj+Inf, focus on verbs:
  some distinct tendencies:
  - A, B3 and B4/C4 strongly increasing
  - B1/C1 and B2/C2 are changing moderately
HELP plus Complements by Register
- focus on verbs:
  HELP+Inf behaves most uniformly over all registers
- selection of HELP+Inf, focus on time:
  relatively stable over subdisciplines, differences between B2 and C2
- selection of HELP+Obj+Inf, focus on disciplines:
  relative occurrences in B3 and C3 inverted from 70/80s to 2000s
- registers laid out in detail, focus on verbs:
  - B3/C3 show inverted tendencies over time for HELP+Obj+Inf and less so for HELP+To+Inf
  - B4/C4 show relative stability over time periods for all verb constructions
Conclusion and Future Work
Conclusion
We could show that
- visualisation makes it possible to gain an overview and detect tendencies
  → complex set of data in one display
- interactive features make it possible to focus dynamically on different aspects of the data
  → filtering and highlighting of specific subsets for detailed analyses
⇒ SPC facilitate our diachronic register analysis
Future Work
Data Analysis
- use different data layouts to feed several SPC visualisations
- focus on further features for the three contextual variables, e.g., conjunctive relations expressing cohesion for mode
- analyse several linguistic features at the same time (feature sets for register variation)
- provide a more fine-grained diachronic analysis (by publication years)
Future Work continued
Technical Enhancements
- function for automatically restructuring the underlying data to create different layouts, e.g. the merging of axes with categorical values (e.g., axes registers and disciplines)
- introduction of a 'summary' category on each data dimension (the sum of all individual values)
- function for selecting data items based on crossings or slope of their connecting lines
- changing the visualisation of overlapping lines (e.g. using semi-transparent or stacked lines)
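Two of these planned enhancements can be sketched on plain data, independent of how SPC would eventually implement them. The first helper adds a 'summary' entry holding the sum of all individual values of a dimension; the second selects records by the direction of the line between two numerical axes. The names `add_summary` and `select_by_slope`, the record layout and the numbers are hypothetical.

```python
def add_summary(values):
    """Return a copy of a category -> frequency mapping with a 'summary' total."""
    result = dict(values)
    result["summary"] = sum(values.values())
    return result


def select_by_slope(records, left, right, rising=True):
    """Keep records whose line between two axes rises (or falls).

    records: list of dicts holding a numeric value for both axes.
    Records with equal values on both axes (flat lines) are never selected.
    """
    selected = []
    for record in records:
        delta = record[right] - record[left]
        if (rising and delta > 0) or (not rising and delta < 0):
            selected.append(record)
    return selected


# Made-up frequencies for two registers in two periods.
records = [
    {"register": "B3", "1970/80s": 2, "2000s": 5},
    {"register": "A", "1970/80s": 4, "2000s": 1},
]
print(add_summary({"obligation": 3, "permission": 5}))
print(select_by_slope(records, "1970/80s", "2000s", rising=True))
```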
Thank you!
Questions? Comments? Suggestions?
[email protected]@mx.uni-saarland.de
www.eurac.edu/linfovis