ANALYZING THE LOCALIZATION OF LANGUAGE FEATURES WITH COMPLEX SYSTEMS TOOLS AND PREDICTING LANGUAGE VITALITY
Samuel Omlin, University of Lausanne, Switzerland
INTERNATIONAL CONFERENCE “COGNITIVE MODELING IN LINGUISTICS” CML-2010, Dubrovnik (Croatia)
Romansh – an endangered language
“Allegra, miu num ei Alfons Camiu. Jeu vivel en la biala Swizzera. Per discletg sundel jeu in dils davos 35'000 che discuoren aunc bein romontsch. Denton ei il prighel fetg gronds, che quei bi lungatg sto murir.”
“Hello. My name is Alfons Camiu. I live in Switzerland. I am one of the 35'000 people who speak Romansh as their native language. Unfortunately, the language of my people is in danger of dying out.”
About half of the world’s languages are endangered
Language competition
Business doctrine: location, location, location
Can this doctrine be applied to the survival of languages?
Literature study
The role played by the geographic situation of a language in its ultimate survival, and in particular the role played by the linguistic structure of the languages neighboring it, is still unclear.
Is the extinction of minority languages in competition with stronger ones inevitable, or is stable coexistence possible under certain circumstances?
Unesco: assessing language vitality
None of the criteria directly involves geography.
Focus
How is the vitality of a minority language related to the linguistic structure of the languages neighboring it?
Method
Adaptation of a mathematical method originating in economics, which has been used with empirical success to identify optimal locations for commercial stores
Presentation Outline
Identifying optimal business locations
Measuring the spatial distribution of linguistic features
Predicting language vitality
Modeling and sample
Results and conclusions
Modeling and sample
Sample summary
• 105 living languages
• 186 linguistic communities in Eurasia with independent vitality
• 31 of these linguistic communities have an associated vitality grade
M index
Quantifies the geographic aggregation and dispersion tendencies of pairs of store categories
Imaginary city: 16 stores
Legend: Butcher, Bakery, Other store
Do butcher stores “attract” bakeries?
Step 1: definition of neighborhood
Legend: Butcher (A), Bakery (B), Other store
Draw a disk of radius r (100 m) around each store $s$ of category A.
Step 2
Pick a store $s$ of category A, here $s_1$.
Step 3
Count the total number of stores in its neighborhood, $n(s)$: $n(s_1) = 3$.
Step 4
…count the number of B stores in its neighborhood, $n_B(s)$: $n_B(s_1) = 2$.
Step 5
…compute the local concentration of B stores in its neighborhood: $\frac{n_B(s_1)}{n(s_1)} = \frac{2}{3}$.
Step 6
Then, count the total number of stores in the entire city, $N$: $N = 16$.
Step 7
…count the number of B stores in the entire city, $N_B$: $N_B = 4$.
Step 8
…compute the overall concentration of B stores in the entire city: $\frac{N_B}{N} = \frac{4}{16} = \frac{1}{4}$.
Step 9
Compare the local concentration of B stores with their overall concentration:
$\frac{n_B(s_1)/n(s_1)}{N_B/N} = \frac{2/3}{1/4} = \frac{8}{3}$
Step 10
Compute this ratio also for all the other A stores in the city, e.g. for $s_2$, where $n(s_2) = 6$ and $n_B(s_2) = 2$:
$\frac{n_B(s_2)/n(s_2)}{N_B/N} = \frac{1/3}{1/4} = \frac{4}{3}$
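Step 11
Finally, compute the average of this ratio over all A stores in the city:
$M_{AB} = \frac{1}{N_A} \sum_{s \in A} \frac{n_B(s)/n(s)}{N_B/N}$, where $N_A$ is the number of A stores.
For our example (A: butcher; B: bakery): $M_{AB} = 2$.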
Imaginary city: answer
Do butcher stores (A) “attract” bakeries (B)?
$M_{AB} = 2$: next to the butcher stores, the local concentration of bakeries is on average twice the overall concentration. => Butcher stores tend to “attract” bakeries.
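The walkthrough above can be reproduced with a short Python sketch. The map of the imaginary city is not included here, so the sketch starts from the per-store counts given on the slides rather than from store coordinates, and it assumes (for illustration only) that $s_1$ and $s_2$ are the only butcher stores, which is consistent with the stated result $M_{AB} = 2$. The function name `m_index` is hypothetical; `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction


def m_index(a_store_counts, n_total, n_b_total):
    """Simplified M index as walked through in Steps 1 to 11.

    a_store_counts: one (n(s), n_B(s)) pair per store s of category A, where
    n(s) is the number of stores in the neighborhood of s and n_B(s) the number
    of B stores among them (assumed > 0 neighbors for every A store).
    n_total: N, the number of stores in the whole city.
    n_b_total: N_B, the number of B stores in the whole city.
    """
    overall = Fraction(n_b_total, n_total)  # N_B / N
    ratios = []
    for n_s, n_b_s in a_store_counts:
        local = Fraction(n_b_s, n_s)  # n_B(s) / n(s)
        ratios.append(local / overall)  # local vs. overall concentration
    # Unweighted average of the ratio over all A stores.
    return sum(ratios) / len(ratios), ratios


# Counts for s1 and s2 from the slides; assumes these are the only A stores.
m_ab, per_store = m_index([(3, 2), (6, 2)], n_total=16, n_b_total=4)
print(per_store)  # [Fraction(8, 3), Fraction(4, 3)]
print(m_ab)       # 2
```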
M index interpretation
Under the hypothesis of pure randomness, $E[M_{AB}] = 1$ for all r > 0.
=> $M_{AB}$ quantifies deviations from purely random configurations:
$M_{AB} > 1$: A tends to “attract” B
$M_{AB} < 1$: A tends to “repulse” B
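The randomness property can be checked numerically. This sketch is an illustration, not the author's code: it scatters hypothetical stores uniformly in a unit square, keeps their positions fixed, repeatedly shuffles the category labels (the pure randomness hypothesis), and averages the simplified M index from the walkthrough. Neighborhoods are disks of radius r that exclude the store itself, and A stores with an empty neighborhood are skipped; both choices, like all parameter values, are assumptions. With the normalization $N_B/N$ used in the walkthrough, the expectation under random labeling works out to about $N/(N-1)$ when every store has at least one neighbor, which is indistinguishable from 1 for a few hundred stores.

```python
import math
import random


def disk_neighbors(points, r):
    """For each point, list the indices of the other points within distance r."""
    neighbors = []
    for i, (xi, yi) in enumerate(points):
        neighbors.append([
            j for j, (xj, yj) in enumerate(points)
            if j != i and math.hypot(xi - xj, yi - yj) <= r
        ])
    return neighbors


def m_index(labels, neighbors, a, b):
    """Simplified M index: mean over A stores of (n_B(s)/n(s)) / (N_B/N)."""
    overall = labels.count(b) / len(labels)  # N_B / N
    ratios = []
    for s, label in enumerate(labels):
        if label != a or not neighbors[s]:
            continue  # only A stores with a non-empty neighborhood
        n_b_s = sum(1 for j in neighbors[s] if labels[j] == b)
        ratios.append((n_b_s / len(neighbors[s])) / overall)
    return sum(ratios) / len(ratios)


def random_labeling_mean(n_stores=400, n_a=40, n_b=80, r=0.15, trials=300, seed=1):
    """Average M_AB over random relabelings of fixed, uniformly scattered stores."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n_stores)]
    neighbors = disk_neighbors(points, r)
    labels = ["A"] * n_a + ["B"] * n_b + ["other"] * (n_stores - n_a - n_b)
    total = 0.0
    for _ in range(trials):
        rng.shuffle(labels)  # categories assigned purely at random
        total += m_index(labels, neighbors, "A", "B")
    return total / trials


print(round(random_labeling_mean(), 2))  # close to 1
```

Shuffling labels on fixed positions, rather than drawing new positions, keeps the spatial structure of the city and randomizes only which activity occupies which site.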
Location quality
Location “quality” index for a commercial activity A at a point (x,y): essentially the sum of all quantified attraction and repulsion tendencies exerted by the stores in the point’s neighborhood
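The slides describe the location quality index only in words. One simple way to turn that description into code, sketched here under stated assumptions: every store of category B within distance r of the point contributes $\log M_{AB}$, which is positive for attraction ($M_{AB} > 1$) and negative for repulsion ($M_{AB} < 1$), and the contributions are summed. The logarithmic weighting, the treatment of missing pairs as neutral, and the absence of any baseline for the number of stores expected at random are assumptions of this sketch rather than the published definition; all names and values are hypothetical.

```python
import math


def location_quality(point, stores, m_values, activity, r):
    """Hypothetical location quality of `point` for a store of `activity`.

    stores: list of ((x, y), category) pairs.
    m_values: dict mapping (activity, category) -> M index value (> 0).
    Each store within distance r adds log(M): attracting categories (M > 1)
    raise the quality, repulsing ones (M < 1) lower it.
    """
    px, py = point
    quality = 0.0
    for (x, y), category in stores:
        if math.hypot(px - x, py - y) <= r:
            # Pairs without an estimated M are treated as neutral (log 1 = 0).
            quality += math.log(m_values.get((activity, category), 1.0))
    return quality


# Toy values: butchers attract bakeries (M = 2) and repulse butchers (M = 0.5).
m_values = {("butcher", "bakery"): 2.0, ("butcher", "butcher"): 0.5}
stores = [((0.0, 0.0), "bakery"), ((0.05, 0.0), "bakery"), ((0.0, 0.05), "butcher")]
print(round(location_quality((0.0, 0.0), stores, m_values, "butcher", r=0.1), 3))  # 0.693
```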
Adapted M index
Quantifies the tendencies of typological language features to aggregate or disperse
Adapted M index
A: “2.5.3.SIMPLE SENTENCE -> marginal constructions -> Affective”
B: “2.1.4.SYLLABLE -> the element following the vowel -> not more than one consonant”
Neighborhood of a linguistic community
Defined as the set of communities overlapping its area enlarged by a buffer of size r (1 degree ≈ 110 km)
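A minimal sketch of this neighborhood definition, assuming (purely for illustration) that each community's area is approximated by a longitude/latitude bounding box rather than its real outline. A community d belongs to the neighborhood of c if d's box overlaps c's box enlarged by r degrees on every side; excluding c itself is also an assumption. The sketch ignores that a degree of longitude spans fewer than 110 km away from the equator. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Community:
    name: str
    speakers: int
    # Bounding box of the community's area, in degrees.
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float


def overlaps_buffered(c, d, r):
    """True if d's box intersects c's box enlarged by a buffer of r degrees."""
    return (d.min_lon <= c.max_lon + r and d.max_lon >= c.min_lon - r
            and d.min_lat <= c.max_lat + r and d.max_lat >= c.min_lat - r)


def neighborhood(c, communities, r=1.0):
    """Communities (other than c) overlapping c's area enlarged by r degrees."""
    return [d for d in communities if d is not c and overlaps_buffered(c, d, r)]


# Toy boxes: x starts 0.5 degrees east of c, y starts 3 degrees east of c.
c = Community("c", 1000, 0.0, 0.0, 1.0, 1.0)
x = Community("x", 500, 1.5, 0.0, 2.0, 1.0)
y = Community("y", 800, 4.0, 0.0, 5.0, 1.0)
print([d.name for d in neighborhood(c, [c, x, y])])  # ['x']
```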
Particularity
The concentration of a feature is determined by adding up numbers of speakers rather than simply counting communities
Example
Does feature A “attract” feature B?
Example: answer
Does feature A “attract” feature B?
$M_{AB} \approx 0.001$: next to speakers manifesting feature A, the local concentration of speakers using feature B is on average about a thousandth of the overall concentration. => Feature A tends to “repulse” feature B.
Location quality
Location quality of a feature: average ability of the feature to coexist with the features manifested by the communities in its neighborhood
Location quality of a linguistic community: aggregate of the location quality indexes of its features
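The slides do not spell out these formulas. One possible reading, sketched here under explicit assumptions: a feature's location quality is the mean of pairwise coexistence scores (for instance log values of the C index introduced in the additional slides) between it and every feature manifested in the neighborhood, and a community's location quality is the plain mean over its own features. Both the averaging scheme and the score table are hypothetical, and speaker weighting is omitted.

```python
from statistics import mean


def feature_quality(feature, neighbor_features, coexistence):
    """Hypothetical location quality of one feature: mean coexistence score
    with every feature manifested in the neighborhood (repeats count again)."""
    return mean(coexistence[(feature, other)] for other in neighbor_features)


def community_quality(own_features, neighbor_features, coexistence):
    """Hypothetical community quality: plain mean of its features' qualities."""
    return mean(feature_quality(f, neighbor_features, coexistence)
                for f in own_features)


# Toy scores (invented): positive = tend to coexist, negative = tend not to.
scores = {("f1", "g"): 0.5, ("f1", "h"): 0.25,
          ("f2", "g"): -0.75, ("f2", "h"): 0.0}
neighborhood_features = ["g", "h"]
print(feature_quality("f1", neighborhood_features, scores))            # 0.375
print(community_quality(["f1", "f2"], neighborhood_features, scores))  # 0.0
```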
Predicting language vitality
For the 31 minority communities to which I could assign a vitality grade, I related that grade to the corresponding location quality.
Location quality and vitality
Spearman’s rank correlation: 0.62 (p-value: 0.00009)
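The reported coefficient is Spearman's rank correlation, i.e. the Pearson correlation of the ranks. Because vitality grades are ordinal and several communities can share a grade, tied values should receive the average of the ranks they span. The sketch below implements this in plain Python on invented toy values (not the study's 31 communities); in practice a library routine such as `scipy.stats.spearmanr`, which also returns a p-value, would normally be used.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Toy data (invented): location quality vs. an ordinal vitality grade.
quality = [-1.2, -0.4, 0.1, 0.3, 0.9]
vitality = [1, 2, 2, 4, 5]
print(round(spearman_rho(quality, vitality), 3))  # 0.975
```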
Conclusions
• The degree of endangerment of the considered minority languages indeed seems to be related to the linguistic structure of their neighboring languages.
• It has been outlined how to combine
- Jazyki mira
- World Language Mapping System
- Atlas of the World’s Languages in Danger
in order to conduct quantitative linguistic studies involving geographic parameters.
• The first study to integrate realistic linguistic features in order to describe languages in competition
• The approach constitutes a promising tool to gain more knowledge about the mechanisms that control the geographical distribution of linguistic features.
Acknowledgement
• Dr Vladimir Polyakov, organizing committee
• Professor Valery Solovyev, organizing committee
• Dr Søren Wichmann, Department of Linguistics of the Max Planck Institute for Evolutionary Anthropology, Germany
Support (1/2)
• Dr Aris Xanthos, section of Linguistics, section of Information Technologies and Mathematical Methods, University of Lausanne (UNIL), Switzerland
• Professor François Golay and Dr Stéphane Joost, Geographic Information Systems Laboratory (LASIG), Swiss Federal Institute of Technology Lausanne (EPFL), Switzerland
Support (2/2)
• Professor Pablo Jensen, Laboratory of Physics, French National Center for Scientific Research (CNRS), France
• Professor François Pellegrino and Dr Fermín Moscoso del Prado Martín, ‘Dynamique Du Langage’ Laboratory, French National Center for Scientific Research (CNRS), France
References
Jazyki mira (Languages of the World) (1993-2004). Moscow: Academia & Indrik. [Online]. Available: http://ww.dblang.ru/en
Jensen, P. (2006). Network-based predictions of retail store commercial categories and optimal locations. Phys. Rev. E 74(3), 035101(R). [Online]. Available: http://dx.doi.org/10.1103/PhysRevE.74.035101
Jensen, P. (2009). Analyzing the Localization of Retail Stores with Complex Systems Tools. In Adams, N. M., Robardet, C., Siebes, A. & Boulicaut, J-F. (Eds.), Advances in Intelligent Data Analysis VIII: 8th International Symposium on Intelligent Data Analysis, Lecture Notes in Computer Science, Vol. 5772/2009, 10–20. Berlin Heidelberg: Springer-Verlag. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-03915-7_2
Moseley, C. (Ed.) (2009). Atlas of the World’s Languages in Danger. Unesco. [Online]. Available: http://www.unesco.org/culture/en/endangeredlanguages/atlas
World Language Mapping System (2010). Global Mapping International & SIL International. [Online]. Available: http://www.gmi.org/wlms
Additional slides
…
Step 1: definition of neighborhood
Draw the neighborhood of every community (c) manifesting
feature A.
Step 2
Pick a community (c) manifesting feature A.
Step 3
Add up the numbers of speakers of all communities in its neighborhood, $n(c)$: $n(c) \approx 54$ million
Step 4
…add up the numbers of speakers of the communities manifesting feature B in its neighborhood, $n_B(c)$: $n_B(c) = 0$
Step 5
…compute the local concentration (in speakers) of feature B in its neighborhood: $\frac{n_B(c)}{n(c)} = 0$
Step 6
Then, add up the numbers of speakers of all communities in the entire sample region, $N$: $N \approx 931$ million
Step 7
…add up the numbers of speakers of all communities manifesting feature B in the entire sample region, $N_B$: $N_B \approx 93$ million
Step 8
…compute the overall concentration (in speakers) of feature B in the entire sample region: $\frac{N_B}{N} \approx \frac{1}{10}$
Step 9
Compare the local concentration of feature B with its overall concentration:
$\frac{n_B(c)/n(c)}{N_B/N} = 0$
Step 10
Compute this ratio also for all the other communities manifesting feature A in the sample region.
Computers work…
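Step 11
Finally, compute the average of this ratio over all communities manifesting feature A in the sample region (the average is weighted by their numbers of speakers):
$M_{AB} = \frac{\sum_{c \in A} w(c) \, \frac{n_B(c)/n(c)}{N_B/N}}{\sum_{c \in A} w(c)}$, where $w(c)$ is the number of speakers of community c.
For our example: $M_{AB} \approx 0.001$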
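Steps 3 to 11 can be summarized in a short Python sketch, assuming the neighborhood speaker totals $n(c)$ and $n_B(c)$ have already been computed for each community c manifesting feature A. The tuple layout and the function name are hypothetical, and communities with an empty neighborhood ($n(c) = 0$) are not handled. The first call reuses the rounded figures from the slides, for which the single ratio is 0; the second uses invented figures to show the speaker weighting.

```python
def adapted_m_index(a_communities, n_total, n_b_total):
    """Speaker-weighted adapted M index (a sketch of Steps 3 to 11).

    a_communities: one (speakers, n_c, n_b_c) tuple per community c manifesting
    feature A, where n_c = n(c) and n_b_c = n_B(c) are speaker totals over the
    neighborhood of c (assumed n_c > 0).
    n_total, n_b_total: N and N_B, speaker totals over the whole sample region.
    """
    overall = n_b_total / n_total  # N_B / N
    weighted_sum = 0.0
    weight = 0
    for speakers, n_c, n_b_c in a_communities:
        ratio = (n_b_c / n_c) / overall  # local vs. overall concentration
        weighted_sum += speakers * ratio  # weighted by the speakers of c
        weight += speakers
    return weighted_sum / weight


# Single community from the slides (rounded figures); with one community the
# speaker weight cancels, so 1 is used as a placeholder weight.
print(adapted_m_index([(1, 54_000_000, 0)], 931_000_000, 93_000_000))  # 0.0

# Invented figures: two A communities with 1 and 3 million speakers.
print(adapted_m_index([(1_000_000, 4_000_000, 1_000_000),
                       (3_000_000, 8_000_000, 0)],
                      n_total=100_000_000, n_b_total=25_000_000))  # 0.25
```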
Interpretation of the adapted M index
Under the hypothesis of pure randomness, $E[M_{AB}] = 1$ for all r > 0.
=> Like Jensen’s $M_{AB}$, the adapted $M_{AB}$ quantifies deviations from purely random configurations:
$M_{AB} > 1$: A tends to “attract” B
$M_{AB} < 1$: A tends to “repulse” B
Coexistence ability of features
The spatial distribution of commercial activities seems to reveal interactions that favor or disfavor the successful local coexistence of certain activities.
The spatial distribution of language features alone, however, cannot directly tell which features can coexist successfully and which cannot.
We can quantify the interactions that favor or disfavor successful coexistence between features by considering, when computing the M index, only communities that are probably not endangered.
C index: coexistence ability of features
$C_{AB}$ is computed like the adapted $M_{AB}$, where the considered linguistic communities are only the ones that are probably not endangered.
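Computing the C index therefore amounts to rerunning the adapted M index on a reduced set of communities. The sketch below assumes each community carries a hypothetical boolean `endangered` flag (however that judgment is made), that flagged communities are dropped both as focal communities and from everyone's neighborhoods, and that neighborhoods are supplied as index lists. Function names and data are hypothetical.

```python
def adapted_m_index(communities, neighbors, a, b):
    """Speaker-weighted M index of features a and b over the given communities."""
    total = sum(c["speakers"] for c in communities)  # N
    overall = sum(c["speakers"] for c in communities
                  if b in c["features"]) / total  # N_B / N
    weighted_sum, weight = 0.0, 0
    for i, c in enumerate(communities):
        if a not in c["features"]:
            continue
        n_c = sum(communities[j]["speakers"] for j in neighbors[i])  # n(c)
        if n_c == 0:
            continue  # empty neighborhood: ratio undefined, skipped (assumption)
        n_b_c = sum(communities[j]["speakers"] for j in neighbors[i]
                    if b in communities[j]["features"])  # n_B(c)
        weighted_sum += c["speakers"] * (n_b_c / n_c) / overall
        weight += c["speakers"]
    return weighted_sum / weight


def restrict(communities, neighbors, keep):
    """Keep only communities satisfying keep(c), remapping neighbor indices."""
    kept = [i for i, c in enumerate(communities) if keep(c)]
    new_index = {old: new for new, old in enumerate(kept)}
    return ([communities[i] for i in kept],
            [[new_index[j] for j in neighbors[i] if j in new_index] for i in kept])


def c_index(communities, neighbors, a, b):
    """C index: the adapted M index over communities not flagged as endangered."""
    sub_communities, sub_neighbors = restrict(
        communities, neighbors, lambda c: not c["endangered"])
    return adapted_m_index(sub_communities, sub_neighbors, a, b)


# Toy data (invented): community 1 is flagged as endangered.
communities = [
    {"speakers": 250, "features": {"A"}, "endangered": False},
    {"speakers": 150, "features": {"B"}, "endangered": True},
    {"speakers": 100, "features": {"A", "B"}, "endangered": False},
    {"speakers": 500, "features": set(), "endangered": False},
]
neighbors = [[1, 2], [0], [3], [2]]
print(round(adapted_m_index(communities, neighbors, "A", "B"), 3))  # 2.857
print(round(c_index(communities, neighbors, "A", "B"), 3))          # 6.071
```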
Method
City: a heterogeneous geographic space (with parks, streams, etc.) that is home to a network of commercial activities
World: a heterogeneous geographic space (with seas, mountains, lakes, etc.) that is home to a network of languages or, more precisely, of linguistic features