redundancy ratio: an invariant property of the consonant inventories of the world’s languages
Post on 01-Jan-2016
Redundancy Ratio: An Invariant Property of the Consonant Inventories
of the World’s Languages
Animesh Mukherjee, Monojit Choudhury, Anupam Basu and Niloy Ganguly
Department of Computer Science & Engg.
Indian Institute of Technology, Kharagpur
Redundancy in Natural Systems
Reduce the risk of information loss – fault tolerance
Examples of redundancy:
Biological systems – Codons, genes, proteins etc.
Linguistic systems – Synonymous words
Human Brain – Perhaps the biggest example of neuronal redundancy
Redundancy in Sound Systems
Like any other natural system, human speech sound systems are expected to show redundancy in the information they encode
In this work we attempt to
Mathematically formulate this redundancy, and
Unravel the interesting patterns (if any) that result from this formulation
Feature Economy: An Age-old Principle
Sounds, especially consonants, tend to occur in pairs that are highly correlated in terms of their features
Languages tend to maximize combinatorial possibilities of a few features to produce many consonants
If a language has some of these plosives in its inventory, then it will also tend to have the rest:

plosive     voiced    voiceless
bilabial    /b/       /p/
dental      /d/       /t/
Mathematical Formulation
We use the concepts of information theory to quantify feature economy (assuming features are Boolean)
The basic idea is to compute the number of bits required to pass the information of an inventory of size N over a transmission channel
Ideal Scenario
Inventory of size N → noiseless channel → information undistorted
$\log_2 N$ bits are required for lossless transmission
General Scenario
Inventory of size N → noisy channel → information distorted
More than $\log_2 N$ bits are required for lossless transmission
Feature Entropy
The actual number of bits required can be estimated by calculating the binary entropy as follows
$p_f$ – number of consonants in the inventory in which feature f is present
$q_f$ – number of consonants in the inventory in which feature f is absent
The probability that a consonant chosen at random from the inventory has f is $\frac{p_f}{N}$, and the probability that it does not have f is $\frac{q_f}{N}$ $(= 1 - \frac{p_f}{N})$
Feature Entropy
If F denotes the set of all features,
$FE = -\sum_{f \in F} \left[ \frac{p_f}{N} \log_2 \frac{p_f}{N} + \frac{q_f}{N} \log_2 \frac{q_f}{N} \right]$
Redundancy Ratio (RR)
$RR = \frac{FE}{\log_2 N}$
The excess number of bits required to represent the inventory
Example
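As a minimal sketch of the two formulas, assume a hypothetical four-consonant inventory (/p/, /b/, /t/, /d/) described by four Boolean features (voiced, bilabial, dental, plosive). The inventory, the feature set and the function names are illustrative assumptions, not data from the study. A feature that is present in every consonant (or in none) contributes 0 bits, following the usual convention that 0 · log2 0 = 0.

```python
import math

def feature_entropy(inventory, features):
    """FE: sum over features of the binary entropy of 'feature present'."""
    n = len(inventory)
    total = 0.0
    for f in features:
        # p_f / N: fraction of consonants in which feature f is present
        present = sum(1 for feats in inventory.values() if f in feats) / n
        for prob in (present, 1.0 - present):
            if prob > 0:  # convention: 0 * log2(0) = 0
                total -= prob * math.log2(prob)
    return total

def redundancy_ratio(inventory, features):
    """RR = FE / log2(N); undefined for an inventory of size 1."""
    return feature_entropy(inventory, features) / math.log2(len(inventory))

# Hypothetical toy inventory: consonant -> set of features that are present
toy_inventory = {
    "/p/": {"bilabial", "plosive"},
    "/b/": {"voiced", "bilabial", "plosive"},
    "/t/": {"dental", "plosive"},
    "/d/": {"voiced", "dental", "plosive"},
}
toy_features = ["voiced", "bilabial", "dental", "plosive"]

fe = feature_entropy(toy_inventory, toy_features)   # 3.0 bits
rr = redundancy_ratio(toy_inventory, toy_features)  # 3.0 / log2(4) = 1.5
```

Here voiced, bilabial and dental each split the inventory in half (1 bit each), plosive is shared by all four consonants (0 bits), so FE = 3 bits against the log2 4 = 2 bits of the ideal scenario.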
Experimentation
Data Source
UCLA Phonological Segment Inventory Database (UPSID)
Samples data uniformly from almost all linguistic families
Hosts phonological systems of 317 languages
Number of consonants: 541
Number of vowels: 151
RR: Consonant Inventories
[Figure: redundancy ratio (y-axis, 1 to 7) plotted against inventory size (x-axis, 0 to 40), with a line fit]
The slope of the line fit is -0.0178, so RR is almost invariant with respect to the inventory size
The result means that consonant inventories are organized to have similar redundancy irrespective of their size
This is important because no such explanation exists yet
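The slope of a line fit can be obtained with an ordinary least-squares fit of RR against inventory size. The sketch below assumes that is the kind of fit meant (the slides do not say how the line was fitted), and the data points are invented purely for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented (inventory size, RR) pairs, not values from UPSID
sizes = [10, 20, 30, 40]
rrs = [2.0, 1.9, 1.8, 1.7]
slope, intercept = fit_line(sizes, rrs)  # slope is about -0.01
```

A slope close to zero, as reported for the consonant inventories, means RR barely changes as the inventory grows.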
The Invariance is not “by chance”
The invariance in the distribution of RRs for consonant inventories did not emerge by chance
Can be validated by a standard test of hypothesis
Null Hypothesis: The invariance in the distribution of RRs observed across the real consonant inventories is also prevalent across the randomly generated inventories.
Generation of Random Inventories
Model I – Purely random model
The distribution of the consonant inventory size is assumed to be known a priori
Conceive of 317 bins corresponding to the languages in UPSID
Pick a bin and fill it by randomly choosing consonants (without repetition) from the pool of 541 available consonants
Repeat the above step until all the bins are packed
[Figure: a pool of phonemes (/d/, /t/, /n/, /b/, /p/, /k/, /m/, ...) from which Bin 1, Bin 2, ..., Bin 317 are filled randomly; the example bins hold 4, 6 and 2 consonants]
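The Model I procedure can be sketched as follows. The function name, the placeholder consonant labels and the example sizes are assumptions made for illustration; in the study the sizes would be the 317 real inventory sizes and the pool the 541 UPSID consonants.

```python
import random

def model_one_inventories(sizes, pool, seed=None):
    """Fill one bin per language with `size` distinct consonants drawn
    uniformly at random from the pool (no repetition within a bin)."""
    rng = random.Random(seed)
    return [set(rng.sample(pool, size)) for size in sizes]

# Placeholder pool of 541 labels standing in for the UPSID consonants
pool = [f"c{i}" for i in range(541)]
# Example bin sizes (4, 6, 2), as in the figure; a real run would use 317 sizes
bins = model_one_inventories([4, 6, 2], pool, seed=0)
```

Sampling without replacement within a bin keeps each random inventory free of duplicate consonants, while the same consonant may still land in several bins.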
Generation of Random Inventories
Model II – Random model based on occurrence frequency
For each consonant c, let the frequency of occurrence in UPSID be denoted by $f_c$.
Let there be 317 bins, each corresponding to a language in UPSID.
$f_c$ bins are then chosen uniformly at random and the consonant c is packed into these bins without repetition.
[Figure: a pool of phonemes annotated with occurrence frequencies, e.g. /t/ (25), /n/ (12), /p/ (100), ...; for /t/, 25 of the bins Bin 1, Bin 2, ..., Bin 317 are chosen randomly and filled with /t/]
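A minimal sketch of Model II, using the three frequencies shown in the figure (/t/: 25, /n/: 12, /p/: 100) as example input. The function name is an assumption for illustration; a full run would include every consonant in UPSID.

```python
import random

def model_two_inventories(frequencies, n_bins=317, seed=None):
    """For each consonant c, choose f_c distinct bins uniformly at random
    and place c in each of them."""
    rng = random.Random(seed)
    bins = [set() for _ in range(n_bins)]
    for consonant, freq in frequencies.items():
        # sample() picks distinct bin indices, so c enters a bin at most once
        for index in rng.sample(range(n_bins), freq):
            bins[index].add(consonant)
    return bins

example_frequencies = {"/t/": 25, "/n/": 12, "/p/": 100}
bins = model_two_inventories(example_frequencies, seed=0)
```

Unlike Model I, this model preserves how often each consonant occurs across languages, but the inventory sizes are no longer fixed in advance.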
Results
Model I – the t-test indicates that the null hypothesis can be rejected with (100 - 9.29e-15)% confidence
Model II – once again, the t-test shows that the null hypothesis can be rejected with (100 - 2.55e-3)% confidence
Occurrence frequency governs the organization of the consonant inventories at least to some extent
[Figure: average redundancy ratio (y-axis, 1 to 5) against inventory size (x-axis, 5 to 45) for Model I, Model II and the real inventories]
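The slides do not say which variant of the t-test was used or exactly which samples were compared. As one possible reading, the sketch below computes Welch's two-sample t statistic and its approximate degrees of freedom for two lists of RR values (real versus randomly generated inventories). The function name and the sample values are invented for illustration.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Squared standard errors of the two sample means
    se2_a = statistics.variance(sample_a) / len(sample_a)
    se2_b = statistics.variance(sample_b) / len(sample_b)
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t, df

# Invented RR values, not results from the study
real_rr = [1.9, 2.0, 2.1, 2.0, 1.95]
random_rr = [2.6, 3.1, 2.2, 3.4, 2.9]
t_stat, dof = welch_t(real_rr, random_rr)
```

Turning the statistic into a confidence level requires the cumulative distribution of Student's t, which is not in the Python standard library, so that step is left out here.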
The Case of Vowel Inventories
[Figure: redundancy ratio (y-axis, 1 to 7) against inventory size (x-axis, 0 to 40) for vowel and consonant inventories]
The slope of the line fit is -0.125
For small inventories RR is not invariant, while for larger ones (size > 12) it is
Smaller inventories: perceptual contrast; larger inventories: feature economy
A t-test shows that we can be 99.93% confident that the vowel and consonant inventories are different in terms of RR
Error Correcting Capability
For most of the consonant inventories the average Hamming distance between two consonants is 4, which corresponds to a 1-bit error correcting capability
Vowel inventories do not indicate any such fixed error correcting capability
[Figure: two panels, Consonants and Vowels, each showing average Hamming distance (y-axis, 0 to 7) against inventory size (x-axis, 0 to 40)]
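The average Hamming distance of an inventory can be sketched as the mean number of differing Boolean features over all pairs of consonants. The toy feature vectors below (voiced, bilabial, dental, plosive) are a hypothetical example, not UPSID data. For reference, in coding theory a code whose minimum distance is d can correct up to floor((d - 1) / 2) bit errors, which gives 1 for d = 4; the slides apply this reasoning to the average distance.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def average_hamming(vectors):
    """Mean Hamming distance over all unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)

# Hypothetical vectors: (voiced, bilabial, dental, plosive)
toy_vectors = [
    (0, 1, 0, 1),  # /p/
    (1, 1, 0, 1),  # /b/
    (0, 0, 1, 1),  # /t/
    (1, 0, 1, 1),  # /d/
]
avg = average_hamming(toy_vectors)  # 12 differences over 6 pairs = 2.0
```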
Conclusions
Redundancy ratio is almost an invariant property of the consonant inventories with respect to the inventory size,
This invariance is a direct consequence of the fixed error correcting capabilities of the consonant inventories,
Unlike the consonant inventories, the vowel inventories are not indicative (at least not all of them) of such an invariance.
Discussions
Cause of the origins of redundancy in a linguistic system
Fault tolerance: Redundancy acts as a failsafe mechanism against random distortion
Evolutionary Cause: Redundancy allows a speaker to successfully communicate with speakers of neighboring dialects – “Linguistic junk” as pointed out by Lass (Lass, 1997)
Thank you