Page 1: Redundancy Ratio: An Invariant Property of the Consonant Inventories of the World’s Languages

Redundancy Ratio: An Invariant Property of the Consonant Inventories of the World’s Languages

Animesh Mukherjee, Monojit Choudhury, Anupam Basu and Niloy Ganguly

Department of Computer Science & Engg.

Indian Institute of Technology, Kharagpur

Page 2

Redundancy in Natural Systems

Reduce the risk of information loss – fault tolerance

Examples of redundancy:

Biological systems – Codons, genes, proteins etc.

Linguistic systems – Synonymous words

Human Brain – Perhaps the biggest example of neuronal redundancy

Page 3

Redundancy in Sound Systems

Like any other natural system, human speech sound systems are expected to show redundancy in the information they encode

In this work we attempt to

Mathematically formulate this redundancy, and

Unravel the interesting patterns (if any) that result from this formulation

Page 4

Feature Economy: An age-old Principle

Sounds, especially consonants, tend to occur in pairs that are highly correlated in terms of their features

Languages tend to maximize combinatorial possibilities of a few features to produce many consonants

If a language has in its inventory three of the four plosives below, then it will also tend to have the fourth

plosive      voiced    voiceless
bilabial     /b/       /p/
dental       /d/       /t/

Page 5

Mathematical Formulation

We use the concepts of information theory to quantify feature economy (assuming features are Boolean)

The basic idea is to compute the number of bits required to pass the information of an inventory of size N over a transmission channel

Ideal Scenario

Inventory of Size N → Noiseless Channel → Info. Undistorted

$\log_2 N$ bits are required for lossless transmission

Page 6

Mathematical Formulation (continued)

General Scenario

Inventory of Size N → Noisy Channel → Info. Distorted

More than $\log_2 N$ bits are required for lossless transmission

Page 7

Feature Entropy

The actual number of bits required can be estimated by calculating the binary entropy as follows

$p_f$ – number of consonants in the inventory in which feature f is present

$q_f$ – number of consonants in the inventory in which feature f is absent

The probability that a consonant chosen at random from the inventory has f is $\frac{p_f}{N}$, and that it does not have f is $\frac{q_f}{N}$ ($= 1 - \frac{p_f}{N}$)

Page 8

Feature Entropy

If F denotes the set of all features,

$FE = -\sum_{f \in F} \left[ \frac{p_f}{N} \log_2 \frac{p_f}{N} + \frac{q_f}{N} \log_2 \frac{q_f}{N} \right]$

Redundancy Ratio (RR)

$RR = \frac{FE}{\log_2 N}$

RR measures the excess number of bits required to represent the inventory, relative to the $\log_2 N$ bits needed in the ideal case
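The feature entropy and redundancy ratio defined on these slides can be sketched in a few lines of Python. This is only an illustration: the function names, the dictionary representation of an inventory, and the four-consonant toy inventory with its feature labels are assumptions made here, not data or code from the presentation. The usual convention 0 · log2(0) = 0 is assumed for features that are present in (or absent from) every consonant.

```python
import math

def feature_entropy(inventory):
    """Sum over all features of the binary entropy of that feature.

    `inventory` maps each consonant to the set of Boolean features it has
    (a hypothetical representation chosen for this sketch).
    """
    n = len(inventory)
    features = set().union(*inventory.values())
    total = 0.0
    for f in features:
        p = sum(1 for feats in inventory.values() if f in feats)  # p_f
        q = n - p                                                  # q_f
        for count in (p, q):
            if count:  # assumed convention: 0 * log2(0) contributes 0
                total -= (count / n) * math.log2(count / n)
    return total

def redundancy_ratio(inventory):
    """RR = FE / log2(N); undefined for an inventory of size 1."""
    return feature_entropy(inventory) / math.log2(len(inventory))

# Toy inventory with illustrative feature labels (not UPSID data).
toy = {
    "b": {"voiced", "bilabial", "plosive"},
    "p": {"bilabial", "plosive"},
    "d": {"voiced", "dental", "plosive"},
    "t": {"dental", "plosive"},
}
fe = feature_entropy(toy)   # 3.0 bits: voiced, bilabial, dental give 1 bit each
rr = redundancy_ratio(toy)  # 3.0 / log2(4) = 1.5
```

In this toy case "plosive" is shared by every consonant and so contributes nothing, while the other three features each split the inventory in half.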

Page 9

Example

Page 10

Experimentation

Data Source

UCLA Phonological Segment Inventory Database (UPSID)

Samples data uniformly from almost all linguistic families

Hosts phonological systems of 317 languages

Number of Consonants: 541

Number of Vowels: 151

Page 11

RR: Consonant Inventories

[Figure: Redundancy Ratio (y-axis, 1 to 7) against Inventory Size (x-axis, 0 to 40) for the consonant inventories, with a line fit]

The slope of the line fit is -0.0178, so RR is almost invariant with respect to the inventory size

The result means that consonant inventories are organized to have similar redundancy irrespective of their size. This is important because no such explanation has been offered so far.
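The "slope of the line fit" quoted on this slide is the slope of a straight line fitted to (inventory size, RR) points. A minimal sketch of such a fit follows, assuming ordinary least squares (the slides do not say which fitting method was used). The function name and the four data points are made up purely to exercise the code and are not values from UPSID.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up (size, RR) pairs for illustration only; not measured values.
sizes = [10, 20, 30, 40]
rrs = [3.0, 2.8, 2.6, 2.4]
slope, intercept = fit_line(sizes, rrs)
```

A slope close to zero means the fitted RR changes very little as the inventory grows, which is how the slide reads the near-invariance.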

Page 12

The Invariance is not “by chance”

The invariance in the distribution of RRs for consonant inventories did not emerge by chance

This can be validated by a standard test of hypothesis

Null Hypothesis: The invariance in the distribution of RRs observed across the real consonant inventories is also prevalent across the randomly generated inventories.
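One way such a hypothesis test could be set up is to compare RR values from the real inventories with RR values from randomly generated ones using a two-sample t statistic. The sketch below assumes Welch's unequal-variance form, which is a choice made here; the slides only say "t-test" and do not give the variant. The function name and the sample values are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t statistic and its approximate degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Arbitrary numbers standing in for two samples of RR values.
sample_a = [1, 2, 3, 4, 5]
sample_b = [2, 3, 4, 5, 6]
t_stat, dof = welch_t(sample_a, sample_b)
```

Turning the statistic into a confidence level additionally needs the cumulative distribution of Student's t, which is left out of this sketch.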

Page 13

Generation of Random Inventories

Model I – Purely random model

The distribution of the consonant inventory size is assumed to be known a priori

Conceive of 317 bins corresponding to the languages in UPSID

Pick a bin and fill it by randomly choosing consonants (without repetition) from the pool of 541 available consonants

Repeat the above step until all the bins are packed

[Figure: a pool of phonemes (/d/, /t/, /n/, /b/, /p/, /k/, /m/, ...) from which Bin 1, Bin 2, ..., Bin 317 are filled randomly; the example bins hold 4, 6 and 2 consonants]
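The Model I procedure can be sketched as follows. The function name, the placeholder consonant labels and the fixed random seed are assumptions for illustration; the three bin sizes are the ones shown in the slide's picture, whereas the actual experiment uses the 317 inventory sizes found in UPSID.

```python
import random

def model_one(sizes, pool, rng=random):
    """For each bin, draw `size` distinct consonants uniformly from the pool."""
    return [rng.sample(pool, size) for size in sizes]

pool = [f"c{i}" for i in range(541)]  # placeholder labels for 541 consonants
sizes = [4, 6, 2]                     # example bin sizes from the slide
inventories = model_one(sizes, pool, random.Random(0))
```

`random.sample` draws without replacement, which matches the "without repetition" condition within a bin; the same consonant may still land in several bins.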

Page 14

Generation of Random Inventories

Model II – Random model based on Occurrence Frequency

For each consonant c, let the frequency of occurrence in UPSID be denoted by $f_c$.

Let there be 317 bins, each corresponding to a language in UPSID.

$f_c$ bins are then chosen uniformly at random and the consonant c is packed into these bins without repetition.

[Figure: a pool of phonemes annotated with occurrence frequencies, e.g. /t/ (25), /n/ (12), /p/ (100), and Bin 1, Bin 2, ..., Bin 317; for /t/, choose 25 bins randomly and fill them with /t/]
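Model II can be sketched in the same spirit. The function name and the fixed seed are assumptions; the three frequencies are the illustrative counts shown in the slide's picture, not a full frequency table.

```python
import random

def model_two(frequencies, n_bins, rng=random):
    """Place each consonant c into f_c distinct bins chosen uniformly at random."""
    bins = [set() for _ in range(n_bins)]
    for consonant, fc in frequencies.items():
        # Sampling bin indices without replacement keeps c from
        # being packed into the same bin twice.
        for i in rng.sample(range(n_bins), fc):
            bins[i].add(consonant)
    return bins

freqs = {"t": 25, "n": 12, "p": 100}  # example counts from the slide
bins = model_two(freqs, 317, random.Random(0))
```

Unlike Model I, this preserves how often each consonant occurs across languages but not the size of each individual inventory.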

Page 15

Results

Model I – the t-test indicates that the null hypothesis can be rejected with (100 - 9.29e-15)% confidence

Model II – once again, the t-test shows that the null hypothesis can be rejected with (100 - 2.55e-3)% confidence

Occurrence frequency governs the organization of the consonant inventories at least to some extent

[Figure: Average Redundancy Ratio (y-axis, 1 to 5) against Inventory Size (x-axis, 5 to 45) for Model I, Model II and the real inventories]

Page 16

The Case of Vowel Inventories

The slope of the line fit is -0.125

For small inventories RR is not invariant, while for larger ones (size > 12) it is

Smaller inventories: perceptual contrast; larger inventories: feature economy

The t-test shows that we can be 99.93% confident that the two kinds of inventories (vowel and consonant) are different in terms of RR

[Figure: Redundancy Ratio (y-axis, 1 to 7) against Inventory Size (x-axis, 0 to 40) for vowels and consonants]

Page 17

Error Correcting Capability

For most of the consonant inventories the average Hamming distance between two consonants is 4, which gives a 1-bit error correcting capability

Vowel inventories do not indicate any such fixed error correcting capability

[Figure: two panels, Consonants and Vowels, each plotting Average Hamming Distance (y-axis, 0 to 7) against Inventory Size (x-axis, 0 to 40)]
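The average Hamming distance of an inventory can be computed by treating each consonant as a Boolean feature vector and averaging the number of differing features over all pairs. The sketch below assumes that reading; the function name, the toy inventory and its feature labels are illustrative and not taken from UPSID.

```python
from itertools import combinations

def average_hamming(inventory, features):
    """Mean number of differing Boolean features over all consonant pairs."""
    def vector(c):
        return [f in inventory[c] for f in features]
    pairs = list(combinations(inventory, 2))
    total = sum(
        sum(x != y for x, y in zip(vector(a), vector(b)))
        for a, b in pairs
    )
    return total / len(pairs)

# Toy inventory with illustrative feature labels (not UPSID data).
toy = {
    "b": {"voiced", "bilabial", "plosive"},
    "p": {"bilabial", "plosive"},
    "d": {"voiced", "dental", "plosive"},
    "t": {"dental", "plosive"},
}
avg = average_hamming(toy, ["voiced", "bilabial", "dental", "plosive"])
```

In coding theory a code with distance d can correct floor((d - 1) / 2) errors, which gives 1 for d = 4; that bound is normally stated for the minimum distance, while the slide applies it to the average.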

Page 18

Conclusions

Redundancy ratio is almost an invariant property of the consonant inventories with respect to the inventory size.

This invariance is a direct consequence of the fixed error correcting capabilities of the consonant inventories.

Unlike the consonant inventories, not all vowel inventories show such an invariance.

Page 19

Discussions

Causes of the origin of redundancy in a linguistic system

Fault tolerance: Redundancy acts as a failsafe mechanism against random distortion

Evolutionary Cause: Redundancy allows a speaker to successfully communicate with speakers of neighboring dialects – “Linguistic junk” as pointed out by Lass (1997)

Page 20

Thank you