mutual intelligibility and similarity of chinese dialects...a map showing the geographic...

12
Linguistics in the Netherlands 2007, 223–234. issn 0929–7332 / e-issn 1569–9919 © Algemene Vereniging voor Taalwetenschap Mutual intelligibility and similarity of Chinese dialects Predicting judgments from objective measures Chaoju Tang and Vincent J. van Heuven Chongqing Jiaotong University / University of Leiden/LUCL 1. Introduction 1.1 Why study mutual intelligibility? Distance between languages is used as a criterion when arguing about genealogi- cal relationships between languages. e more languages resemble each other, the more likely they are derived from the same parent language, i.e., belong to the same language family. However, it is difficult to quantify the distance between languages one-dimensionally since languages differ along many structural dimensions (e.g. phonetics, phonology, morphology, syntax). It is unclear how the various dimen- sions should be weighed against each other. erefore, we select a single criterion — mutual intelligibility. Mutual intelligibility is an overall criterion that may tell us in a psychologically relevant way whether two languages are similar/close. Useful work on structural measures of difference between related languages has been done, for instance, at Stanford University (for Gaelic Irish dialects, cf. Kessler 1995) and at the University of Groningen (for Dutch and Norwegian dia- lects, Heeringa 2004), using the Levenshtein distance measure. is is a similarity metric that computes the mean number of string operations needed to convert the phonetic transcription of a word in one language (variety) to its counterpart in another language (variety). is measure was then used to build a tree structure (through hierarchical cluster analysis) which largely matched the language family tree as constructed by linguists. 1.2 How to determine (mutual) intelligibility? Although methods for determining intelligibility are well-established, for in- stance in the fields of speech technology and audiology, the practical problems

Upload: others

Post on 12-Mar-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Mutual intelligibility and similarity of Chinese dialects...A map showing the geographic distribution and the genealogic relationships among the dialects is presented in Figure 1

Linguistics in the Netherlands 2007, 223–234.issn 0929–7332 / e-issn 1569–9919 © Algemene Vereniging voor Taalwetenschap

Mutual intelligibility and similarity of Chinese dialectsPredicting judgments from objective measures

Chaoju Tang and Vincent J. van HeuvenChongqing Jiaotong University / University of Leiden/LUCL

1. Introduction

1.1 Why study mutual intelligibility?

Distance between languages is used as a criterion when arguing about genealogi-cal relationships between languages. The more languages resemble each other, the more likely they are derived from the same parent language, i.e., belong to the same language family. However, it is difficult to quantify the distance between languages one-dimensionally since languages differ along many structural dimensions (e.g. phonetics, phonology, morphology, syntax). It is unclear how the various dimen-sions should be weighed against each other. Therefore, we select a single criterion — mutual intelligibility. Mutual intelligibility is an overall criterion that may tell us in a psychologically relevant way whether two languages are similar/close.

Useful work on structural measures of difference between related languages has been done, for instance, at Stanford University (for Gaelic Irish dialects, cf. Kessler 1995) and at the University of Groningen (for Dutch and Norwegian dia-lects, Heeringa 2004), using the Levenshtein distance measure. This is a similarity metric that computes the mean number of string operations needed to convert the phonetic transcription of a word in one language (variety) to its counterpart in another language (variety). This measure was then used to build a tree structure (through hierarchical cluster analysis) which largely matched the language family tree as constructed by linguists.

1.2 How to determine (mutual) intelligibility?

Although methods for determining intelligibility are well-established, for in-stance in the fields of speech technology and audiology, the practical problems

Page 2: Mutual intelligibility and similarity of Chinese dialects...A map showing the geographic distribution and the genealogic relationships among the dialects is presented in Figure 1

224 Chaoju Tang and Vincent J. van Heuven

are prohibitive when mutual intelligibility has to be established for, say, all pairs of varieties in a set of 15 dialects (yielding 225 pairs). Rather than measuring intel-ligibility by functional tests, opinion testing has been advanced as a shortcut. That is, the indices of the measurements of mutual intelligibility between languages are generated from listeners’ judgment scores. Once mutual intelligibility scores are available, the relative importance of structural dimensions can be found through multiple regression techniques. Such work has recently been done for 15 Norwe-gian dialects by Gooskens & Heeringa (2004) (henceforth G&H). Their results show that subjectively judged distance between sample dialects and the listen-er’s own dialect correlated substantially with the objective Levenshtein distance (r2 = 0.449).

Perceived distance between some dialect and one’s own is not necessarily the same as an intelligibility judgment. One of the aims of our paper is to test to what extent judged distance and judged intelligibility actually measure the same property.

G&H computed Levenshtein distances on the basis of the 58 words making up the fable “The North Wind and the Sun” read in the 15 Norwegian dialects. The Levenshtein distance will increase rapidly when the word pairs in two languages are non-cognates. For non-cognates any sound correspondence is accidental, so that the Levenshtein distance will be close to 100. It might therefore be more infor-mative to break the one-dimensional Levenshtein distance down into two separate parameters, i.e. (i) the percentage of cognate words shared between the vocabular-ies of two language varieties and (ii) the phonological distance computed for the cognate part of the vocabulary only. This is what we did in our study. We includ-ed two predictors of mutual intelligibility, one based on the number of cognates shared between a pair of dialects and the other on regularity in the phonological correspondences between the cognate words only (an alternative to the Leven-shtein distance). This will allow us to estimate the strengths of the two predictors as well as their intercorrelation.

The work done by G&H (2004) represents a complication relative to earlier work in that their Norwegian dialects are tone languages whilst the Gaelic Irish and Dutch dialects typically are not. Since it is unclear how tonal differences should be weighed in the distance measure, G&H collected distance judgments for the same reading passages resynthesized with and without pitch variations. The difference in judged distance between the pairs of versions (with and without pitch) would then be an estimate of the weight of the tonal information. Norwe-gian, however, is a language with a binary tone contrast. We want to test G&H’s method on full-fledged tone languages, with much richer tone inventories varying from four (e.g. Beijing/Northern Mandarin family) to as many as nine (e.g. Can-tonese, also known as Guangzhou/Yue family).

Page 3: Mutual intelligibility and similarity of Chinese dialects...A map showing the geographic distribution and the genealogic relationships among the dialects is presented in Figure 1

Mutual intelligibility and similarity of Chinese dialects 225

1.3 Earlier work

Chinese dialect classification is still controversial. Nevertheless, there is broad consensus on the primary relationships within the Sinitic languages: there is a first split between the Mandarin group (comprising the Northern, Eastern and South-western families) and the Southern group (comprising the Wu, Gan, Xiang, Min, Hakka and Yue families). Cheng (1997) has computed structural similarity mea-sures for all pairs of these Chinese dialects from a large, multi-dialectal, lexical/phonological database. We have used two of his measures (see § 2.2) as predictors of mutual intelligibility between pairs of Chinese dialects in the present study.

2. Methods

2.1 Collecting judgments

We targeted 15 Chinese dialects (a subset from Cheng 1997), from the Mandarin group: Beijing, Chengdu, Jinan, Xi’an, Taiyuan, Hankou; from the Southern group: Suzhou, Wenzhou (Wu family), Nanchang (Gan family), Meixian (Hakka family), Xiamen, Fuzhou, Chaozhou (Min family), Changsha (Xiang family), and Guang-zhou a.k.a. Cantonese (Yue family). A map showing the geographic distribution and the genealogic relationships among the dialects is presented in Figure 1.

We used existing recordings of the fable “The North Wind and the Sun”. Since each fable had been read by a different speaker (11 males and 4 females), we pro-cessed the recordings (using the Praat speech processing software, cf. Boersma & Weenink 1996) such that all speakers sounded like males, all had roughly the same articulation rate and speech-pause ratio, and the same mean pitch.1 We established, in a separate (unpublished) experiment, that possible differences in sound and recording quality were not responsible for any effects reported in this article. Also, each reading of the fable was produced in two melodic versions, i.e., one with the original pitch intervals kept intact, and one with all pitch movements replaced by a constant pitch (monotone), which was the same as the mean pitch of the fragment with melody (and the same as all other fragments).

The 2 × 15 readings of the fable were recorded onto audio CDs in one of four different random orders (A, B, C, D, where C and D were the reversed orders of A and B). On each audio CD, the 15 monotonized versions preceded the 15 versions with melody.

For each of the 15 dialects 24 native listeners were found in the middle to older generation (ages between 40 and 60), evenly divided between males and females. All 360 listeners were born and bred in their respective dialect areas. Listeners

Page 4: Mutual intelligibility and similarity of Chinese dialects...A map showing the geographic distribution and the genealogic relationships among the dialects is presented in Figure 1

226 Chaoju Tang and Vincent J. van Heuven

were mono-dialectal so that they had no experience with any other Chinese dia-lects (though all had some familiarity with Standard Mandarin).

Each of the four CDs was played through loudspeakers to a different group of six listeners (three females, three males) per dialect. Listeners rated the materials twice: the first time they estimated on a scale from 0 to 10 how well they believed monolingual listeners of their own dialect, confronted with a speaker of the dialect in the recording for the first time in their life, would understand the other speaker. Here ‘0’ stood for ‘They will not understand a word of the other speaker’ whilst ‘10’ represented ‘They will understand the other speaker perfectly’. In the second judgment the listeners rated the similarity between their own dialect and the dia-lect of the speaker in the recording, where ‘0’ meant ‘No similarity at all’ against ‘10’ meaning ‘This dialect is exactly the same as my own’. In all 21,600 judgments were collected.

Figure 1. Language distribution in the P.R. China (downloaded from http://www.china-travel.com/china-travel-guides/china-maps, China thematic map nr. 10). The 15 target dialects (a–o) of our study are identified on the map and listed hierarchically according to dialect (sub)group.

Page 5: Mutual intelligibility and similarity of Chinese dialects...A map showing the geographic distribution and the genealogic relationships among the dialects is presented in Figure 1

Mutual intelligibility and similarity of Chinese dialects 227

2.2 Structural measures

We used two objective measures of structural similarity between pairs of Chinese dialects. Both measures were computed for Chinese dialects by Cheng (1997). For reasons of space we will only conceptually indicate how they can be interpreted and refer for details to Cheng (1997: 50–52; 52–54 for LSI and PCI, respectively).

The first measure, which we call the Lexical Similarity Index (LSI), can be conceived of as the percentage of cognates shared between the vocabularies of two language varieties. Obviously, the higher the number (and token frequencies) of cognate words a listener encounters in a non-native dialect, the easier it will be for him to understand the message. We simply copied the values published in Cheng (1997, Appendix 3). Unfortunately, Cheng (1997) does not list LSI values for Taiyuan and Hankou. This is the reason why LSI results are presented for only 13 dialects in § 3.

Cheng’s second measure basically captures the regularity of the sound and tone correspondences in the sets of cognate words shared between two dialects. Cognates between two dialects will be easier to recognize if they contain the same sounds in the same positions in the words, or if the sounds (and tones) can be converted from one dialect to the other by a simple and general rule. Cheng com-puted a coefficient ranging between 0 (no phonological correspondence at all) to 1 (perfect sound correspondence) by determining the number of rules needed and weighing each rule for its degree of generality. We call this measure the Phonologi-cal Correspondence Index (PCI). We copied the PCI values from Cheng (1997, Appendix 5).

3. Results

3.1 Objective and subjective measures

We generated 15 × 15 matrices for each of the six measures for the 15 target dia-lects: (a) objective lexical similarity (LSI, only 13 dialects), (b) objective phonologi-cal correspondence (PCI), (c–d) subjective intelligibility and similarity judgments for stimulus versions with melody, and (e–f) subjective intelligibility and similarity judgments for versions without melody. From the matrices (a, b, c and d are pre-sented in the appendix; e and f were omitted,) hierarchical cluster trees were de-rived using the same average linking method that Cheng (1997) used (see below).

In Figure 2a–d we show four cluster trees. Two more trees, based on judg-ments on monotonized stimuli (e–f), have been omitted for reasons of space, as these revealed poorer structure than those based on the full-melody versions.

Page 6: Mutual intelligibility and similarity of Chinese dialects...A map showing the geographic distribution and the genealogic relationships among the dialects is presented in Figure 1

228 Chaoju Tang and Vincent J. van Heuven

a. Objective LSI b. Objective PCI

c. Judged intelligibility d. Judged similarity

Figure 2. Hierarchical cluster trees for 15 Chinese dialects, based on (a) objective lexical similarity (b) objective phonological correspondence, (c) subjective intelligibility ratings (with melody), and (d) subjective similarity ratings (with melody).

Page 7: Mutual intelligibility and similarity of Chinese dialects...A map showing the geographic distribution and the genealogic relationships among the dialects is presented in Figure 1

Mutual intelligibility and similarity of Chinese dialects 229

Inspection of the trees shows a rather poor congruence between any pair of trees. Even the primary split between Mandarin and Southern dialects is not cor-rectly reproduced in the trees. Typically, the arguably Southern dialects Changsha and/or Nanchang are incorrectly parsed with the Mandarin dialects. Generally, the degree of congruence is better between the two subjective ratings than be-tween the objective measures. We will first examine the relationship between the two subjective measures, and then determine how well these subjective ratings can be predicted by some combination of objective similarity measures.

3.2 Predicting intelligibility from similarity

We used the proximity between the members of every single pair (N = 105) of dialects out of the set of 15 as our measure of closeness between the members. Symmetrical proximity matrices were generated; the redundant part of the ma-trices was deleted before we correlated the proximity values obtained from either the intelligibility or similarity ratings. The result shows that judged intelligibility correlates with judged similarity (N = 105 pairs of values) at r = .949 (p < .001). This means that the two sets of ratings can be predicted from each other with a very high degree of accuracy. Moreover, visual inspection of the corresponding scatter-plot (see Figure 3) reveals no specific outliers, so that the conclusion follows that subjectively estimated similarity between pairs of languages is an exceptionally good predictor of, or even a near-perfect substitute for, estimated intelligibility.

0 100 200 300 400 500

Proximity based on judged distance

0

200

400

600

800

1000

Prox

imity

bas

ed o

n ju

dged

inte

lligi

bilit

y

R Sq Linear = 0.9

Figure 3. Relationship between judged similarity between dialects and their subjectively estimated mutual intelligibility.

Page 8: Mutual intelligibility and similarity of Chinese dialects...A map showing the geographic distribution and the genealogic relationships among the dialects is presented in Figure 1

230 Chaoju Tang and Vincent J. van Heuven

3.3 Predicting subjective from objective measures

In Table 1 we have specified how well judged intelligibility and judged similarity can be predicted from the objectively determined LSI and PCI measures. We also computed correlation coefficients between objective and log-transformed subjec-tive measures; these generally yield higher r-values. A separate series of computa-tions was done on the scores after excluding Beijing as one of the dialects. This was done to check to what extent shared knowledge of Standard Chinese, which strongly resembles Beijing dialect, affects the results. Moreover, all the computa-tions were done once with the judgments based on the sound stimuli with full me-lodic information and a second time with judgments based on the monotonized versions. Finally, we list the results of selected (stepwise) multiple regression anal-yses (with both LSI and PCI entered in the analysis for selected crucial combina-tions of conditions) in order to determine the cumulative effect of the predictors.

Table 1. Correlation coefficients (r) and number of dialect pairs involved (N) between two measures of objective structural similarity and subjective intelligibility and similarity ratings. Multiple R is indicated for optimal conditions only (see text).

Variables and conditions Cheng’s PCI Cheng’s LSI Bothr N r N R

Cheng’s LSI .761* 78Judged intelligibility, melody .466* 78 .420* 78Judged intelligibility, monotone .418* 78 .375* 78Judged similarity, melody .590* 78 .558* 78Judged similarity, monotone .497* 78 .481* 78Log judged intelligibility, melody .604* 78 .589* 78 .636*Log judged intelligibility, monotone .548* 78 .534* 78Log judged similarity, melody .699* 78 .694* 78 .742*Log judged similarity, monotone .631* 78 .626* 78Judged intelligibility, melody, no Beijing .553* 66 .573* 66Judged intelligibility, monotone, no Beijing .552* 66 .629* 66Judged similarity, melody, no Beijing .656* 66 .701* 66Judged similarity, monotone, no Beijing .665* 66 .713* 66Log judged intelligibility, melody, no Beijing .675* 66 .710* 66 .753*Log judged intelligibility, monotone, no Beijing .613* 66 .665* 66Log judged similarity, melody, no Beijing .711* 66 .753* 66 .797*Log judged similarity, monotone, no Beijing .665* 66 .713* 66

*: p < .01 (two-tailed). In all cases of Multiple Regression each predictor made a significant independent contribution to R2.

Page 9: Mutual intelligibility and similarity of Chinese dialects...A map showing the geographic distribution and the genealogic relationships among the dialects is presented in Figure 1

Mutual intelligibility and similarity of Chinese dialects 231

4. Conclusions

A number of conclusions can be drawn from Table 1. First, the two objective mea-sures of structural similarity, PCI and LSI, are always significantly correlated with all of the subjective ratings. Moreover, the two predictors are only moderately in-tercorrelated so that there is potential room for improvement of the prediction through multiple regression. The success of multiple regression is demonstrated most clearly in the prediction of log-transformed similarity for versions with mel-ody and Beijing dialect excluded: here the accuracy of the prediction (coefficient of determination, i.e. r2 or R2) from both objective measures together (64%) is 7 per-centage points better than from the best single predictor (57%). It is even 19 percent than the single r2 in Gooskens & Heeringa (2004) (see § 1.2). The latter result shows that better prediction of judged similarity and intelligibility can be obtained when a one-dimensional objective phonological distance measure is broken down into two separate parameters, one covering the proportion of cognates shared between two vocabularies and the other targeting the phonological similarity in the shared cognates only — as was assumed all along by Cheng (1997). Our result, however, contradicts a recent study by Gooskens & Heeringa (2006), who attempted mul-tiple regression of lexical and pronunciation distance against their judgment data on Norwegian dialects, but did not find a significant improvement of R2.

Second, similarity judgments can be predicted more successfully (higher r-values) than the corresponding mutual intelligibility judgments. In the Chinese language situation, almost every language user has had some basic exposure to the standard through (primary) education and media exposure. The standard language is almost identical to the Beijing dialect. As a consequence of this, our linguistically naïve listeners truthfully stated that they could understand some of the Beijing version of the fable, but were yet able to appreciate the large struc-tural difference between the stimulus version and their own dialect. To the extent that this has happened, the intelligibility and similarity judgments do not provide parallel information. This explains why leaving out the Beijing dialect in the com-putations of r and R yielded better predictions of judged similarity and of mutual intelligibility.

Third, the prediction of log-transformed judgments is better than of the corre-sponding linear measures. This effect has been found in many other studies on the relationship between objective counts on language use and the subjective impres-sion of such phenomena, e.g. in the area of word token frequency. The effect was also reported by Gooskens and Heeringa (2004) for the relationship between the objective Levenshtein distance and judged similarity among Norwegian dialects.

Fourth, the ratings based on versions with full melodic information can be predicted substantially better from the objective measures than those based on

Page 10: Mutual intelligibility and similarity of Chinese dialects...A map showing the geographic distribution and the genealogic relationships among the dialects is presented in Figure 1

232 Chaoju Tang and Vincent J. van Heuven

monotonized versions. This indicates that the contribution of melodic informa-tion should be weighed rather heavily in the development of realistic measures of linguistic distance and of predictors of mutual intelligibility among Chinese dialects.

Note

1. The mean pitch was normalized to the mean of the 11 male speakers. Relatively small shifts in pitch (in semitones) were performed (using the PSOLA pitch manipulation implemented in the Praat software) on the male speakers, larger shifts were required for the female voices. For the female speakers a gender transformation was carried out by decreasing the formants by 15%. Longer pauses were reduced to 500-ms length, and the remaining speech was linearly speeded up or slowed down (in the same PSOLA manipulation that changed the pitch) such that the articulation rate (syllables/s) was the same for all speakers.

References

Beijing University. 1962. Chinese Dialect Character Pronunciation List. Beijing: Wenzi Gaige Chubanshe.

Bezooijen, Rénée van & Vincent van Heuven. 1997. “Assessment of speech synthesis”. Handbook of standards and resources for spoken language systems ed. by Daffyd Gibbon, Roger Moore & Richard Winksi, 481–653. Berlin/New York: Mouton de Gruyter.

Boersma, Paul & David Weenink. 1996. “Praat, a system for doing phonetics by computer, ver-sion 3.4”. Report 132, Institute of Phonetic Sciences, University of Amsterdam.

Cheng, Chin-Chuan. 1997. “Measuring Relationship among Dialects: DOC and Related Re-sources”. Computational Linguistics & Chinese Language Processing 2.1.41–72

Diekhoff, George M. 1992. Statistics for the Social and Behavioral Sciences. Dubuque, IA: Brown.

Duanmu, San. 2000. The Phonology of Standard Chinese. Oxford: Oxford University Press.Gooskens, Charlotte & Wilbert Heeringa. 2004. “Perceptive evaluation of Levenshtein dialect

distance measurements using Norwegian dialect data”. Language Variation and Change 16.189–207.

Gooskens, Charlotte & Wilbert Heeringa. 2006. The relative contribution of pronunciation, lexi-cal and prosodic differences between Norwegian dialects. Literary and Linguistic Comput-ing 21. 477–492.

Heeringa, Wilbert. 2004. Measuring dialect pronunciation differences using Levenshtein dis-tance. Doctoral diss. Rijksuniversiteit Groningen.

Kessler, Brett, 1995. “Computational dialectology in Irish Gaelic”. Proceedings of the Association for Computational Linguistics, European Chapter, Dublin: ACL. 60–67.

Page 11: Mutual intelligibility and similarity of Chinese dialects...A map showing the geographic distribution and the genealogic relationships among the dialects is presented in Figure 1

Mutual intelligibility and similarity of Chinese dialects 233

Appendix

Table A1. Lexical Similarity Indices for all 169 pairs of 13 Chinese dialects.Su

zhou

Wen

zhou

Gua

ngzh

ou

Xia

men

Fuzh

ou

Cha

ozho

u

Mei

xian

Nan

chan

g

Cha

ngsh

a

Taiy

uan

Beiji

ng

Jinan

Han

kou

Che

ngdu

Xi’a

n

Suzhou 1 .312 .184 .079 .123 .097 .182 .375 .345 .298 .309 .295 .316Wenzhou .312 1 .194 .102 .141 .101 .189 .281 .261 .217 .231 .211 .221Guangzhou .184 .194 1 .170 .164 .211 .302 .245 .227 .548 .221 .171 .209Xiamen .079 .102 .170 1 .280 .338 .165 .133 .119 .198 .164 .089 .133Fuzhou .123 .141 .164 .280 1 .245 .141 .184 .160 .269 .218 .139 .201Chaozhou .097 .101 .211 .338 .245 1 .185 .149 .135 .213 .173 .098 .139Meixian .182 .189 .302 .165 .141 .185 1 .272 .226 .214 .212 .165 .201Nanchang .375 .281 .245 .133 .184 .149 .272 1 .555 .442 .454 .423 .447Changsha .345 .261 .227 .119 .160 .135 .226 .555 1 .461 .487 .485 .483TaiyuanBeijing .289 .217 .240 .198 .269 .213 .214 .442 .461 1 .671 .447 .610Jinan .309 .231 .221 .164 .218 .173 .212 .454 .487 .671 1 .453 .607HankouChengdu .295 .211 .171 .089 .139 .098 .165 .423 .485 .447 .453 1 .487Xi’an .316 .221 .209 .133 .201 .139 .201 .447 .483 .610 .607 .487 1

Table A2. Phonological Correspondence Indices (PCI) for all 225 pairs of 15 Chinese dialects.

Suzh

ou

Wen

zhou

Gua

ngzh

ou

Xia

men

Fuzh

ou

Cha

ozho

u

Mei

xian

Nan

chan

g

Cha

ngsh

a

Taiy

uan

Beiji

ng

Jinan

Han

kou

Che

ngdu

Xi’a

n

Suzhou 1 .492 .483 .525 .511 .499 .572 .561 .517 .568 .510 .523 .587 .592 .546Wenzhou .534 1 .473 .455 .499 .489 .485 .468 .514 .485 .407 .452 .467 .483 .464Guangzhou .484 .469 1 .515 .503 .474 .567 .522 .454 .455 .487 .480 .477 .458 .479Xiamen .461 .341 .434 1 .498 .510 .511 .489 .388 .468 .457 .421 .486 .449 .453Fuzhou .457 .405 .435 534 1 .555 .557 .538 .443 .544 .490 .473 .483 .505 .487Chaozhou .439 .402 .396 .498 .545 1 .491 .477 .412 .517 .413 .417 .444 .463 .465Meixian .480 .418 .528 .535 .540 .504 1 .658 .516 .535 .504 .438 .549 565 465Nanchang .519 .376 .469 .537 .547 .514 .655 1 .524 .568 .577 .499 .583 .614 536Changsha .534 .439 .412 .448 .492 .479 .532 .563 1 .520 .610 .572 .689 .688 608Taiyuan .549 .400 .437 .476 .539 .516 .557 .560 .529 1 .609 .603 .590 .633 .612Beijing .489 .382 .464 .503 .536 .473 .553 .587 .608 .608 1 .713 .728 .730 .656Jinan .500 .404 .429 .458 .451 .414 .492 .497 .541 .612 .725 1 .594 .646 .765Hankou .512 .378 .463 .529 .481 .492 .576 .622 .663 .574 .727 .582 1 .799 .627Chengdu .498 .399 .450 .506 .524 .536 .580 .622 .632 .599 .722 .669 .791 1 .697Xi’an .551 .418 .431 .490 .476 .465 .516 .530 .579 .617 .715 .771 .643 .690 1

Page 12: Mutual intelligibility and similarity of Chinese dialects...A map showing the geographic distribution and the genealogic relationships among the dialects is presented in Figure 1

234 Chaoju Tang and Vincent J. van Heuven

Table A3. Intelligibility ratings for stimuli with full melodic information spoken in 15 Chinese dialects as judged by groups of 24 listeners of the same 15 dialects.

Suzh

ou

Wen

zhou

Gua

ngzh

ou

Xia

men

Fuzh

ou

Cha

ozho

u

Mei

xian

Nan

chan

g

Cha

ngsh

a

Taiy

uan

Beiji

ng

Jinan

Han

kou

Che

ngdu

Xi’a

n

Suzhou 10.00 2.83 .38 1.25 1.21 .83 .46 1.25 .58 .50 1.63 1.92 .54 1.92 2.42Wenzhou 1.25 9.38 .54 .88 1.38 .46 .71 1.54 2.33 .83 2.50 1.38 .54 1.46 1.00Guangzhou .04 .87 10.00 1.42 1.25 1.54 4.58 1.71 1.33 .21 .58 .71 .71 .79 1.54Xiamen .00 .71 .96 9.08 1.46 6.04 .54 1.13 .96 .50 1.17 .13 .17 .17 .58Fuzhou .13 .96 .67 .50 9.83 1.21 .33 .63 .67 .29 .58 .46 .21 .71 1.29Chaozhou .00 .83 .63 4.79 1.83 9.71 .42 .67 .29 .63 .17 .96 .08 .67 1.17Meixian .17 1.08 .38 1.50 2.00 1.88 9.00 2.08 .88 1.08 1.08 1.04 .25 .79 .71Nanchang 5.08 2.71 .88 1.79 2.17 1.42 .88 9.92 4.04 5.04 1.96 4.58 1.46 3.13 2.71Changsha 7.00 3.88 .50 2.96 1.71 1.79 .88 6.50 10.00 6.71 5.08 6.96 4.54 6.42 5.54Taiyuan 6.21 2.75 .67 1.21 1.63 1.21 1.00 2.88 4.17 10.00 2.67 6.33 2.25 6.71 6.00Beijing 8.58 8.91 .58 7.38 2.42 2.71 5.29 9.46 9.54 9.83 9.08 9.83 7.75 9.29 9.54Jinan 7.58 5.38 .54 4.54 1.75 1.92 1.00 7.83 6.58 9.42 6.21 10.00 3.67 7.96 8.54Hankou 7.00 4.00 .42 2.33 2.00 1.25 .92 6.88 6.96 7.75 4.96 8.33 10.00 7.67 7.79Chengdu 6.54 5.25 .79 1.63 1.46 .83 1.13 6.33 7.04 7.38 4.88 8.00 4.58 10.00 6.33Xi’an 6.21 3.63 .63 1.92 1.25 .96 .71 4.88 4.46 8.96 4.79 8.00 2.42 7.25 9.58

Table A4. Similarity ratings for full-melody stimuli spoken in 15 Chinese dialects as judged by groups of 24 listeners of the same 15 dialects.

Suzh

ou

Wen

zhou

Gua

ngzh

ou

Xia

men

Fuzh

ou

Cha

ozho

u

Mei

xian

Nan

chan

g

Cha

ngsh

a

Taiy

uan

Beiji

ng

Jinan

Han

kou

Che

ngdu

Xi’a

n

Suzhou 9.96 2.54 .13 .63 .04 .58 .17 .38 .54 .42 1.50 .92 .04 1.13 1.42Wenzhou .92 9.17 .29 .38 .00 .04 .29 .63 1.08 .33 2.25 .67 .04 .92 .38Guangzhou .00 .41 10.00 .83 .17 .58 5.52 .67 1.04 .00 .63 .38 .08 .29 1.00Xiamen .00 .75 .50 8.96 .17 5.13 .17 .25 .50 .38 .92 .13 .04 .04 .13Fuzhou .04 1.00 .25 .30 10.00 .71 .25 .00 .35 .00 .50 .21 .00 .17 .79Chaozhou .00 .38 .33 3.96 .38 10.00 .17 .00 .13 .25 .25 .42 .00 .13 .46Meixian .08 .63 .08 1.29 .29 1.25 9.13 .75 .50 .38 .92 .58 .13 .33 .33Nanchang .79 2.83 .46 1.08 .00 .50 .58 9.92 2.82 .88 1.71 2.46 .13 2.00 1.46Changsha 1.25 .38 .25 1.38 .00 .25 .50 4.67 10.00 1.38 4.46 3.92 2.25 4.13 3.33Taiyuan .21 1.29 .38 .65 .00 .08 .58 1.29 2.17 10.00 2.29 3.92 .83 4.42 3.83Beijing .21 2.00 .29 5.38 .08 .29 1.71 5.00 7.83 4.08 8.96 8.04 4.58 6.38 7.71Jinan .29 .21 .29 2.50 .00 .25 .54 5.42 4.58 2.50 5.57 9.96 1.67 5.83 6.38Hankou .38 .42 .13 1.33 .00 .58 .54 3.58 5.04 1.58 4.43 5.92 10.00 5.96 5.38Chengdu .38 1.33 .33 .88 .04 .00 .58 3.58 4.13 1.42 4.52 4.83 2.38 10.00 4.04Xi’an .38 .33 .29 .92 .00 .25 .50 3.00 2.79 2.54 3.70 5.67 1.21 5.46 8.63