Posted on 12-Jan-2016
Advances in Automated Language Classification
ASJP Consortium
Dik Bakker, Lancaster
ASJP: Automatic Reconstruction
Overview
ASJP project members:
- Sören Wichmann (Germany; Netherlands)
- Viveka Velupillai (Germany)
- André Müller (Germany)
- Robert Mailhammer (Germany)
- Hagen Jung (Germany)
- Eric Holman (US)
- Anthony Grant (UK)
- Dmitry Egorov (Russia)
- Pamela Brown (US)
- Cecil Brown (US)
- Dik Bakker (UK; Netherlands)
Overview
Project: ASJP (Automated Similarity Judgment Program)
Overall goal: automatic reconstruction of language relationships
Basis: a distance matrix between individual languages, computed from linguistic features
Method: lexicostatistics, i.e. mass comparison of lexical items
Overview
MAIN GOAL: Reconstruction of Language Relationships
Derived goals (among others):
- Critical assessment and refinement of existing classifications
- Classify newly described and unclassified languages
- Estimate time depths between languages / genera / families
- Search for (ir)regularities in phylogenies
- Test hypotheses (e.g. Atkinson et al. 2008; the ‘elbow’ phenomenon)
- Experimentally find the best/optimal dating method
- Detect borrowings
Overview
1. The basic list of lexical items
2. Comparing languages
3. Some results: genetic and areal proximity
4. On Inheritance vs Borrowing
5. Conclusions
1. The basic list of lexical items
Lexical items
Word list: Swadesh 100 basic meanings, chosen so that each meaning ideally:
- has a word in most languages
- is recorded in fieldwork lexicons / grammars
- is inherited rather than borrowed
- is culturally independent
- is stable over time
- has few synonyms
1. I       21. dog      41. nose     61. die      81. smoke
2. you     22. louse    42. mouth    62. kill     82. fire
3. we      23. tree     43. tooth    63. swim     83. ash
4. this    24. seed     44. tongue   64. fly      84. burn
5. that    25. leaf     45. claw     65. walk     85. path
6. who     26. root     46. foot     66. come     86. mountain
7. what    27. bark     47. knee     67. lie      87. red
8. not     28. skin     48. hand     68. sit      88. green
9. all     29. flesh    49. belly    69. stand    89. yellow
10. many   30. blood    50. neck     70. give     90. white
11. one    31. bone     51. breasts  71. say      91. black
12. two    32. grease   52. heart    72. sun      92. night
13. big    33. egg      53. liver    73. moon     93. hot
14. long   34. horn     54. drink    74. star     94. cold
15. small  35. tail     55. eat      75. water    95. full
16. woman  36. feather  56. bite     76. rain     96. new
17. man    37. hair     57. see      77. stone    97. good
18. person 38. head     58. hear     78. sand     98. round
19. fish   39. ear      59. know     79. earth    99. dry
20. bird   40. eye      60. sleep    80. cloud    100. name
Lexical items: further reduction
Early analyses have shown:
- An optimal subset of 40 of the 100 items gives the same results, with:
  - less work
  - less missing data
  - faster processing, since the number of comparisons explodes combinatorially: $40 : 100 \approx 3 \times 10^{7} : 2 \times 10^{10}$
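Lexical items: stability
Most stable items:
- Iteratively discard the least stable item, measured by variation within genera (groups with a time depth of about 3500-4000 years; Dryer 2001; 2005), e.g. Germanic, Romance, Slavic, …
- Formula: $S = \dfrac{E - U}{100 - U}$, where $E$ is the weighted average percentage of matches between words with the same meaning (equivalent) and $U$ the weighted average percentage of matches between words with different meanings (unequal), which serves as a chance baseline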
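A minimal Python sketch of the stability formula and of keeping the top-ranked items. It assumes E and U have already been computed for each meaning as percentages; the genus-level weighting that produces them is not reproduced, and `stability`, `keep_most_stable` and the demo figures are hypothetical. In this sketch each item's S depends only on its own E and U, so discarding the least stable item one at a time gives the same ranking as a single sort; if scores were recomputed after each removal, that is not modelled here.

```python
def stability(e_pct: float, u_pct: float) -> float:
    """S = (E - U) / (100 - U), with E and U as percentages; U must be below 100."""
    if not 0 <= u_pct < 100:
        raise ValueError("U must lie in [0, 100)")
    return (e_pct - u_pct) / (100.0 - u_pct)


def keep_most_stable(scores: dict[str, tuple[float, float]], k: int = 40) -> list[str]:
    """Rank meanings by S (most stable first) and keep the top k."""
    ranked = sorted(scores, key=lambda m: stability(*scores[m]), reverse=True)
    return ranked[:k]


# Invented (E, U) percentages, for illustration only.
demo = {"I": (85.0, 10.0), "louse": (70.0, 10.0), "fish": (40.0, 10.0)}
print(keep_most_stable(demo, k=2))  # ['I', 'louse']
```

Subtracting U in both numerator and denominator rescales E so that a meaning matching no more often than chance scores 0 and one that always matches scores 1.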
[Figure: stability of the items, from most stable (++) to least stable (--), evaluated against Ethnologue (Goodman-Kruskal) and WALS (Pearson)]
[Figure: the 100 items, with the 40 most stable highlighted]
[Figure: the 100 items, with homophones highlighted]
Lexical items: transcription
First phase of project (2007):
Problems with a full IPA representation of words:
- data entry via keyboard
- simple programming languages (Fortran; Pascal)
Recoding to a simplified ASJPcode (ASCII only)
Lexical items: transcription
ASJPcode:
- 7 vowels
- 34 consonants
- operators for nasalization, labialization, palatalization, aspiration, glottalization
- (some) complex syllables simplified (VXC → VC)
Abaza (Caucasian):
Meaning IPA ASJPcode
PERSON ʕʷɨʧʼʲʷʕʷɨs Xw~3Cw"yXw~3s
LEAF bɣʲɨ bxy~3
SKIN ʧʷazʲ Cw~azy~
HORN ʧʼʷɨʕʷa Cw"~3Xw~a
NOSE pɨnʦʼa p3nc"a
TOOTH pɨʦ p3c
Lexical items
Collected to date:
- Over 2100 languages, dialects and proto-languages
- Mean number of items per language: 36.2 (out of 40)
Lexical items
Distribution:
Americas: 27%
Eurasia: 23%
Australia/PNG: 18%
Austronesia: 15%
Africa: 14%
Creoles: 2%
Artificial: 1%
[Figure: map of the languages currently sampled]
Lexical items: transcription
Second phase of project (2008):
Problems with a full IPA representation solved:
1. automatic conversion of IPA to integers (Python)
2. (semi-)automatic recoding to ASJPcode: transduction based on a formal grammar
Lexical items: transcription
Abaza (Caucasian):
Meaning: PERSON
IPA: ʕʷɨʧʼʲʷʕʷɨs
Decimal: 661 695 616 679 700 690 695 661 695 616 115
ASJPcode: 88 119 126 51 67 34 121 119 126 88 119 126 51 115
( = Xw~3Cw"y~Xw~3s)
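The Decimal row above is exactly the sequence of Unicode code points of the IPA characters (ʕ is U+0295 = 661, ʷ is U+02B7 = 695, and so on), so step 1 of the conversion can be sketched with Python's built-in `ord` and `chr`. The helper names are illustrative, not the project's code, and the string is spelled with escapes so that look-alike characters cannot creep in.

```python
# Abaza PERSON, written with escapes: ʕʷɨʧʼʲʷʕʷɨs
ABAZA_PERSON = "\u0295\u02b7\u0268\u02a7\u02bc\u02b2\u02b7\u0295\u02b7\u0268s"


def ipa_to_ints(form: str) -> list[int]:
    """Step 1: replace every IPA character by its Unicode code point."""
    return [ord(ch) for ch in form]


def ints_to_ipa(codes: list[int]) -> str:
    """Inverse mapping, useful for checking stored integer records."""
    return "".join(chr(c) for c in codes)


print(" ".join(map(str, ipa_to_ints(ABAZA_PERSON))))
# prints: 661 695 616 679 700 690 695 661 695 616 115
```

Input conventions matter: an affricate typed as the ligature ʧ (679) yields a different integer sequence from the same sound typed as t + ʃ (116, 643).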
Lexical items: transcription
Second phase of project (2008): why not run on full IPA?
- correlations between IPA and ASJP results > 0.9
- but: ASJP fits the existing classifications better; IPA is too specific
Lexical items: transcription
IPA: ʕʷɨʧʼʲʷʕʷɨs
Decimal: 661 695 616 679 700 690 695 661 695 616 115
ASJP++code (= any Unicode string), via a formal grammar:
A → n661, n695, n616, …
…
P Q → A B C … Z
P Q → Z
Optimal level of abstraction for historical phonological reconstruction?
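Step 2, recoding to ASJPcode, can be approximated for a handful of segments with a greedy longest-match substitution table. This is a deliberately simpler technique than the formal-grammar transduction described here: the rule table below is hypothetical and covers only the segments of the Abaza LEAF, SKIN, NOSE and TOOTH forms in the table earlier in this section, with each output read off that table. Longest match first matters because ʧʷ has to be recoded as one unit (Cw~) rather than as ʧ followed by ʷ.

```python
# HYPOTHETICAL rule table: IPA chunk -> ASJPcode, read off the Abaza table.
RULES = {
    "\u02a7\u02b7": "Cw~",  # ʧʷ  labialized affricate
    "\u02a6\u02bc": 'c"',   # ʦʼ  ejective (glottalized) affricate
    "\u0263\u02b2": "xy~",  # ɣʲ  palatalized fricative
    "z\u02b2": "zy~",       # zʲ
    "\u02a6": "c",          # ʦ
    "\u0268": "3",          # ɨ
    "a": "a", "b": "b", "n": "n", "p": "p", "z": "z",
}
MAX_LEN = max(len(key) for key in RULES)


def to_asjp(ipa: str) -> str:
    """Recode an IPA string by greedy longest-match lookup in RULES."""
    out, i = [], 0
    while i < len(ipa):
        # Try the longest possible chunk first, then shorter ones.
        for size in range(min(MAX_LEN, len(ipa) - i), 0, -1):
            chunk = ipa[i:i + size]
            if chunk in RULES:
                out.append(RULES[chunk])
                i += size
                break
        else:
            raise ValueError(f"no rule for {ipa[i]!r} at position {i}")
    return "".join(out)


print(to_asjp("\u02a7\u02b7az\u02b2"))  # SKIN ʧʷazʲ -> Cw~azy~
```

A real grammar would also need rules for stacked modifiers such as ʧʼʲʷ, whose ASJPcode rendering is not consistent across the slides, so those are left out here.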
2. Comparing languages
Comparing words
LG      I      YOU    WE
ABAZA   sErE   bErE   Sw~ErE
ABKHAZ  s3     w3     Sw~3
AGUL    zun    wun    cw~un
ABAZA vs ABKHAZ: LDi=3  LDj=4  LDk=3  …  LDmean=3.73
ABAZA vs AGUL: LDi=4  LDj=4  LDk=4  …  LDmean=4.37
Comparing words
LG      I      YOU    WE
ABAZA   sErE   w3rE   Sw~ErE
ABKHAZ  s3     w3     Sw~3
AGUL    zun    wun    cw~un
Mean LD: ABAZA-ABKHAZ 3.73; ABAZA-AGUL 4.37
Comparing words
Levenshtein Distance
a. Between 2 words:
   the minimum number of edit operations (substitutions, insertions, deletions) needed to turn one form into the other
b. Between 2 languages:
   e.g. the mean LD over the set of shared meanings (≤ 40)
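A minimal Python sketch of both levels, using the standard dynamic-programming formulation of Levenshtein distance. It assumes every ASJPcode character, including modifiers such as `~` and `"`, counts as one symbol; under that assumption the Abaza forms (with YOU = bErE) reproduce the slides' LDi=3, LDj=4, LDk=3 against Abkhaz and 4, 4, 4 against Agul. `language_ld` is a hypothetical helper for the mean over shared meanings; on these three words alone it gives 10/3 ≈ 3.33 and 4.0, not the 3.73 and 4.37 that the slides report for the full lists.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free when equal
            ))
        prev = curr
    return prev[-1]


def language_ld(list1: dict[str, str], list2: dict[str, str]) -> float:
    """Mean LD over the meanings attested in both word lists."""
    shared = [m for m in list1 if m in list2]
    if not shared:
        raise ValueError("the lists share no meanings")
    return sum(levenshtein(list1[m], list2[m]) for m in shared) / len(shared)


abaza = {"I": "sErE", "YOU": "bErE", "WE": "Sw~ErE"}
abkhaz = {"I": "s3", "YOU": "w3", "WE": "Sw~3"}
agul = {"I": "zun", "YOU": "wun", "WE": "cw~un"}
print([levenshtein(abaza[m], abkhaz[m]) for m in abaza])  # [3, 4, 3]
print([levenshtein(abaza[m], agul[m]) for m in abaza])    # [4, 4, 4]
print(language_ld(abaza, abkhaz), language_ld(abaza, agul))
```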
Comparing words
Levenshtein Distance
Two problems with simple LD:
1. The value depends on the length of the longer word.
   Normalize: $\mathrm{LDN} = \mathrm{LD} / L_{\max}$
2. Language pairs differ in phonological overlap, which produces chance similarities.
   Eliminate this ‘noise’: $\mathrm{LDND} = \mathrm{LDN} / \mathrm{LDN}_{\mathrm{different}}$, where $\mathrm{LDN}_{\mathrm{different}}$ is the mean LDN between words with different meanings
Both are reported multiplied by 100: $100 \times \mathrm{LDN}$ and $100 \times \mathrm{LDND}$.
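A sketch of both normalizations for one language pair, assuming that LDN_different is the mean LDN over all pairs of words with different meanings drawn from the two lists. `ldn_ldnd` is a hypothetical helper, and the three Abaza/Abkhaz words are a toy input, giving 100 × LDN = 75.0 and 100 × LDND ≈ 77.1. The more sounds two languages share by chance, the lower LDN_different becomes, and the more the division raises LDND to compensate.

```python
from statistics import mean


def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def ldn(a: str, b: str) -> float:
    """LD normalized by the length of the longer form (forms assumed non-empty)."""
    return levenshtein(a, b) / max(len(a), len(b))


def ldn_ldnd(list1: dict[str, str], list2: dict[str, str]) -> tuple[float, float]:
    """Return (100 * LDN, 100 * LDND) for two word lists keyed by meaning."""
    shared = [m for m in list1 if m in list2]
    if len(shared) < 2:
        raise ValueError("need at least two shared meanings")
    # Same-meaning pairs carry the signal.
    same = mean(ldn(list1[m], list2[m]) for m in shared)
    # Different-meaning pairs estimate chance similarity for this language pair.
    different = mean(ldn(list1[m1], list2[m2])
                     for m1 in shared for m2 in shared if m1 != m2)
    return 100 * same, 100 * same / different


abaza = {"I": "sErE", "YOU": "bErE", "WE": "Sw~ErE"}
abkhaz = {"I": "s3", "YOU": "w3", "WE": "Sw~3"}
print(ldn_ldnd(abaza, abkhaz))
```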
Comparing languages
Levenshtein distance for a language pair:
- Mean of the LDNDs of all words in common
- Synonyms (12%): take the minimum pair, or take the mean (experimental option)
Comparing languages
AVAR (AVA: NAKH-DAGHESTANIAN > AVAR-ANDIC-TSEZIC) / AGUL (AGL: NAKH-DAGHESTANIAN > LEZGIC)
I    : dun=zun        * LDND=36.6
YOU  : mun=wun        * LDND=36.6
HORN : tLar=k"arC     * LDND=66.0
FIRE : c"a=c"a        * LDND= 0.0
FULL : c"ura=ac"uf    * LDND=66.0   ALT: AGL= ac"ar
NEW  : c"iya=c"EyEr   * LDND=55.0   ALT: AGL= c"ayif
COMMON (LDND < 70) = AGL - AVA 6 (=15.8% of 38)
LD = 4.01 / LDN = 81.76 / LDND = 89.87
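The per-word figures in this example can be checked with a short Python sketch: it recomputes LDN for each Avar/Agul pair and, where Agul has an alternative form, applies either the minimum-pair or the mean option for synonyms. The helper names are hypothetical. For FULL and NEW both Agul alternatives happen to give the same LDN (0.6 and 0.5 respectively), so the two synonym options agree here.

```python
from statistics import mean


def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def ldn(a: str, b: str) -> float:
    """LD divided by the length of the longer form."""
    return levenshtein(a, b) / max(len(a), len(b))


def pair_ldn(form: str, alternatives: list[str], how: str = "min") -> float:
    """LDN for one meaning when the other list offers synonyms."""
    values = [ldn(form, alt) for alt in alternatives]
    if how == "min":
        return min(values)   # take the minimum pair
    if how == "mean":
        return mean(values)  # take the mean
    raise ValueError("how must be 'min' or 'mean'")


# Avar form, then the Agul form(s) from the slide (ALT forms appended).
AVAR_AGUL = {
    "I": ("dun", ["zun"]),
    "YOU": ("mun", ["wun"]),
    "HORN": ("tLar", ['k"arC']),
    "FIRE": ('c"a', ['c"a']),
    "FULL": ('c"ura', ['ac"uf', 'ac"ar']),
    "NEW": ('c"iya', ['c"EyEr', 'c"ayif']),
}

for meaning, (avar, agul) in AVAR_AGUL.items():
    print(f"{meaning:<5} 100*LDN = {100 * pair_ldn(avar, agul):5.1f}")
```

The ratios between these LDN values match the ratios between the slide's LDND values (1/3 : 0.6 against 36.6 : 66.0, and 0.5 : 0.6 against 55.0 : 66.0), consistent with every word being divided by the same language-pair constant LDN_different.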
Comparing languages
LANG1   LANG2                FAM1           FAM2           LDND
FRENCH  ARPITAN              INDO-EUROPEAN  INDO-EUROPEAN  55.63
FRENCH  GALICIAN             INDO-EUROPEAN  INDO-EUROPEAN  74.49
FRENCH  ARAGONESE            INDO-EUROPEAN  INDO-EUROPEAN  76.16
FRENCH  FRIULIAN             INDO-EUROPEAN  INDO-EUROPEAN  74.64
FRENCH  ROMANSH_SURSILVAN    INDO-EUROPEAN  INDO-EUROPEAN  77.80
FRENCH  ROMANIAN             INDO-EUROPEAN  INDO-EUROPEAN  74.37
FRENCH  LATIN                INDO-EUROPEAN  INDO-EUROPEAN  80.07
FRENCH  CATALAN              INDO-EUROPEAN  INDO-EUROPEAN  71.69
FRENCH  ITALIAN              INDO-EUROPEAN  INDO-EUROPEAN  75.91
FRENCH  PORTUGUESE           INDO-EUROPEAN  INDO-EUROPEAN  74.38
FRENCH  SPANISH              INDO-EUROPEAN  INDO-EUROPEAN  80.91
FRENCH  DANISH               INDO-EUROPEAN  INDO-EUROPEAN  93.11
FRENCH  BERNESE_GERMAN       INDO-EUROPEAN  INDO-EUROPEAN  93.18
FRENCH  CIMBRIAN             INDO-EUROPEAN  INDO-EUROPEAN  94.43
FRENCH  BRABANTIC            INDO-EUROPEAN  INDO-EUROPEAN  95.18
FRENCH  NORTH_FRISIAN_AMRUM  INDO-EUROPEAN  INDO-EUROPEAN  95.30
FRENCH  JAMTLANDIC           INDO-EUROPEAN  INDO-EUROPEAN  94.58
FRENCH  LIMBURGISH           INDO-EUROPEAN  INDO-EUROPEAN  94.78
FRENCH  OLD_HIGH_GERMAN      INDO-EUROPEAN  INDO-EUROPEAN  92.70
FRENCH  PLAUTDIETSCH         INDO-EUROPEAN  INDO-EUROPEAN  95.35
FRENCH  NORTHERN_LOW_SAXON   INDO-EUROPEAN  INDO-EUROPEAN  90.87
FRENCH  STELLINGWERFS        INDO-EUROPEAN  INDO-EUROPEAN  92.85
FRENCH  FRANS_VLAAMS         INDO-EUROPEAN  INDO-EUROPEAN  94.08
3. Some results: genetic and areal proximity
Distance Matrix (0.5 * N * (N-1))
       FRE     DUT     GAL     PRT     ENG     …
FRE
DUT    90.93
GAL    71.62   90.00
PRT    74.38   94.61   51.87
ENG    91.17   63.19   91.30   95.18
…
< Excel file >
Tools for Trees
- Prepare an input file for your preferred phylogenetic software with an editor such as TextPad (www.textpad.com)
- Run the data through phylogenetic software such as SplitsTree (www.splitstree.org)
- Choose the most appropriate algorithm (Neighbour Joining for distance data)
- Prepare the tree for presentation using a tool such as the Tree Explorer of MEGA
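As one way of preparing the input file without hand editing, a Python sketch that writes the five-language excerpt of the distance matrix above as a NEXUS DISTANCES block. It assumes the target program reads standard NEXUS distance blocks (SplitsTree's native input format is NEXUS, but check your program's documentation for the FORMAT options it accepts); `to_nexus` and `n_pairs` are hypothetical helpers. `n_pairs` gives the 0.5 × N × (N - 1) count from the matrix slide, about 2.2 million pairs for 2100 word lists.

```python
LABELS = ["FRE", "DUT", "GAL", "PRT", "ENG"]
# Lower triangle from the distance-matrix slide: row i holds distances to LABELS[:i].
LOWER = [
    [],
    [90.93],
    [71.62, 90.00],
    [74.38, 94.61, 51.87],
    [91.17, 63.19, 91.30, 95.18],
]


def n_pairs(n: int) -> int:
    """Number of language pairs in an N x N distance matrix: 0.5 * N * (N - 1)."""
    return n * (n - 1) // 2


def to_nexus(labels: list[str], lower: list[list[float]]) -> str:
    """Serialize a lower-triangular distance matrix as a NEXUS DISTANCES block."""
    if len(labels) != len(lower):
        raise ValueError("one matrix row per label is required")
    for i, row in enumerate(lower):
        if len(row) != i:
            raise ValueError(f"row {i} should have {i} values, got {len(row)}")
    lines = [
        "#NEXUS",
        "BEGIN TAXA;",
        f"  DIMENSIONS NTAX={len(labels)};",
        "  TAXLABELS " + " ".join(labels) + ";",
        "END;",
        "BEGIN DISTANCES;",
        "  FORMAT TRIANGLE=LOWER DIAGONAL LABELS;",
        "  MATRIX",
    ]
    for label, row in zip(labels, lower):
        # Each row ends with the zero diagonal entry.
        cells = " ".join(f"{d:.2f}" for d in row + [0.0])
        lines.append(f"    {label} {cells}")
    lines += ["  ;", "END;"]
    return "\n".join(lines) + "\n"


print(to_nexus(LABELS, LOWER))
```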
[Figure: Neighbor Joining tree of the Salishan languages (n=30)]
[Figure: UPGMA and Neighbor Joining trees compared]
Neighbor Joining
- designed specifically for phylogenetic trees
- takes distances as its point of departure
- does NOT assume an equal rate of change (unlike UPGMA)
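To make the algorithm concrete, a compact Python sketch of textbook Neighbor Joining (Saitou and Nei 1987), run on the five-language excerpt of the distance matrix above. It is an illustration, not the implementation used in SplitsTree or MEGA, and `neighbor_joining` is a hypothetical name. The two branch lengths created at each join are allowed to differ, which is what lets NJ do without an equal rate of change.

```python
import itertools

LABELS = ["FRE", "DUT", "GAL", "PRT", "ENG"]
# Pairwise distances copied from the distance-matrix slide.
DIST = {
    ("FRE", "DUT"): 90.93, ("FRE", "GAL"): 71.62, ("FRE", "PRT"): 74.38,
    ("FRE", "ENG"): 91.17, ("DUT", "GAL"): 90.00, ("DUT", "PRT"): 94.61,
    ("DUT", "ENG"): 63.19, ("GAL", "PRT"): 51.87, ("GAL", "ENG"): 91.30,
    ("PRT", "ENG"): 95.18,
}


def neighbor_joining(labels, dist):
    """Textbook Neighbor Joining; returns (Newick string, list of joined pairs)."""
    if len(labels) < 3:
        raise ValueError("need at least three taxa")
    # Symmetric lookup D[a][b]; cluster names double as Newick subtrees.
    D = {a: {} for a in labels}
    for (a, b), v in dist.items():
        D[a][b] = D[b][a] = v
    nodes, joins = list(labels), []
    while len(nodes) > 3:
        n = len(nodes)
        r = {i: sum(D[i][k] for k in nodes if k != i) for i in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) * d(i, j) - r_i - r_j.
        i, j = min(itertools.combinations(nodes, 2),
                   key=lambda p: (n - 2) * D[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # The two new branches get separate lengths (no equal rates assumed).
        li = 0.5 * D[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = D[i][j] - li
        u = f"({i}:{li:.2f},{j}:{lj:.2f})"
        D[u] = {}
        for k in nodes:
            if k not in (i, j):
                D[u][k] = D[k][u] = 0.5 * (D[i][k] + D[j][k] - D[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
        joins.append((i, j))
    # Three clusters left: connect them to one central node.
    a, b, c = nodes
    la = 0.5 * (D[a][b] + D[a][c] - D[b][c])
    return f"({a}:{la:.2f},{b}:{D[a][b] - la:.2f},{c}:{D[a][c] - la:.2f});", joins


tree, joins = neighbor_joining(LABELS, DIST)
print(tree)
```

The first join is DUT with ENG, by a wide margin. With four clusters left, NJ's criterion always ties between the two complementary pairs, so the sketch may next join FRE with (DUT,ENG) or GAL with PRT; both choices give the same unrooted tree ((DUT,ENG),FRE,(GAL,PRT)).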
[Figure: tree of the Mayan languages (n=38)]
Calibration of Method
Calibration: best options, parameters, factors:
A. for pure classification:
- existing classifications (Ethnologue; WALS; mainly the well-documented areas)
- expert knowledge of specific areas
diversion ±12% niche!
Linguistically crucial events
Date         Historical event                      Linguistic event
c. 250       Goths conquer Dacia                   split of E-W Romance
4th c.       Irish invade Scotland                 split of Irish-Scottish Gaelic
5th c.       Germanic kingdoms in W Roman Empire   breakup of W Romance
5th c.       Germanic peoples invade Britain       split of English-Frisian
5th-6th c.   Britons flee to Brittany              split of Welsh-Breton
400-600      Hieroglyphic evidence                 Ch'olan begins to split
768-814      Name of Charlemagne attested          Proto-Slavic
Calibration of Method
Calibration: best options, parameters, factors:
B. for dating:
- linguistically crucial historic events
- Standard formula (Swadesh):
TimeDepth = log(Similarity) / (2 log Retention)
- With LDND as the distance measure (Similarity = 1 - LDND):
TimeDepth = log(1 - LDND) / (2 log Retention)
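The factor 2 in the denominator follows from a simple model: each of the two daughter languages independently keeps a fraction r of its basic vocabulary per millennium, so after t millennia the expected shared fraction C is r to the power 2t. Taking C = 1 - LDND is an assumption here, chosen because it reproduces the retention rates in the calibration table that follows.

```latex
C = r^{t}\cdot r^{t} = r^{2t}
\quad\Longrightarrow\quad
\log C = 2t\,\log r
\quad\Longrightarrow\quad
t = \frac{\log C}{2\,\log r},
\qquad C = 1 - \mathrm{LDND}.
```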
Linguistically crucial events
Time*   Linguistic event                  LDND     Retention
1.75    split of E-W Romance              0.6753   0.73
1.65    split of Irish-Scottish Gaelic    0.6687   0.72
1.55    breakup of W Romance              0.6411   0.72
1.55    split of English-Frisian          0.6574   0.71
1.50    split of Welsh-Breton             0.5705   0.75
1.40    Ch'olan begins to split           0.5369   0.76
1.21    Proto-Slavic                      0.5877   0.69
MEAN                                               0.73
* Time in thousands of years before present
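The Retention column can be recomputed by solving the dating formula for r, given each event's known time depth. The minimal sketch below assumes Similarity = 1 - LDND and time in millennia; under those assumptions every rounded value matches the table. The exact mean comes out at about 0.725; the slide's 0.73 corresponds to averaging the already-rounded column.

```python
import math

# (time depth in millennia, LDND as a fraction) for the calibration events.
EVENTS = {
    "split of E-W Romance": (1.75, 0.6753),
    "split of Irish-Scottish Gaelic": (1.65, 0.6687),
    "breakup of W Romance": (1.55, 0.6411),
    "split of English-Frisian": (1.55, 0.6574),
    "split of Welsh-Breton": (1.50, 0.5705),
    "Ch'olan begins to split": (1.40, 0.5369),
    "Proto-Slavic": (1.21, 0.5877),
}


def retention_rate(time_depth: float, ldnd: float) -> float:
    """Solve t = log(1 - LDND) / (2 log r) for r (assumes Similarity = 1 - LDND)."""
    similarity = 1.0 - ldnd
    return math.exp(math.log(similarity) / (2.0 * time_depth))


rates = {event: retention_rate(t, d) for event, (t, d) in EVENTS.items()}
for event, r in rates.items():
    print(f"{event:32s} r = {r:.2f}")
mean_r = sum(rates.values()) / len(rates)
print(f"mean retention: {mean_r:.3f}")
```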
Calibration of Method
Calibration: best options, parameters, factors:
B. for dating:
- linguistically crucial historic events
- Standard formula, with the calibrated mean retention rate:
TimeDepth = log(1 - LDND) / (2 log 0.73) < 75%
Deeper!
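With the retention rate fixed at the calibrated 0.73, the formula turns any LDND value into an estimated time depth. The sketch below assumes LDND is a fraction between 0 and 1 and that Similarity = 1 - LDND; the function name is illustrative. Checked against the E-W Romance calibration point, it gives roughly 1.79 millennia against the historical 1.75.

```python
import math

RETENTION = 0.73  # calibrated mean retention per millennium (from the slide)


def time_depth(ldnd: float, retention: float = RETENTION) -> float:
    """Estimated time depth in millennia: log(1 - LDND) / (2 log r).

    Assumes LDND is a fraction in [0, 1) and Similarity = 1 - LDND.
    """
    if not 0.0 <= ldnd < 1.0:
        raise ValueError("LDND must be in [0, 1)")
    return math.log(1.0 - ldnd) / (2.0 * math.log(retention))


# Check against a calibration point: E-W Romance (LDND 0.6753, dated 1.75).
estimate = time_depth(0.6753)
print(f"estimated time depth: {estimate:.2f} millennia")
```

As LDND approaches 1 the logarithm diverges, so small differences in LDND translate into large differences in estimated depth; deep dates are correspondingly fragile.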
Glottochronology only?
Calibration of method:
Glottochronology: all based on lexical distance
Add other linguistic domains …
WALS Typological database
Best result:
75% lexical distance (40 items) + 25% typological distance (40 phonological/morphological/syntactic features)
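One way to read the "best result" line is as a weighted average of two normalised distances. The sketch below assumes both distances lie on a 0 to 1 scale and computes typological distance as the share of differing values among features attested for both languages. That measure, the function names, the feature names and the values are all illustrative assumptions, not ASJP's published procedure.

```python
from typing import Dict, Optional


def typological_distance(a: Dict[str, Optional[str]],
                         b: Dict[str, Optional[str]]) -> float:
    """Share of differing values among features attested in both languages."""
    shared = [f for f in a if f in b and a[f] is not None and b[f] is not None]
    if not shared:
        raise ValueError("no features attested in both languages")
    differing = sum(1 for f in shared if a[f] != b[f])
    return differing / len(shared)


def combined_distance(lexical: float, typological: float,
                      w_lex: float = 0.75) -> float:
    """Weighted mix: 75% lexical + 25% typological by default."""
    return w_lex * lexical + (1.0 - w_lex) * typological


# Hypothetical feature values; None marks a feature with no data.
lang_x = {"word_order": "SOV", "case_affixes": "yes", "tone": "no", "articles": None}
lang_y = {"word_order": "SVO", "case_affixes": "yes", "tone": "no", "articles": "def"}
typ = typological_distance(lang_x, lang_y)  # 1 of 3 shared features differs
print(combined_distance(lexical=0.60, typological=typ))
```

Skipping features missing in either language keeps gaps in the typological data from being counted as differences.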
4. On Inheritance vs Borrowing
Inherited or borrowed?
AVAR (AVA) / AGUL (AGL)
I : dun=zun * LDND=36.6
YOU : mun=wun * LDND=36.6
HORN : tLar=k"arC * LDND=66.0
FIRE : c"a=c"a * LDND= 0.0
FULL : c"ura=ac"uf * LDND=66.0
NEW : c"iya=c"EyEr * LDND=55.0
6 items < 70.0: genetically related!!
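The per-item figures start from the Levenshtein distance between two transcriptions. The sketch below computes LDN, the edit distance divided by the length of the longer word and scaled to 0-100, treating every character of the transcription (including the " modifier) as one symbol; that tokenisation is a simplifying assumption. The slide's values run consistently about 10% higher than these raw LDN figures (36.6 against 33.3 for I, 66.0 against 60.0 for HORN). This is consistent with LDND as described in the ASJP literature, which divides LDN further by the average distance between words of different meaning for the language pair. That second step needs the full word lists and is not reproduced here.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def ldn(a: str, b: str) -> float:
    """Levenshtein distance normalised by the longer word, scaled to 0-100."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else 100.0 * levenshtein(a, b) / longest


# Avar / Agul pairs from the slide; each character counts as one symbol.
PAIRS = {
    "I": ("dun", "zun"), "YOU": ("mun", "wun"), "HORN": ("tLar", 'k"arC'),
    "FIRE": ('c"a', 'c"a'), "FULL": ('c"ura', 'ac"uf'), "NEW": ('c"iya', 'c"EyEr'),
}
for meaning, (avar, agul) in PAIRS.items():
    print(f"{meaning:5s} {avar:6s} {agul:7s} LDN = {ldn(avar, agul):5.1f}")
```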
Inherited or borrowed?
SPANISH (SPA) / CHAMORRO (CHA)
ONE : uno=unu * LDND=36.9
TWO : dos=dos * LDND= 0.0
PERSON : persona=petsona * LDND=55.3
STAR : estreya=estrecas * LDND=61.2
NIGHT : noCe=noces * LDND=68.2
NEW : nuevo=nueba * LDND=44.2
6 items < 70.0: RELATED ??? NO!!!
INDO-EUROPEAN < > AUSTRONESIAN
CHANCE? ~ 5% (i.e. 1-2 items)
BORROWING through LANGUAGE CONTACT
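The "chance" step can be made concrete with a binomial model. Assuming a 40-item list (the "40 lex" figure from the calibration slide) and the slide's ~5% chance of an accidental match per item, treated as independent, about 2 chance matches are expected, and six or more occur with a probability of only about 1 to 2 percent. Chance is therefore an unlikely explanation. The test only rules out chance, though: it cannot tell inherited from borrowed similarity, which is why the Indo-European/Austronesian mismatch points to contact.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


# Assumptions: a 40-item list, each item matching by chance with p = 0.05.
n_items, p_chance = 40, 0.05
print(f"expected chance matches: {n_items * p_chance:.1f}")
print(f"P(6 or more chance matches): {prob_at_least(6, n_items, p_chance):.3f}")
```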
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
ONE : uno=unu * LDND=36.9
SPA <> CHA: fam/gen= 0.24/0.82 > 0.03/0.00
phon pattern fit= 12.00 > 0.67
…
Borrowed!
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
ONE : uno=unu * LDND=36.9
SPA > CHA: fam/gen= 0.24/0.82 > 0.03/0.00
phon pattern fit= 12.00 > 0.67
…
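The direction of borrowing appears to be read off the contrast between the two sides of each fam/gen (f/g) figure. A form that is widespread in Romance and Indo-European, but virtually absent from Chamorro's family and genus, points to Spanish as the donor. The sketch below encodes that reading as a hypothetical rule. It rests on several assumptions: that fam/gen are the shares of related languages with a similar form, that a summed score with a 0.2 margin is a suitable criterion, and that the function and type names are illustrative. It applies this rule to the figures reported on the slides for the six items. It is not ASJP's documented procedure, and it ignores the phonological pattern-fit figure.

```python
from typing import NamedTuple


class Side(NamedTuple):
    fam: float  # "fam" figure (read here as family-wide share)
    gen: float  # "gen" figure (read here as genus-wide share)


# Figures from the slides: (Spanish side, Chamorro side).
ITEMS = {
    "ONE": (Side(0.24, 0.82), Side(0.03, 0.00)),
    "TWO": (Side(0.62, 1.00), Side(0.12, 0.00)),
    "PERSON": (Side(0.20, 0.64), Side(0.01, 0.00)),
    "STAR": (Side(0.17, 0.82), Side(0.00, 0.00)),
    "NIGHT": (Side(0.23, 0.55), Side(0.04, 0.00)),
    "NEW": (Side(0.50, 0.64), Side(0.04, 0.00)),
}


def likely_direction(spa: Side, cha: Side, margin: float = 0.2) -> str:
    """Hypothetical rule: the side whose form is clearly more widespread
    among its relatives is taken as the donor; otherwise undecided."""
    score_spa = spa.fam + spa.gen
    score_cha = cha.fam + cha.gen
    if score_spa - score_cha >= margin:
        return "SPA > CHA"
    if score_cha - score_spa >= margin:
        return "CHA > SPA"
    return "undecided"


for meaning, (spa, cha) in ITEMS.items():
    print(f"{meaning:7s} {likely_direction(spa, cha)}")
```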
Borrowing
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
TWO : dos=dos * LDND= 0.0
SPA > CHA f/g= 0.62/1.00 > 0.12/0.00
swF= 100.00 > 0.22
Borrowing
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
PERSON : persona=petsona * LDND=55.3
SPA > CHA f/g= 0.20/0.64 > 0.01/0.00
swF= 32.40 > 0.13
ALT: CHA= taotao (0.41/0.00)
Borrowing
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR : estreya=estrecas * LDND=61.2
SPA > CHA f/g= 0.17/0.82 > 0.00/0.00
swF= 100.00 > 4.44
ALT: CHA= puti7on (0.03/0.00)
Borrowing
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
NIGHT : noCe=noces * LDND=68.2
SPA > CHA f/g= 0.23/0.55 > 0.04/0.00
swF= 100.00 > 0.10
ALT: CHA= pw~eNi (0.23/0.00)
Borrowing
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
NEW : nuevo=nueba * LDND=44.2
SPA > CHA f/g= 0.50/0.64 > 0.04/0.00
swF= 4.27 > 0.03
5. Conclusions
Conclusions
- Method for automatic reconstruction of language relationships
- Assess, discuss and correct existing classifications
- Test hypotheses about genetic distances in time
- Locate potential borrowings
- CORE: incremental lexical database (> 35%)
One day: Online
Cooperation!!
Holman et al. (forthcoming 2008). Explorations in automated language classification. Folia Linguistica.
Brown et al. (forthcoming 2008). Automated classification of the world's languages: A description of the method and preliminary results. Sprachtypologie und Universalienforschung.
+ Several working papers
email.eva.mpg.de./~wichmann/ASJPHomePage