Advances in Automated Language Classification, ASJP Consortium, Dik Bakker, Lancaster

Post on 12-Jan-2016


Advances in Automated Language Classification

ASJP Consortium

Dik Bakker, Lancaster

ASJP: Automatic Reconstruction


Overview

Project: ASJP members:

Sören Wichmann (BRD; Netherlands)
Viveka Velupillai (BRD)
André Müller (BRD)
Robert Mailhammer (BRD)
Hagen Jung (BRD)
Eric Holman (US)
Anthony Grant (UK)
Dmitry Egorov (Russia)
Pamela Brown (US)
Cecil Brown (US)
Dik Bakker (UK; Netherlands)


Overview

Project: ASJP (Automated Similarity Judgment Program)

Overall goal: Automatic reconstruction of language relationships

Basis: Distance matrix between individual languages on the basis of linguistic features

Method: Lexicostatistics: mass comparison of lexical items


Overview

MAIN GOAL: Reconstruction of Language Relationships

Derived goals:

- Critical assessment and refinement of existing classifications

- Classify newly described and unclassified languages

- Estimate time depths between languages / genera / families

- Search for (ir)regularities in phylogenies

- Test hypotheses (e.g. Atkinson et al 2008; ‘elbow’ phenomenon)

- Experimentally find the best/optimal dating method

- Detect borrowings


Overview

1. The basic list of lexical items

2. Comparing languages

3. Some results: genetic and areal proximity

4. On Inheritance vs Borrowing

5. Conclusions


1. The basic list of lexical items


Lexical items

Word list: Swadesh 100 basic meanings

- Word coined in most languages

- Collected in fieldwork lexicon / grammar

- Inherited rather than borrowed

- Culturally independent

- Stable over time

- Few synonyms


 1. I        21. dog      41. nose     61. die     81. smoke
 2. you      22. louse    42. mouth    62. kill    82. fire
 3. we       23. tree     43. tooth    63. swim    83. ash
 4. this     24. seed     44. tongue   64. fly     84. burn
 5. that     25. leaf     45. claw     65. walk    85. path
 6. who      26. root     46. foot     66. come    86. mountain
 7. what     27. bark     47. knee     67. lie     87. red
 8. not      28. skin     48. hand     68. sit     88. green
 9. all      29. flesh    49. belly    69. stand   89. yellow
10. many     30. blood    50. neck     70. give    90. white
11. one      31. bone     51. breasts  71. say     91. black
12. two      32. grease   52. heart    72. sun     92. night
13. big      33. egg      53. liver    73. moon    93. hot
14. long     34. horn     54. drink    74. star    94. cold
15. small    35. tail     55. eat      75. water   95. full
16. woman    36. feather  56. bite     76. rain    96. new
17. man      37. hair     57. see      77. stone   97. good
18. person   38. head     58. hear     78. sand    98. round
19. fish     39. ear      59. know     79. earth   99. dry
20. bird     40. eye      60. sleep    80. cloud  100. name


Lexical items: further reduction

Early analyses have shown:

- Optimal 40/100 item subset gives same results

Less work

Less missing data

Faster processing; combinatorial explosion:

40 : 100 ~ 3 * 10^7 : 2 * 10^10


Lexical items: stability

Most stable items:

Iteratively throw out the most unstable item in terms of variation within genera (3500-4000 years; Dryer 2001; 2005)

E.g. Germanic, Romance, Slavic, …

Formula: S = (E - U) / (100 - U)

(E and U: weighted average % of matches for Eq and Uneq, respectively)
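The stability formula can be sketched in a few lines of Python. This is only an illustration: it assumes E and U are already available as percentages on a 0-100 scale, and the function name and the sample values are hypothetical, not ASJP data.

```python
def stability(e_pct, u_pct):
    """Return S = (E - U) / (100 - U) for two percentages E and U."""
    if u_pct >= 100:
        raise ValueError("U must be below 100, otherwise S is undefined")
    return (e_pct - u_pct) / (100.0 - u_pct)


# Illustrative values only (not taken from the ASJP database)
example_s = stability(60.0, 10.0)
print(round(example_s, 3))
```

With this definition S is 1 when E = 100 and 0 when E equals U.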

[Figure: Stability (++ to --); Ethnologue (Goodman-Kruskal), WALS (Pearson)]

[Figure: the 100-item list, with the "40 Most Stable" items marked]

[Figure: the 100-item list, with "Homophones" marked]


Lexical items: transcription

First phase of project (2007):

Problems with full IPA representation of words:

- data entry via keyboard

- simple programming language (Fortran; Pascal)

Recoding to simplified ASJPcode (ASCII only)


Lexical items: transcription

ASJPcode:

7 Vowels

34 Consonants

Operators for: Nasalization, Labialization, Palatalization, Aspiration, Glottalization

(some) complex syllables simplified (VXC → VC)


Abaza (Caucasian):

Meaning   IPA            ASJPcode
PERSON    ʕʷɨʧʼʲʷʕʷɨs    Xw~3Cw"yXw~3s
LEAF      bɣʲɨ           bxy~3
SKIN      ʧʷazʲ          Cw~azy~
HORN      ʧʼʷɨʕʷa        Cw"~3Xw~a
NOSE      pɨnʦʼa         p3nc"a
TOOTH     pɨʦ            p3c


Lexical items

Collected to date:

- Over 2100 languages, dialects and proto-languages

- Mean number of items/language: 36.2 (/40)


Lexical items

Distribution:

Americas: 27%

Eurasia: 23%

Australia/PNG: 18%

Austronesia: 15%

Africa: 14%

Creoles: 2%

Artificial: 1%


Languages currently sampled


Lexical items: transcription

Second phase of project (2008):

Problems with full IPA representation solved:

1. automatic conversion IPA to integer (Python)

2. (semi-)automatic recoding to ASJPcode: transduction on the basis of a formal grammar


Lexical items: transcription

Abaza (Caucasian):

Meaning: PERSON

IPA: ʕʷɨʧʼʲʷʕʷɨs

Decimal: 661 695 616 679 700 690 695 661 695 616 115

ASJPcode: 88 119 126 51 67 34 121 119 126 88 119 126 51 115

( = Xw~3Cw"y~Xw~3s)
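The first step, converting an IPA string to integers, can be sketched as follows. The project's own script is not shown in the slides; this sketch assumes that "integer" means the decimal Unicode code point of each character, which reproduces the Decimal line above. The function names are hypothetical.

```python
def to_decimal(word):
    """Map each character of a Unicode string to its decimal code point."""
    return [ord(ch) for ch in word]


def from_decimal(codes):
    """Inverse mapping: rebuild the string from decimal code points."""
    return "".join(chr(c) for c in codes)


# Abaza PERSON, written with escapes so that each code point is explicit
person_ipa = "\u0295\u02b7\u0268\u02a7\u02bc\u02b2\u02b7\u0295\u02b7\u0268s"
person_codes = to_decimal(person_ipa)
print(" ".join(str(c) for c in person_codes))
```

The same `from_decimal` helper turns a list of ASCII codes, such as the ASJPcode line above, back into a readable string.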


Why not run on full IPA??


- correlations IPA ~ ASJP > 0.9

- but: ASJP better fit with classifications; IPA too specific


Lexical items: transcription

IPA: ʕʷɨʧʼʲʷʕʷɨs

Decimal: 661 695 616 679 700 690 695 661 695 616 115

ASJP++code: ( = any unicode string )

A n661, n695, n616, ……P Q A B C…Z P Q Z

formal grammar

Optimal level of abstraction for historical phonological reconstruction?


2. Comparing languages


Comparing words

LG       I      YOU    WE
ABAZA    sErE   bErE   Sw~ErE
ABKHAZ   s3     w3     Sw~3
AGUL     zun    wun    cw~un

ABAZA / ABKHAZ: LDi=3  LDj=4  LDk=3  LDmean=3.73

ABAZA / AGUL:   LDi=4  LDj=4  LDk=4  LDmean=4.37


Comparing words

Levenshtein Distance

a. between 2 words:

Number of transformations to get from the shorter form to the longer one (changes, additions)

b. Between 2 languages:

E.g. mean LD for overlapping set (<= 40)
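Both definitions can be sketched in Python. The implementation below is the standard dynamic-programming Levenshtein distance (insertions, deletions and substitutions, each at cost 1) and is not the ASJP program itself. It treats every ASCII character of an ASJPcode string, modifiers such as `~` included, as one symbol, which is an assumption. The word lists are the three-item excerpts shown earlier.

```python
def levenshtein(a, b):
    """Minimum number of single-symbol insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def mean_ld(list_a, list_b):
    """Mean LD over the meanings attested in both word lists."""
    shared = [m for m in list_a if m in list_b]
    return sum(levenshtein(list_a[m], list_b[m]) for m in shared) / len(shared)


abaza = {"I": "sErE", "YOU": "bErE", "WE": "Sw~ErE"}
abkhaz = {"I": "s3", "YOU": "w3", "WE": "Sw~3"}
agul = {"I": "zun", "YOU": "wun", "WE": "cw~un"}

for meaning in abaza:
    print(meaning, levenshtein(abaza[meaning], abkhaz[meaning]))
print(mean_ld(abaza, abkhaz), mean_ld(abaza, agul))
```

The per-word values agree with the LDi, LDj and LDk figures on the earlier slides. The means printed here cover only these three items, so they differ from the LDmean values quoted there, which presumably cover the full shared list.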


Comparing words

Levenshtein Distance

Two problems:

1. Value depends on length of longest word

Normalize: LDN = ( LD / Lmax )

2. Differences between lgs in phonological overlap

Eliminate ‘noise’: LDND = ( LDN / LDN_different )
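A minimal Python sketch of the two corrections follows. It assumes that LDN_different is the mean LDN over all pairs of words from the two lists that have different meanings; the slide does not spell this out. Function and variable names are hypothetical, and the three-word lists are toy excerpts from the Avar and Agul data.

```python
from itertools import product


def levenshtein(a, b):
    """Standard edit distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def ldn(a, b):
    """LD normalized by the length of the longer word (words must be non-empty)."""
    return levenshtein(a, b) / max(len(a), len(b))


def ldnd_per_item(list_a, list_b):
    """LDND for each shared meaning: LDN divided by the different-meaning baseline."""
    shared = sorted(set(list_a) & set(list_b))
    # Assumed baseline: mean LDN over word pairs whose meanings differ
    different = [ldn(list_a[m], list_b[n])
                 for m, n in product(shared, shared) if m != n]
    baseline = sum(different) / len(different)
    return {m: ldn(list_a[m], list_b[m]) / baseline for m in shared}


avar = {"I": "dun", "YOU": "mun", "FIRE": 'c"a'}
agul = {"I": "zun", "YOU": "wun", "FIRE": 'c"a'}

scores = ldnd_per_item(avar, agul)
for meaning, value in scores.items():
    print(meaning, round(100 * value, 1))
```

Because the baseline here comes from three words only, the printed values differ from LDND figures computed over a full word list.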

Both measures are then scaled by 100:

LDN = 100 * LDN

LDND = 100 * LDND


Comparing languages

Levenshtein Distance for Language Pair

- Mean of all LDND’s of words in common

- Synonyms (12%):
  - take Minimum pair
  - take Mean
  (experimental option)
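The two synonym options can be sketched as below. For brevity the sketch works on raw Levenshtein distances rather than LDND, and the function name and the `how` parameter are hypothetical. The forms are the Avar and Agul words for NEW from the comparison slides.

```python
from itertools import product
from statistics import mean


def levenshtein(a, b):
    """Standard edit distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def synonym_distance(forms_a, forms_b, how="min"):
    """Combine the distances of all synonym pairs: closest pair or mean."""
    distances = [levenshtein(a, b) for a, b in product(forms_a, forms_b)]
    if how == "min":
        return min(distances)
    if how == "mean":
        return mean(distances)
    raise ValueError("how must be 'min' or 'mean'")


avar_new = ['c"iya']
agul_new = ['c"EyEr', 'c"ayif']
print(synonym_distance(avar_new, agul_new, "min"),
      synonym_distance(avar_new, agul_new, "mean"))
```

For this item both Agul synonyms are equally distant from the Avar form, so the two options agree; they diverge as soon as one synonym is closer than the other.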

Comparing languages

AVAR (AVA: NAKH-DAGHESTANIAN > AVAR-ANDIC-TSEZIC) / AGUL (AGL: NAKH-DAGHESTANIAN > LEZGIC)

I    : dun=zun      * LDND=36.6
YOU  : mun=wun      * LDND=36.6
HORN : tLar=k"arC   * LDND=66.0
FIRE : c"a=c"a      * LDND= 0.0
FULL : c"ura=ac"uf  * LDND=66.0  ALT: AGL= ac"ar
NEW  : c"iya=c"EyEr * LDND=55.0  ALT: AGL= c"ayif
  or : c"iya=c"ayif * LDND=55.0  ALT: AGL= c"EyEr

COMMON (LDND < 70) = AGL - AVA 6 (=15.8% of 38)

LD = 4.01 / LDN = 81.76 / LDND = 89.87

Comparing languages

LANG1    LANG2                 FAM1            FAM2            LDND
FRENCH   ARPITAN               INDO-EUROPEAN   INDO-EUROPEAN   55.63
FRENCH   GALICIAN              INDO-EUROPEAN   INDO-EUROPEAN   74.49
FRENCH   ARAGONESE             INDO-EUROPEAN   INDO-EUROPEAN   76.16
FRENCH   FRIULIAN              INDO-EUROPEAN   INDO-EUROPEAN   74.64
FRENCH   ROMANSH_SURSILVAN     INDO-EUROPEAN   INDO-EUROPEAN   77.80
FRENCH   ROMANIAN              INDO-EUROPEAN   INDO-EUROPEAN   74.37
FRENCH   LATIN                 INDO-EUROPEAN   INDO-EUROPEAN   80.07
FRENCH   CATALAN               INDO-EUROPEAN   INDO-EUROPEAN   71.69
FRENCH   ITALIAN               INDO-EUROPEAN   INDO-EUROPEAN   75.91
FRENCH   PORTUGUESE            INDO-EUROPEAN   INDO-EUROPEAN   74.38
FRENCH   SPANISH               INDO-EUROPEAN   INDO-EUROPEAN   80.91
FRENCH   DANISH                INDO-EUROPEAN   INDO-EUROPEAN   93.11
FRENCH   BERNESE_GERMAN        INDO-EUROPEAN   INDO-EUROPEAN   93.18
FRENCH   CIMBRIAN              INDO-EUROPEAN   INDO-EUROPEAN   94.43
FRENCH   BRABANTIC             INDO-EUROPEAN   INDO-EUROPEAN   95.18
FRENCH   NORTH_FRISIAN_AMRUM   INDO-EUROPEAN   INDO-EUROPEAN   95.30
FRENCH   JAMTLANDIC            INDO-EUROPEAN   INDO-EUROPEAN   94.58
FRENCH   LIMBURGISH            INDO-EUROPEAN   INDO-EUROPEAN   94.78
FRENCH   OLD_HIGH_GERMAN       INDO-EUROPEAN   INDO-EUROPEAN   92.70
FRENCH   PLAUTDIETSCH          INDO-EUROPEAN   INDO-EUROPEAN   95.35
FRENCH   NORTHERN_LOW_SAXON    INDO-EUROPEAN   INDO-EUROPEAN   90.87
FRENCH   STELLINGWERFS         INDO-EUROPEAN   INDO-EUROPEAN   92.85
FRENCH   FRANS_VLAAMS          INDO-EUROPEAN   INDO-EUROPEAN   94.08


3. Some results: genetic and areal proximity


Distance Matrix (0.5 * N * (N-1))

       FRE     DUT     GAL     PRT     ENG   …
FRE
DUT    90.93
GAL    71.62   90.00
PRT    74.38   94.61   51.87
ENG    91.17   63.19   91.30   95.18
…

< Excel file >
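As a small illustration, the lower-triangular values above can be expanded into a full symmetric matrix in Python, and the number of pairwise distances checked against 0.5 * N * (N-1). The data structure and function names are hypothetical; only the ten distances come from the slide.

```python
from itertools import combinations


def n_pairs(n):
    """Number of distinct language pairs among n languages."""
    return n * (n - 1) // 2


def to_symmetric(names, lower_rows):
    """Expand lower-triangular rows into a full symmetric matrix."""
    index = {name: i for i, name in enumerate(names)}
    size = len(names)
    matrix = [[0.0] * size for _ in range(size)]
    for name, values in lower_rows.items():
        i = index[name]
        for j, value in enumerate(values):
            matrix[i][j] = matrix[j][i] = value
    return matrix


names = ["FRE", "DUT", "GAL", "PRT", "ENG"]
lower_rows = {
    "DUT": [90.93],
    "GAL": [71.62, 90.00],
    "PRT": [74.38, 94.61, 51.87],
    "ENG": [91.17, 63.19, 91.30, 95.18],
}

matrix = to_symmetric(names, lower_rows)
closest = min(combinations(range(len(names)), 2),
              key=lambda pair: matrix[pair[0]][pair[1]])
print(n_pairs(len(names)), names[closest[0]], names[closest[1]])
```

For the roughly 2100 word lists mentioned earlier, `n_pairs(2100)` gives a little over 2.2 million language pairs.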


Tools for Trees

- Prepare an input file for your preferred phylogenetic software using an editor such as TextPad (www.textpad.com)

- Run the data using phylogenetic software such as SplitsTree (www.splitstree.org)

- Choose the most appropriate algorithm (Neighbour Joining for distance data)

- Prepare the tree for presentation using a tool such as the Tree Explorer of MEGA
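As a rough illustration of the first step, the sketch below writes a small distance matrix as a PHYLIP-style square matrix, a plain-text layout that many distance-based tree programs accept. The function name `to_phylip` and the choice of PHYLIP are assumptions made for this example. The slides do not say which file format was used, and whether a given program reads this variant directly should be checked in its documentation.

```python
def to_phylip(names, matrix):
    """Format a square distance matrix as PHYLIP-style text.

    First line: number of taxa. Each following line: a label padded
    (or truncated) to 10 characters, then that row of distances.
    """
    lines = [f"{len(names):5d}"]
    for name, row in zip(names, matrix):
        label = name[:10].ljust(10)
        lines.append(label + " " + " ".join(f"{value:.4f}" for value in row))
    return "\n".join(lines) + "\n"


# Three languages from the distance matrix shown earlier (LDND, percent).
names = ["FRE", "GAL", "PRT"]
matrix = [
    [0.00, 71.62, 74.38],
    [71.62, 0.00, 51.87],
    [74.38, 51.87, 0.00],
]
phylip_text = to_phylip(names, matrix)
print(phylip_text)
```

The text can then be saved to a file and opened in the tree-building program.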

[Figure: Salishan languages (n=30), Neighbor Joining tree]

[Figure: UPGMA and Neighbor Joining trees side by side]


Neighbor Joining

Neighbor Joining:

- specifically meant for phylogenetic trees

- takes distance as point of departure

- does NOT assume equal rate of change
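To make these properties concrete, here is a minimal sketch of the textbook Neighbor Joining procedure (Saitou and Nei), applied to the five-language matrix shown earlier. It is not the ASJP or SplitsTree implementation. The function name, the Newick output and the `joins` list are choices made for this example. Because each join corrects for the row sums of the matrix, two languages can be paired even when one of them has changed faster than the other, which is why no equal rate of change is assumed.

```python
def neighbor_joining(names, matrix):
    """Return (newick, joins) for a symmetric distance matrix.

    joins lists the pairs of node labels in the order they were merged.
    """
    nodes = list(names)
    d = {a: {b: matrix[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums over the currently active nodes.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Pick the pair minimising Q(a, b) = (n - 2) * d(a, b) - r(a) - r(b).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the new internal node to a and to b.
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        new = f"({a}:{len_a:.2f},{b}:{len_b:.2f})"
        # Distances from the new node to every remaining node.
        d[new] = {}
        for c in nodes:
            if c not in (a, b):
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [new]
        joins.append((a, b))
    a, b = nodes
    return f"({a}:{d[a][b]:.2f},{b});", joins


NAMES = ["FRE", "DUT", "GAL", "PRT", "ENG"]
MATRIX = [
    [0.00, 90.93, 71.62, 74.38, 91.17],
    [90.93, 0.00, 90.00, 94.61, 63.19],
    [71.62, 90.00, 0.00, 51.87, 91.30],
    [74.38, 94.61, 51.87, 0.00, 95.18],
    [91.17, 63.19, 91.30, 95.18, 0.00],
]
tree, joins = neighbor_joining(NAMES, MATRIX)
print(joins[0])  # the first pair to be joined
print(tree)
```

On this matrix the first join pairs DUT with ENG. Later joins can tie numerically, so the exact nesting of the Newick string may vary while the unrooted tree stays the same.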

[Figure: Mayan languages (n=38)]


Calibration of Method

Calibration: best options, parameters, factors:

A. for pure classification:

- existing classifications (Ethnologue; WALS; mainly the well-documented areas)

- expert knowledge of specific areas

diversion ±12% niche!


Calibration of Method

Calibration: best options, parameters, factors:

B. for dating:

- linguistically crucial historic events:


Linguistically crucial events

Date       Historical event                    Linguistic event
c. 250     Goths conquer Dacia                 split of E-W Romance
4th c      Irish invade Scotland               split of Irish-Scottish Gaelic
5th c      German kingdoms in W Roman Empire   breakup of W Romance
5th c      Germans invade Britain              split of English-Frisian
5th-6th c  Britons flee to Brittany            split of Welsh-Breton
400-600    Hieroglyphic evidence               Ch'olan begins to split
768-814    Name of Charlemagne attested        Proto-Slavic


Calibration of Method

Calibration: best options, parameters, factors:

B. for dating:

- linguistically crucial historic events

Standard formula (Swadesh):

TimeDepth = log(Similarity) / (2 log(Retention))


Standard formula, with Similarity = 1 - LDND:

TimeDepth = log(1 - LDND) / (2 log(Retention))


Linguistically crucial events

Time (millennia)  Linguistic event                LDND    Ret
1.75              split of E-W Romance            0.6753  0.73
1.65              split of Irish-Scottish Gaelic  0.6687  0.72
1.55              breakup of W Romance            0.6411  0.72
1.55              split of English-Frisian        0.6574  0.71
1.50              split of Welsh-Breton           0.5705  0.75
1.40              Ch'olan begins to split         0.5369  0.76
1.21              Proto-Slavic                    0.5877  0.69

MEAN:                                                     0.73
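The retention column can be recomputed from the other two columns by solving the dating formula for the retention rate. The sketch below assumes that similarity is read as 1 - LDND and that time is measured in millennia. Both are inferences from the numbers in the table, not statements made on the slides. The function name `retention` is illustrative.

```python
import math

# (linguistic event, time depth in millennia, LDND, retention as reported)
CALIBRATION = [
    ("split of E-W Romance", 1.75, 0.6753, 0.73),
    ("split of Irish-Scottish Gaelic", 1.65, 0.6687, 0.72),
    ("breakup of W Romance", 1.55, 0.6411, 0.72),
    ("split of English-Frisian", 1.55, 0.6574, 0.71),
    ("split of Welsh-Breton", 1.50, 0.5705, 0.75),
    ("Ch'olan begins to split", 1.40, 0.5369, 0.76),
    ("Proto-Slavic", 1.21, 0.5877, 0.69),
]


def retention(time_depth, ldnd):
    """Solve TimeDepth = log(1 - LDND) / (2 log r) for the retention rate r."""
    return math.exp(math.log(1.0 - ldnd) / (2.0 * time_depth))


rates = [retention(t, d) for _, t, d, _ in CALIBRATION]
mean_rate = sum(rates) / len(rates)
for (event, _, _, reported), rate in zip(CALIBRATION, rates):
    print(f"{event:32s} computed {rate:.3f}  reported {reported:.2f}")
print(f"mean {mean_rate:.3f}")
```

Each recomputed value falls within 0.01 of the reported one, which supports the 1 - LDND reading.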


Calibration of Method

Calibration: best options, parameters, factors:

B. for dating:

- linguistically crucial historic events:

- Standard formula:

TimeDepth = log(1 - LDND) / (2 log(0.73)) < 75%

Deeper!
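Once a mean retention rate is fixed, the formula turns any LDND value into an estimated time depth. The sketch below assumes the calibrated rate of 0.73 per millennium, LDND expressed as a proportion between 0 and 1, and similarity read as 1 - LDND. The function name `time_depth` is hypothetical.

```python
import math


def time_depth(ldnd, retention=0.73):
    """Estimated separation time in millennia for an LDND given as a proportion."""
    if not 0.0 <= ldnd < 1.0:
        raise ValueError("LDND must lie in [0, 1) for the logarithm to be defined")
    return math.log(1.0 - ldnd) / (2.0 * math.log(retention))


# E-W Romance from the calibration table: LDND = 0.6753
romance_depth = time_depth(0.6753)
print(f"{romance_depth:.2f} millennia")
```

The estimate grows without bound as LDND approaches 1, so very distant pairs give unstable dates.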


Glottochronology only?

Calibration of method:

Glottochronology: all based on lexical distance

Add other linguistic domains …

WALS Typological database

Best result:

75% lexical distance (40 lexical items) + 25% typological distance (40 phonological/morphological/syntactic features)
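The weighting can be written as a simple linear combination. The sketch below assumes both distances are already on the same 0 to 1 scale. How the typological distance over the WALS features is computed is not shown on the slides, so it is treated here as a given input. The function name is illustrative.

```python
def combined_distance(lexical, typological, lexical_weight=0.75):
    """Weighted mix of a lexical and a typological distance (both in 0..1)."""
    return lexical_weight * lexical + (1.0 - lexical_weight) * typological


# Hypothetical pair: lexical distance 0.80, typological distance 0.40
example = combined_distance(0.80, 0.40)
print(round(example, 2))
```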


4. On Inheritance vs Borrowing


Inherited or borrowed?

AVAR (AVA) / AGUL (AGL)

I    : dun=zun      * LDND=36.6
YOU  : mun=wun      * LDND=36.6
HORN : tLar=k"arC   * LDND=66.0
FIRE : c"a=c"a      * LDND= 0.0
FULL : c"ura=ac"uf  * LDND=66.0
NEW  : c"iya=c"EyEr * LDND=55.0

6 items < 70.0

Genetically related !!
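For readers who want to see how per-item scores of this kind arise, here is a minimal sketch of a normalised Levenshtein distance (LDN) and its divided variant (LDND). It assumes the usual definitions: LDN is the edit distance divided by the length of the longer word, and LDND divides LDN by the average LDN between words of different meanings. It compares character by character, so ASJP modifier symbols such as `"` count as separate segments. It also uses only the six items shown rather than the full word list, so it does not reproduce the values on the slide.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def ldn(a, b):
    """Levenshtein distance normalised by the length of the longer word."""
    return levenshtein(a, b) / max(len(a), len(b))


def ldnd(list1, list2):
    """Per-meaning LDND (percent) for two dicts mapping meaning -> word."""
    shared = [m for m in list1 if m in list2]
    # Baseline: average LDN between words that do NOT share a meaning.
    off = [ldn(list1[m], list2[k]) for m in shared for k in shared if m != k]
    baseline = sum(off) / len(off)
    return {m: 100.0 * ldn(list1[m], list2[m]) / baseline for m in shared}


AVAR = {"I": "dun", "YOU": "mun", "HORN": "tLar",
        "FIRE": 'c"a', "FULL": 'c"ura', "NEW": 'c"iya'}
AGUL = {"I": "zun", "YOU": "wun", "HORN": 'k"arC',
        "FIRE": 'c"a', "FULL": 'ac"uf', "NEW": 'c"EyEr'}
scores = ldnd(AVAR, AGUL)
for meaning, score in scores.items():
    print(f"{meaning:5s} {score:6.1f}")
```

Dividing by the different-meaning baseline corrects for chance resemblance that comes from the two languages having similar sound inventories.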


Inherited or borrowed?

SPANISH (SPA) / CHAMORRO (CHA)

ONE    : uno=unu          * LDND=36.9
TWO    : dos=dos          * LDND= 0.0
PERSON : persona=petsona  * LDND=55.3
STAR   : estreya=estrecas * LDND=61.2
NIGHT  : noCe=noces       * LDND=68.2
NEW    : nuevo=nueba      * LDND=44.2

6 items < 70.0: RELATED ???


RELATED ??? NO!!!


INDO-EUROPEAN < > AUSTRONESIAN


CHANCE? ~ 5% (i.e. 1 – 2 items)
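A rough way to see why six similar items is too many for chance: if each item on a 40-item list has about a 5% probability of looking similar by accident, a binomial model gives the expected count and the probability of six or more hits. The list length of 40 and the independence of items are assumptions made for this sketch. The slides only state the 5% figure and the "1 - 2 items" it corresponds to.

```python
from math import comb


def chance_tail(n_items, p_chance, k):
    """P(at least k of n_items look similar by chance), binomial model."""
    return sum(comb(n_items, i) * p_chance**i * (1.0 - p_chance)**(n_items - i)
               for i in range(k, n_items + 1))


expected = 40 * 0.05            # expected number of chance look-alikes
tail = chance_tail(40, 0.05, 6)  # probability of six or more
print(f"expected {expected:.1f} items, P(>= 6) = {tail:.3f}")
```

Under these assumptions six or more chance resemblances would be rare, which points to an explanation other than chance.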


BORROWING through LANGUAGE CONTACT


Borrowed!

SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE

/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO

ONE : uno=unu * LDND=36.9

SPA > CHA: fam/gen= 0.24/0.82 > 0.03/0.00

phon pattern fit= 12.00 > 0.67


Borrowing

SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE

/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO

TWO : dos=dos * LDND= 0.0

SPA > CHA f/g= 0.62/1.00 > 0.12/0.00

swF= 100.00 > 0.22


Borrowing

SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE

/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO

PERSON : persona=petsona * LDND=55.3

SPA > CHA f/g= 0.20/0.64 > 0.01/0.00

swF= 32.40 > 0.13

ALT: CHA= taotao (0.41/0.00)


Borrowing

SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE

/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO

STAR : estreya=estrecas * LDND=61.2

SPA > CHA f/g= 0.17/0.82 > 0.00/0.00

swF= 100.00 > 4.44

ALT: CHA= puti7on (0.03/0.00)


Borrowing

SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE

/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO

NIGHT : noCe=noces * LDND=68.2

SPA > CHA f/g= 0.23/0.55 > 0.04/0.00

swF= 100.00 > 0.10

ALT: CHA= pw~eNi (0.23/0.00)


Borrowing

SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE

/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO

NEW : nuevo=nueba * LDND=44.2

SPA > CHA f/g= 0.50/0.64 > 0.04/0.00

swF= 4.27 > 0.03


5. Conclusions


Conclusions

- Method for automatic reconstruction of language relationships

- Assess, discuss and correct existing classifications

- Test hypotheses about genetic distances in time

- Locate potential borrowings

- CORE: incremental lexical database (> 35%)

One day: Online

Cooperation!!


Holman et al. (forthc. 2008). Explorations in automated language classification. Folia Linguistica.

Brown et al. (forthc. 2008). Automated classification of the world’s languages: A description of the method and preliminary results. Sprachtypologie und Universalienforschung.

+ Several working papers

email.eva.mpg.de./~wichmann/ASJPHomePage


?
