Advances in Automated Language Classification ASJP Consortium Dik Bakker, Lancaster

Post on 12-Jan-2016



Advances in Automated Language Classification

ASJP Consortium

Dik Bakker, Lancaster

ASJP: Automatic Reconstruction

Overview

Project: ASJP (Automated Similarity Judgment Program)

ASJP consortium members:

- Sören Wichmann (Germany; Netherlands)
- Viveka Velupillai (Germany)
- André Müller (Germany)
- Robert Mailhammer (Germany)
- Hagen Jung (Germany)
- Eric Holman (US)
- Anthony Grant (UK)
- Dmitry Egorov (Russia)
- Pamela Brown (US)
- Cecil Brown (US)
- Dik Bakker (UK; Netherlands)

Overall goal: Automatic reconstruction of language relationships

Basis: Distance matrix between individual languages on the basis of linguistic features

Method: Lexicostatistics: mass comparison of lexical items

MAIN GOAL: Reconstruction of Language Relationships

Derived goals:

- Critical assessment and refinement of existing classifications

- Classify newly described and unclassified languages

- Estimate time depths between languages / genera / families

- Search for (ir)regularities in phylogenies

- Test hypotheses (e.g. Atkinson et al 2008; ‘elbow’ phenomenon)

- Experimentally find the best/optimal dating method

- Detect borrowings

Overview

1. The basic list of lexical items

2. Comparing languages

3. Some results: genetic and areal proximity

4. On Inheritance vs Borrowing

5. Conclusions

1. The basic list of lexical items

Lexical items

Word list: Swadesh 100 basic meanings

- Word coined in most languages

- Collected in field work lexicon / grammar

- Inherited rather than borrowed

- Culturally independent

- Stable over time

- Few synonyms

 1. I        21. dog      41. nose     61. die     81. smoke
 2. you      22. louse    42. mouth    62. kill    82. fire
 3. we       23. tree     43. tooth    63. swim    83. ash
 4. this     24. seed     44. tongue   64. fly     84. burn
 5. that     25. leaf     45. claw     65. walk    85. path
 6. who      26. root     46. foot     66. come    86. mountain
 7. what     27. bark     47. knee     67. lie     87. red
 8. not      28. skin     48. hand     68. sit     88. green
 9. all      29. flesh    49. belly    69. stand   89. yellow
10. many     30. blood    50. neck     70. give    90. white
11. one      31. bone     51. breasts  71. say     91. black
12. two      32. grease   52. heart    72. sun     92. night
13. big      33. egg      53. liver    73. moon    93. hot
14. long     34. horn     54. drink    74. star    94. cold
15. small    35. tail     55. eat      75. water   95. full
16. woman    36. feather  56. bite     76. rain    96. new
17. man      37. hair     57. see      77. stone   97. good
18. person   38. head     58. hear     78. sand    98. round
19. fish     39. ear      59. know     79. earth   99. dry
20. bird     40. eye      60. sleep    80. cloud  100. name

Lexical items: further reduction

Early analyses have shown:

- An optimal 40-item subset of the 100 items gives the same results

Less work

Less missing data

Faster processing; combinatorial explosion:

40 : 100 ~ 3 * 10^7 : 2 * 10^10

Lexical items: stability

Most stable items:

Iteratively throw out the most unstable item in terms of variation within genera (3500-4000 years; Dryer 2001; 2005)

E.g. Germanic, Romance, Slavic, …

Formula: S = (E - U) / (100 - U) (weighted average % matches, Eq vs Uneq)
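The stability formula can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not define E and U beyond "Eq vs Uneq", so the reading used here (E = weighted average percentage of matches for words with equal meanings, U = the same for words with unequal meanings) is an assumption, and the function name `stability` is hypothetical.

```python
def stability(e_percent: float, u_percent: float) -> float:
    """S = (E - U) / (100 - U), with E and U given as percentages (0-100).

    Assumed reading of the slide: E is the match percentage for words with
    equal meanings, U the match percentage for words with unequal meanings.
    """
    if u_percent >= 100:
        raise ValueError("U must be below 100, otherwise S is undefined")
    return (e_percent - u_percent) / (100.0 - u_percent)


# Hypothetical numbers, for illustration only (not taken from the ASJP data).
print(stability(55.0, 10.0))
```

Subtracting U in both numerator and denominator rescales the score so that an item matching no more often than chance (E = U) gets S = 0 and an item that always matches (E = 100) gets S = 1.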

[Figure: Ethnologue (Goodman-Kruskal) and WALS (Pearson) plotted against stability (++ to --)]

40 most stable items (highlighted on the original slide)

Homophones (highlighted on the original slide)

Lexical items: transcription

First phase of project (2007):

Problems with full IPA representation of words:

- data entry via keyboard

- simple programming language (Fortran; Pascal)

Recoding to simplified ASJPcode (ASCII only)

ASJPcode:

- 7 vowels
- 34 consonants
- Operators for: nasalization, labialization, palatalization, aspiration, glottalization
- (Some) complex syllables simplified (VXC → VC)

Abaza (Caucasian):

Meaning  IPA           ASJPcode
PERSON   ʕʷɨʧʼʲʷʕʷɨs   Xw~3Cw"yXw~3s
LEAF     bɣʲɨ          bxy~3
SKIN     ʧʷazʲ         Cw~azy~
HORN     ʧʼʷɨʕʷa       Cw"~3Xw~a
NOSE     pɨnʦʼa        p3nc"a
TOOTH    pɨʦ           p3c

Lexical items

Collected to date:

- Over 2100 languages, dialects and proto-languages

- Mean number of items/language: 36.2 (/40)

Distribution:

Americas: 27%

Eurasia: 23%

Australia/PNG: 18%

Austronesia: 15%

Africa: 14%

Creoles: 2%

Artificial: 1%

Languages currently sampled

Lexical items: transcription

Second phase of project (2008):

Problems with full IPA representation solved:

1. automatic conversion IPA to integer (Python)

2. (semi-)automatic recoding to ASJPcode: transduction on the basis of a formal grammar

Abaza (Caucasian):

Meaning: PERSON

IPA: ʕʷɨʧʼʲʷʕʷɨs

Decimal: 661 695 616 679 700 690 695 661 695 616 115

ASJPcode: 88 119 126 51 67 34 121 119 126 88 119 126 51 115

( = Xw~3Cw"y~Xw~3s)
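The first step, converting an IPA string to integers, amounts to reading off Unicode code points. A minimal Python sketch follows; the function names `to_codepoints` and `from_codepoints` are hypothetical, and this is not the consortium's actual script. The second step (recoding to ASJPcode via a formal grammar) is not shown, because the slides do not give the grammar.

```python
def to_codepoints(word: str) -> list[int]:
    """Return the decimal Unicode code point of every character in the word."""
    return [ord(ch) for ch in word]


def from_codepoints(codes: list[int]) -> str:
    """Inverse operation: turn a list of decimal code points back into a string."""
    return "".join(chr(code) for code in codes)


# The Abaza word for PERSON, written with escapes so that the example does not
# depend on how the IPA characters survive copying.
person_ipa = "\u0295\u02b7\u0268\u02a7\u02bc\u02b2\u02b7\u0295\u02b7\u0268s"

print(to_codepoints(person_ipa))
print(from_codepoints(to_codepoints(person_ipa)) == person_ipa)
```

Working on integers rather than glyphs sidesteps the keyboard-entry and character-set problems listed for the first phase, since any downstream program only has to handle lists of numbers.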

Why not run on full IPA??

- correlations IPA ~ ASJP > 0.9

- but: ASJP gives a better fit with classifications; IPA is too specific

Lexical items: transcription

IPA: ʕʷɨʧʼʲʷʕʷɨs

Decimal: 661 695 616 679 700 690 695 661 695 616 115

ASJP++code: ( = any unicode string )

A n661, n695, n616, ……P Q A B C…Z P Q Z

formal grammar

Optimal level of abstraction for historical phonological reconstruction?

2. Comparing languages

Comparing words

LG      I     YOU   WE
ABAZA   sErE  bErE  Sw~ErE
ABKHAZ  s3    w3    Sw~3
AGUL    zun   wun   cw~un

Abaza vs. Abkhaz: LDi=3, LDj=4, LDk=3; LDmean=3.73

Abaza vs. Agul: LDi=4, LDj=4, LDk=4; LDmean=4.37

Levenshtein Distance

a. between 2 words:

Number of transformations to get from the shorter form to the longer one (changes, additions)

b. Between 2 languages:

E.g. mean LD for overlapping set (<= 40)
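The two definitions above can be sketched in Python. This is a minimal illustration and not the ASJP program itself: it uses the standard Levenshtein distance (substitutions, insertions, deletions) and treats every character as one symbol, so ASJP modifiers such as `~` would need their own tokenizer. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def mean_ld(words_a: dict, words_b: dict) -> float:
    """Mean LD over the meanings attested in both word lists."""
    shared = [meaning for meaning in words_a if meaning in words_b]
    return sum(levenshtein(words_a[m], words_b[m]) for m in shared) / len(shared)


# Two of the pronoun forms from the ABAZA / ABKHAZ example.
abaza = {"I": "sErE", "YOU": "bErE"}
abkhaz = {"I": "s3", "YOU": "w3"}

print(levenshtein("sErE", "s3"))  # 3
print(levenshtein("bErE", "w3"))  # 4
print(mean_ld(abaza, abkhaz))     # 3.5
```

The WE forms are left out because `Sw~` would have to be read as a single symbol to reproduce the value on the slide.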

Comparing words

Levenshtein Distance

Two problems with simple LD:

1. Value depends on length of longest word

Normalize: LDN = ( LD / Lmax )

2. Differences between languages in phonological overlap

Eliminate ‘noise’: LDND = ( LDN / LDNdifferent ), where LDNdifferent is the mean LDN of word pairs with different meanings

Comparing words

Levenshtein Distance

Two problems:

1. Value depends on length of longest word

Normalize: LDN = 100 * LDN (scaled to a percentage)

2. Differences between languages in phonological overlap

Eliminate ‘noise’: LDND = 100 * LDND (scaled to a percentage)
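The two corrections can be sketched as follows. This is an illustration under stated assumptions, not the ASJP implementation: the baseline `LDNdifferent` is taken as the mean LDN over all word pairs whose meanings differ, characters are treated as symbols, and the three-item word lists are only a toy subset, so the baseline printed here is not the full-list value.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def ldn(a: str, b: str) -> float:
    """LD divided by the length of the longer word (range 0..1)."""
    return levenshtein(a, b) / max(len(a), len(b))


def mean_ldn_different(words_a: dict, words_b: dict) -> float:
    """Baseline: mean LDN over word pairs that do NOT share a meaning."""
    values = [
        ldn(word_a, word_b)
        for (meaning_a, word_a), (meaning_b, word_b) in product(words_a.items(), words_b.items())
        if meaning_a != meaning_b
    ]
    return sum(values) / len(values)


def ldnd(a: str, b: str, baseline: float) -> float:
    """LDN corrected for chance resemblance, scaled to a percentage."""
    return 100 * ldn(a, b) / baseline


avar = {"I": "dun", "YOU": "mun", "FIRE": 'c"a'}
agul = {"I": "zun", "YOU": "wun", "FIRE": 'c"a'}

baseline = mean_ldn_different(avar, agul)
print(round(baseline, 3))                       # 0.778
print(round(ldnd("dun", "zun", baseline), 1))   # 42.9
```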

Comparing languages

Levenshtein Distance for Language Pair

- Mean of all LDNDs of words in common

- Synonyms (12%):
  - take Minimum pair
  - take Mean

Experimental option
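The two ways of handling synonyms can be sketched like this. It is a hypothetical illustration: the word forms are invented, the function names are not from ASJP, and plain LDN percentages stand in for LDND to keep the example short.

```python
from statistics import mean


def levenshtein(a: str, b: str) -> int:
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def word_distance(a: str, b: str) -> float:
    """LDN as a percentage (divide by a baseline to obtain LDND)."""
    return 100 * levenshtein(a, b) / max(len(a), len(b))


def language_distance(words_a: dict, words_b: dict, synonyms: str = "min") -> float:
    """Mean word distance over shared meanings; synonyms are combined by min or mean."""
    combine = min if synonyms == "min" else mean
    per_meaning = []
    for meaning in words_a.keys() & words_b.keys():
        scores = [word_distance(a, b) for a in words_a[meaning] for b in words_b[meaning]]
        per_meaning.append(combine(scores))
    return mean(per_meaning)


# Invented forms; each meaning maps to a list of synonyms.
lang_a = {"ONE": ["abc"], "TWO": ["dd"]}
lang_b = {"ONE": ["abc", "xyz"], "TWO": ["de"], "THREE": ["q"]}

print(language_distance(lang_a, lang_b, synonyms="min"))   # 25.0
print(language_distance(lang_a, lang_b, synonyms="mean"))  # 50.0
```

Taking the minimum rewards the closest synonym pair, while taking the mean penalises a language for listing an unrelated synonym.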

Comparing languages

AVAR (AVA: NAKH-DAGHESTANIAN > AVAR-ANDIC-TSEZIC) / AGUL (AGL: NAKH-DAGHESTANIAN > LEZGIC)

I    : dun=zun       * LDND=36.6
YOU  : mun=wun       * LDND=36.6
HORN : tLar=k"arC    * LDND=66.0
FIRE : c"a=c"a       * LDND= 0.0
FULL : c"ura=ac"uf   * LDND=66.0  ALT: AGL= ac"ar
NEW  : c"iya=c"EyEr  * LDND=55.0  ALT: AGL= c"ayif

COMMON (LDND < 70) = AGL - AVA 6 (=15.8% of 38)

LD = 4.01 / LDN = 81.76 / LDND = 89.87

Comparing languages

AVAR (AVA: NAKH-DAGHESTANIAN > AVAR-ANDIC-TSEZIC) / AGUL (AGL: NAKH-DAGHESTANIAN > LEZGIC)

I    : dun=zun       * LDND=36.6
YOU  : mun=wun       * LDND=36.6
HORN : tLar=k"arC    * LDND=66.0
FIRE : c"a=c"a       * LDND= 0.0
FULL : c"ura=ac"uf   * LDND=66.0  ALT: AGL= ac"ar
NEW  : c"iya=c"ayif  * LDND=55.0  ALT: AGL= c"EyEr

COMMON (LDND < 70) = AGL - AVA 6 (=15.8% of 38)

LD = 4.01 / LDN = 81.76 / LDND = 89.87
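The per-word values in this comparison can be recomputed with a short sketch. It rests on two assumptions that are not stated on the slide: the baseline for the pair is taken to be the ratio of the reported mean LDN to the reported mean LDND (81.76 / 89.87), and each character counts as one symbol.

```python
def levenshtein(a: str, b: str) -> int:
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


# Word pairs (AVAR, AGUL) as listed in the comparison.
pairs = {
    "I": ("dun", "zun"),
    "YOU": ("mun", "wun"),
    "HORN": ("tLar", 'k"arC'),
    "FIRE": ('c"a', 'c"a'),
    "FULL": ('c"ura', 'ac"uf'),
    "NEW": ('c"iya', 'c"EyEr'),
}

# Assumed baseline: mean LDN divided by mean LDND for this language pair.
baseline = 81.76 / 89.87

scores = {
    meaning: round(100 * levenshtein(a, b) / max(len(a), len(b)) / baseline, 1)
    for meaning, (a, b) in pairs.items()
}
print(scores)
# {'I': 36.6, 'YOU': 36.6, 'HORN': 66.0, 'FIRE': 0.0, 'FULL': 66.0, 'NEW': 55.0}

common = [meaning for meaning, score in scores.items() if score < 70]
print(len(common), round(100 * len(common) / 38, 1))  # 6 15.8
```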

Comparing languages

LANG1   LANG2                FAM1           FAM2           LDND
FRENCH  ARPITAN              INDO-EUROPEAN  INDO-EUROPEAN  55.63
FRENCH  GALICIAN             INDO-EUROPEAN  INDO-EUROPEAN  74.49
FRENCH  ARAGONESE            INDO-EUROPEAN  INDO-EUROPEAN  76.16
FRENCH  FRIULIAN             INDO-EUROPEAN  INDO-EUROPEAN  74.64
FRENCH  ROMANSH_SURSILVAN    INDO-EUROPEAN  INDO-EUROPEAN  77.80
FRENCH  ROMANIAN             INDO-EUROPEAN  INDO-EUROPEAN  74.37
FRENCH  LATIN                INDO-EUROPEAN  INDO-EUROPEAN  80.07
FRENCH  CATALAN              INDO-EUROPEAN  INDO-EUROPEAN  71.69
FRENCH  ITALIAN              INDO-EUROPEAN  INDO-EUROPEAN  75.91
FRENCH  PORTUGUESE           INDO-EUROPEAN  INDO-EUROPEAN  74.38
FRENCH  SPANISH              INDO-EUROPEAN  INDO-EUROPEAN  80.91
FRENCH  DANISH               INDO-EUROPEAN  INDO-EUROPEAN  93.11
FRENCH  BERNESE_GERMAN       INDO-EUROPEAN  INDO-EUROPEAN  93.18
FRENCH  CIMBRIAN             INDO-EUROPEAN  INDO-EUROPEAN  94.43
FRENCH  BRABANTIC            INDO-EUROPEAN  INDO-EUROPEAN  95.18
FRENCH  NORTH_FRISIAN_AMRUM  INDO-EUROPEAN  INDO-EUROPEAN  95.30
FRENCH  JAMTLANDIC           INDO-EUROPEAN  INDO-EUROPEAN  94.58
FRENCH  LIMBURGISH           INDO-EUROPEAN  INDO-EUROPEAN  94.78
FRENCH  OLD_HIGH_GERMAN      INDO-EUROPEAN  INDO-EUROPEAN  92.70
FRENCH  PLAUTDIETSCH         INDO-EUROPEAN  INDO-EUROPEAN  95.35
FRENCH  NORTHERN_LOW_SAXON   INDO-EUROPEAN  INDO-EUROPEAN  90.87
FRENCH  STELLINGWERFS        INDO-EUROPEAN  INDO-EUROPEAN  92.85
FRENCH  FRANS_VLAAMS         INDO-EUROPEAN  INDO-EUROPEAN  94.08

3. Some results: genetic and areal proximity

Distance Matrix (0.5 * N * (N-1))

      FRE    DUT    GAL    PRT    ENG   …
FRE
DUT   90.93
GAL   71.62  90.00
PRT   74.38  94.61  51.87
ENG   91.17  63.19  91.30  95.18
…

< Excel file >
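A lower-triangular matrix of this kind holds one value per unordered language pair, which is where the count 0.5 * N * (N-1) comes from. The sketch below stores the five-language excerpt in a dictionary keyed by unordered pairs; it assumes the ENG row reads 91.17, 63.19, 91.30, 95.18 across the FRE, DUT, GAL and PRT columns, and the variable names are illustrative.

```python
from itertools import combinations

languages = ["FRE", "DUT", "GAL", "PRT", "ENG"]

# Lower triangle of the matrix, one entry per unordered pair.
values = [
    ("DUT", "FRE", 90.93),
    ("GAL", "FRE", 71.62), ("GAL", "DUT", 90.00),
    ("PRT", "FRE", 74.38), ("PRT", "DUT", 94.61), ("PRT", "GAL", 51.87),
    ("ENG", "FRE", 91.17), ("ENG", "DUT", 63.19), ("ENG", "GAL", 91.30), ("ENG", "PRT", 95.18),
]
distance = {frozenset((a, b)): d for a, b, d in values}

n = len(languages)
pair_count = n * (n - 1) // 2
print(pair_count)  # 10

# The same lookup works for either order of the two languages.
print(distance[frozenset(("FRE", "PRT"))] == distance[frozenset(("PRT", "FRE"))])  # True

closest = min(combinations(languages, 2), key=lambda pair: distance[frozenset(pair)])
print(closest)  # ('GAL', 'PRT')
```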

Tools for Trees

- Prepare the input file for your preferred phylogenetic software using an editor such as TextPad (www.textpad.com)

- Run data using phylogenetic software such as SplitsTree (www.splitstree.org)

- Choose the most appropriate algorithm (Neighbour Joining for distance data)

- Prepare tree for presentation using a tool such as the Tree Explorer of MEGA
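The input file for the first step can also be generated rather than typed by hand. The sketch below writes a distance matrix in the NEXUS distances-block layout; it is an assumption that this layout (lower triangle with diagonal and row labels) is what your phylogenetic software expects, so the program's own documentation should be checked. The function name `to_nexus` is hypothetical.

```python
def to_nexus(languages, distance):
    """Format a symmetric distance lookup as a NEXUS file with a lower-triangular matrix."""
    lines = [
        "#NEXUS",
        "BEGIN taxa;",
        f"  DIMENSIONS ntax={len(languages)};",
        "  TAXLABELS " + " ".join(languages) + ";",
        "END;",
        "BEGIN distances;",
        f"  DIMENSIONS ntax={len(languages)};",
        "  FORMAT triangle=LOWER diagonal labels;",
        "  MATRIX",
    ]
    for i, row in enumerate(languages):
        # Distances to all earlier languages, followed by the zero diagonal.
        cells = [f"{distance[frozenset((row, col))]:.2f}" for col in languages[:i]]
        lines.append("  " + " ".join([row] + cells + ["0.00"]))
    lines += ["  ;", "END;"]
    return "\n".join(lines) + "\n"


languages = ["FRE", "DUT", "GAL"]
distance = {
    frozenset(("FRE", "DUT")): 90.93,
    frozenset(("FRE", "GAL")): 71.62,
    frozenset(("DUT", "GAL")): 90.00,
}

nexus_text = to_nexus(languages, distance)
print(nexus_text)
```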

Figure: Salishan languages (n=30), Neighbor Joining tree

Figure: UPGMA / Neighbor Joining trees

Neighbor Joining

Neighbor Joining:

- specifically meant for phylogenetic trees

- takes distance as point of departure

- does NOT assume equal rate of change
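For readers who want to see how a tree is built from distances alone, here is a compact sketch of the textbook Neighbor Joining procedure. It returns only the topology as nested tuples and leaves out branch lengths; the five-taxon distances are a made-up toy example, not ASJP data, and the code is not the implementation used by SplitsTree or MEGA.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Return the tree topology as nested tuples (branch lengths are omitted)."""
    nodes = list(names)
    d = dict(dist)
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums of the current distance matrix.
        total = {x: sum(d[frozenset((x, y))] for y in nodes if y != x) for x in nodes}

        def q(pair):
            # Joining criterion: small for pairs that are close to each other
            # and far from everything else.
            a, b = pair
            return (n - 2) * d[frozenset(pair)] - total[a] - total[b]

        a, b = min(combinations(nodes, 2), key=q)
        joined = (a, b)
        for c in nodes:
            if c != a and c != b:
                # Distance from the new internal node to every remaining node.
                d[frozenset((joined, c))] = 0.5 * (
                    d[frozenset((a, c))] + d[frozenset((b, c))] - d[frozenset((a, b))]
                )
        nodes = [c for c in nodes if c != a and c != b] + [joined]
    return tuple(nodes)


# Toy distances between five hypothetical taxa.
toy = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
dist = {frozenset(pair): value for pair, value in toy.items()}

tree = neighbor_joining("abcde", dist)
print(tree)
```

Because the criterion subtracts each node's total distance to all others, a fast-changing language is not automatically pushed to the outside of the tree, which is the sense in which no equal rate of change is assumed.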

Figure: Mayan (n=38)

Calibration of Method

Calibration: best options, parameters, factors:

A. for pure classification:

- existing classifications (Ethnologue; WALS; mainly the well-documented areas)

- expert knowledge of specific areas

diversion ±12% niche!

Calibration of Method

Calibration: best options, parameters, factors:

B. for dating:

- linguistically crucial historic events:

Linguistically crucial events

Date       Historical event                   Linguistic event
c. 250     Goths conquer Dacia                split of E-W Romance
4th c      Irish invade Scotland              split of Irish-Scottish Gaelic
5th c      German kingdoms in W Roman Empire  breakup of W Romance
5th c      Germans invade Britain             split of English-Frisian
5th-6th c  Britons flee to Brittany           split of Welsh-Breton
400-600    Hieroglyphic evidence              Ch'olan begins to split
768-814    Name of Charlemagne attested       Proto-Slavic

Calibration of Method

Calibration: best options, parameters, factors:

B. for dating:

- linguistically crucial historic events

Standard formula (Swadesh):

TimeDepth = log(Similarity) / (2 log Retention)

With LDND in place of Similarity:

TimeDepth = log(LDND) / (2 log Retention)

Linguistically crucial events

Time  Linguistic event                LDND    Ret
1.75  split of E-W Romance            0.6753  0.73
1.65  split of Irish-Scottish Gaelic  0.6687  0.72
1.55  breakup of W Romance            0.6411  0.72
1.55  split of English-Frisian        0.6574  0.71
1.50  split of Welsh-Breton           0.5705  0.75
1.40  Ch'olan begins to split         0.5369  0.76
1.21  Proto-Slavic                    0.5877  0.69

MEAN: 0.73
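The Ret column can be recomputed by solving the dating formula for Retention. The sketch below makes two assumptions that the slides do not spell out: Time is in millennia, and Similarity is taken as 1 - LDND with LDND as a fraction. Under those assumptions the table values come out as shown.

```python
import math

# (linguistic event, time depth in millennia, LDND as a fraction)
calibration = [
    ("split of E-W Romance", 1.75, 0.6753),
    ("split of Irish-Scottish Gaelic", 1.65, 0.6687),
    ("breakup of W Romance", 1.55, 0.6411),
    ("split of English-Frisian", 1.55, 0.6574),
    ("split of Welsh-Breton", 1.50, 0.5705),
    ("Ch'olan begins to split", 1.40, 0.5369),
    ("Proto-Slavic", 1.21, 0.5877),
]


def retention(time_depth: float, ldnd: float) -> float:
    """Solve TimeDepth = log(Similarity) / (2 log Retention) for Retention."""
    similarity = 1.0 - ldnd  # assumption: similarity is the complement of LDND
    return math.exp(math.log(similarity) / (2.0 * time_depth))


rates = [round(retention(t, d), 2) for _, t, d in calibration]
print(rates)                              # [0.73, 0.72, 0.72, 0.71, 0.75, 0.76, 0.69]
print(round(sum(rates) / len(rates), 2))  # 0.73
```

The mean is taken over the two-decimal values, as they appear in the table.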

Calibration of Method

Calibration: best options, parameters, factors:

B. for dating:

- linguistically crucial historic events:

- Standard formula:

TimeDepth = log(LDND) / (2 log 0.73)

< 75%

Deeper!
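With the retention rate fixed, the formula turns a distance into an estimated age. A minimal sketch, assuming time in millennia, a retention rate of 0.73 and Similarity = 1 - LDND (LDND as a fraction), which is the reading under which the calibration figures are mutually consistent:

```python
import math


def time_depth(ldnd: float, retention: float = 0.73) -> float:
    """Estimated time depth in millennia for a language pair."""
    similarity = 1.0 - ldnd  # assumption: similarity is the complement of LDND
    return math.log(similarity) / (2.0 * math.log(retention))


# E-W Romance split (LDND 0.6753); the calibration date is 1.75 millennia.
print(round(time_depth(0.6753), 2))  # 1.79
```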

Glottochronology only?

Calibration of method:

Glottochronology: all based on lexical distance

Add other linguistic domains …

WALS Typological database

Best result:

(75% 40 lexical items) + (25% 40 Ph/M/S features)
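A weighted combination of this kind can be sketched as below. The typological distance used here (percentage of mismatching values among features attested for both languages) is an assumption made for illustration, and the feature names and values are invented; only the 75/25 weighting comes from the slide.

```python
def typological_distance(features_a: dict, features_b: dict) -> float:
    """Percentage of shared features whose values differ (assumed measure)."""
    shared = [name for name in features_a if name in features_b]
    mismatches = sum(features_a[name] != features_b[name] for name in shared)
    return 100 * mismatches / len(shared)


def combined_distance(lexical: float, typological: float, lexical_weight: float = 0.75) -> float:
    """Weighted mix of lexical and typological distance, both on a 0..100 scale."""
    return lexical_weight * lexical + (1 - lexical_weight) * typological


# Invented feature values for two hypothetical languages.
lang_a = {"word_order": "SOV", "tone": "no", "case": "yes", "gender": "no"}
lang_b = {"word_order": "SVO", "tone": "no", "case": "yes", "gender": "no", "articles": "yes"}

typ = typological_distance(lang_a, lang_b)
print(typ)                           # 25.0
print(combined_distance(80.0, typ))  # 66.25
```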

4. On Inheritance vs Borrowing

Inherited or borrowed?

AVAR (AVA) / AGUL (AGL)

I    : dun=zun       * LDND=36.6
YOU  : mun=wun       * LDND=36.6
HORN : tLar=k"arC    * LDND=66.0
FIRE : c"a=c"a       * LDND= 0.0
FULL : c"ura=ac"uf   * LDND=66.0
NEW  : c"iya=c"EyEr  * LDND=55.0

6 items < 70.0: Genetically related!!

Inherited or borrowed?

SPANISH (SPA) / CHAMORRO (CHA)

ONE    : uno=unu           * LDND=36.9
TWO    : dos=dos           * LDND= 0.0
PERSON : persona=petsona   * LDND=55.3
STAR   : estreya=estrecas  * LDND=61.2
NIGHT  : noCe=noces        * LDND=68.2
NEW    : nuevo=nueba       * LDND=44.2

6 items < 70.0: RELATED ??? NO!!!

INDO-EUROPEAN < > AUSTRONESIAN

CHANCE? ~ 5% (i.e. 1-2 items)

BORROWING through LANGUAGE CONTACT
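The reasoning on this slide (six lookalikes where chance predicts one or two) can be written down as a small check. It is a sketch under assumptions: a list size of 40 items and a 5% chance rate, with hypothetical function names. Note that passing the check only rules out chance; it cannot by itself tell inheritance from borrowing.

```python
def lookalikes(scores: dict, threshold: float = 70.0) -> list:
    """Meanings whose word pair falls below the LDND threshold."""
    return [meaning for meaning, score in scores.items() if score < threshold]


def exceeds_chance(scores: dict, list_size: int = 40, chance_rate: float = 0.05,
                   threshold: float = 70.0) -> bool:
    """True if there are more lookalikes than chance resemblance would predict."""
    expected_by_chance = chance_rate * list_size
    return len(lookalikes(scores, threshold)) > expected_by_chance


# LDND values for the SPANISH / CHAMORRO pairs listed on the slide.
spa_cha = {"ONE": 36.9, "TWO": 0.0, "PERSON": 55.3, "STAR": 61.2, "NIGHT": 68.2, "NEW": 44.2}

print(len(lookalikes(spa_cha)))  # 6
print(exceeds_chance(spa_cha))   # True
```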

Inherited or borrowed?

SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE / CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO

ONE : uno=unu * LDND=36.9

SPA <> CHA: fam/gen= 0.24/0.82 > 0.03/0.00

phon pattern fit= 12.00 > 0.67

Borrowed! (SPA > CHA)

Borrowing

SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE / CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO

TWO : dos=dos * LDND= 0.0
SPA > CHA f/g= 0.62/1.00 > 0.12/0.00
swF= 100.00 > 0.22

PERSON : persona=petsona * LDND=55.3
SPA > CHA f/g= 0.20/0.64 > 0.01/0.00
swF= 32.40 > 0.13
ALT: CHA= taotao (0.41/0.00)

STAR : estreya=estrecas * LDND=61.2
SPA > CHA f/g= 0.17/0.82 > 0.00/0.00
swF= 100.00 > 4.44
ALT: CHA= puti7on (0.03/0.00)

NIGHT : noCe=noces * LDND=68.2
SPA > CHA f/g= 0.23/0.55 > 0.04/0.00
swF= 100.00 > 0.10
ALT: CHA= pw~eNi (0.23/0.00)

NEW : nuevo=nueba * LDND=44.2
SPA > CHA f/g= 0.50/0.64 > 0.04/0.00
swF= 4.27 > 0.03

5. Conclusions

- Method for automatic reconstruction of language relationships

- Assess, discuss and correct existing classifications

- Test hypotheses about genetic distances in time

- Locate potential borrowings

- CORE: incremental lexical database (> 35%)

One day: Online

Cooperation!!

Holman et al. (forthc. 2008). Explorations in automated language classification. Folia Linguistica.

Brown et al. (forthc. 2008). Automated classification of the world's languages: A description of the method and preliminary results. Sprachtypologie und Universalienforschung.

+ Several working papers

email.eva.mpg.de./~wichmann/ASJPHomePage


?