“Penuria nominum” – shortage of words. Knowledge beyond the capacity of language? By György Surján, ESKI, Hungary. Commentary to Judith Blake, Beyond Data Integration: Data Management for Knowledge Discovery. Ontology and Biomedical Informatics, Rome, 29 April – 2 May 2005.


“Penuria nominum” – shortage of words

Knowledge beyond the capacity of language?

by György Surján
ESKI, Hungary

Commentary to Judith Blake: Beyond Data Integration: Data Management for Knowledge Discovery

Ontology and Biomedical Informatics, Rome, 29 April – 2 May 2005


Overview:

(A commentary by an outsider)

1. Modern science is analytical
2. The problem of identity
3. The capacity of language is limited
4. Genomics and proteomics deal with extremely large databases
5. Ontologies are bound to reality by language tags


1. Modern science is analytical


Mouse and human genes agree to about 90%.

The mouse body and the human body are built from rather similar building blocks.

The difference lies not in the building elements but in the way the elements are integrated.

The difference is not only phenotypic but also functional: humans are able to create sculptures that demonstrate the beauty of the human body.


Q1. Is the analytical approach sufficient to explain the differences between living organisms?

Would changing 10% of its genes enable a mouse to create sculptures like Michelangelo's?


2. The problem of identity

Importance of identity in ontology:

Entities that have different identity criteria cannot belong to the same class.


We have a strong sense of self-identity throughout our whole life, despite all the changes that happen to us.


Identity is independent of similarity and recognition.


Identity of genes or proteins

Entities may gain or lose parts without losing their identity.

Is a gene that loses some nucleotides still the same gene?

Q2. What are the identity criteria for genes and proteins?


Elementary particles have no identity

Humans and higher animals obviously do

Q3. At which level of organisation does identity emerge? (Do biological macromolecules have identity?)


3. Capacity of language is limited

Shepherd in the 19th century: ~300-400 words
Anatomy (intermediate language certificate): ~4 000 terms
SNOMED 3.1, Encyclopaedia Britannica: ~120 000 terms
WordNet: ~150 000 strings
UMLS Metathesaurus: > 1 500 000 terms

Our language capacity is huge, but nevertheless finite


Limiting factors:

1. Capacity of human brain

2. Number of terms shared by a community


Example of numbers

English has distinct names for the first 13 numbers (zero to twelve); beyond that we use combinations.

hundred 10^2
thousand 10^3
million 10^6
billion 10^9
… ?
10^80

We have a linguistic solution for expressing extremely large numbers, at the price of a loss of precision:

94869313860999624578839454223454292345623754278394542323452456598564789345634987

≈ 9.487 × 10^79
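The trade-off between a short name and precision can be made concrete with a small sketch. It assumes Python and its standard decimal module; the function name to_scientific and the choice of four significant digits are illustrative assumptions, not part of the original slides. It rounds the 80-digit number as transcribed above to scientific notation and measures what is lost. Note that an 80-digit integer has a leading power of 10^79, and that rounding (rather than truncating) at the fourth digit gives 9.487.

```python
from decimal import Decimal

# The 80-digit number from the slide, kept as an exact Python integer.
exact = int(
    "9486931386099962457883945422345429234562"
    "3754278394542323452456598564789345634987"
)


def to_scientific(n: int, significant_digits: int = 4) -> str:
    """Format n in scientific notation, rounded to the given number of digits."""
    return f"{Decimal(n):.{significant_digits - 1}e}"


approx_text = to_scientific(exact)        # short, "speakable" form
approx_value = int(Decimal(approx_text))  # the integer that the short form denotes
lost = abs(exact - approx_value)          # information discarded by the short form

print(approx_text)
print(len(str(exact)), "digits in the exact number")
print(len(str(lost)), "digits in the rounding error")
```

The rounded form can be spoken in a few words, but it no longer distinguishes this number from about 10^76 other integers that round to the same value.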


So far, mankind has not met a situation that exhausted the capacity of human language, not because the number of things to be expressed was smaller than this capacity, but because we could always find an acceptable compromise.

We do not know where the limits of our language capacity lie, but the sense of this limitation was already well known centuries ago (penuria nominum):

In the 17th century Harsdörffer proposed a machine with 5 wheels carrying 256 syllables, prefixes and suffixes, able to generate about 97 million (mostly nonsense) German words, in order to find the real name of God and also to give a different name to every particular in the world instead of referring to particulars by the names of their classes.

(U. Eco: Between La Mancha and Babel)
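The combinatorial idea behind such a wheel machine can be sketched in a few lines: each wheel contributes one fragment, and the number of candidate words is the product of the wheel sizes. The wheel contents below are hypothetical miniature examples chosen only for illustration; they are not the historical inventory, and the function names are assumptions of this sketch.

```python
from itertools import islice, product
from math import prod

# Hypothetical miniature wheels (3 fragments each). Illustrative only:
# these are NOT the fragments of the historical machine.
WHEELS = [
    ["ab", "be", "ver"],   # prefixes
    ["b", "fr", "st"],     # initial letters
    ["a", "ei", "u"],      # medial letters
    ["ch", "nd", "t"],     # final letters
    ["e", "en", "ung"],    # suffixes
]


def count_words(wheels):
    """Number of words the machine can produce: product of the wheel sizes."""
    return prod(len(wheel) for wheel in wheels)


def generate_words(wheels):
    """Yield every combination, taking one fragment from each wheel in order."""
    for fragments in product(*wheels):
        yield "".join(fragments)


print(count_words(WHEELS), "candidate words")
print(list(islice(generate_words(WHEELS), 3)))
```

Even with three fragments per wheel the machine yields 3^5 = 243 candidates, most of them nonsense; the count grows multiplicatively with every fragment added to a wheel.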


Size of genomics databases:

GO: ~18 000 terms
Human genome: ~30 000 (?) genes
GenBank: ~42 000 000 sequences


Are we able to use 42 million names?

Q4. Is it possible to describe molecular biology using human language?

Is there any other representation tool to be used for that purpose?


5. Ontologies are bound to reality by language tags

Formal languages are used to describe structures


[Figure: diagram with elements labelled "ID" and "language tag", placed between "Language" and "Reality"]

What is the meaning?


Q5. If language fails in genomics and proteomics, is there a need for, and a possibility of, alternative methods of ontology engineering that do not require language?

If ontologies are bound to reality by language, then it is hard to create (or use) an ontology where the problem field exceeds the capacity of language.


Summary of questions

Q1. Is the analytical approach sufficient to explain the differences between living organisms?

Q2. What are the identity criteria for genes?

Q3. At which level of organisation does identity emerge? (Do biological macromolecules have identity?)

Q4. Is it possible to describe molecular biology using human language? Is there any other representation tool to be used for that purpose?

Q5. If language fails in genomics and proteomics, is there a need for, and a possibility of, alternative methods of ontology engineering that do not require language?