e-bioscieprints.rclis.org/4530/2/grivell.pdf · e-biosci – distinguishing features • is a free...

20
E-BioSci Semantic networks of biological information Les Grivell European Molecular Biology Organisation

Upload: others

Post on 04-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: E-BioScieprints.rclis.org/4530/2/grivell.pdf · E-BioSci – distinguishing features • Is a free academic access to services • Places strong emphasis on concept searching of full

E-BioSciSemantic networks of biological information

Les GrivellEuropean Molecular Biology Organisation

Page 2: E-BioScieprints.rclis.org/4530/2/grivell.pdf · E-BioSci – distinguishing features • Is a free academic access to services • Places strong emphasis on concept searching of full

“Biological research has reached a point where new generalizations and higher order biological laws are being approached, but may be obscured by the simple mass of data”

Harold Morowitz, 1985Report to the U.S. National Academy of Sciences

Page 3: E-BioScieprints.rclis.org/4530/2/grivell.pdf · E-BioSci – distinguishing features • Is a free academic access to services • Places strong emphasis on concept searching of full

One part of the information explosion ….

0.00E+00

2.00E+09

4.00E+09

6.00E+09

8.00E+09

1.00E+10

1.20E+10

1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000

Ye a r

Various mic robia l genome s

Ye ast ge nome (14 Mbp)

C. e le gans genome (97 M bp)

Drosophila genome (137 M bp)

Huma n chr . 22 (34.5 M bp)

Arabidopsis (125.4 M bp)

Huma n c omple t e dra ft ( 3.1 G bp)

Morowitz

Page 4: E-BioScieprints.rclis.org/4530/2/grivell.pdf · E-BioSci – distinguishing features • Is a free academic access to services • Places strong emphasis on concept searching of full

Sequences are not the only form of digital information

Page 5: E-BioScieprints.rclis.org/4530/2/grivell.pdf · E-BioSci – distinguishing features • Is a free academic access to services • Places strong emphasis on concept searching of full

• driving dramatic changes in the way research is conducted and in the way knowledge is being generated

• exposing limitations in current mechanisms for disseminating and sharing information

• driving changes in publication strategies

The digital revolution is ….

Page 6: E-BioScieprints.rclis.org/4530/2/grivell.pdf · E-BioSci – distinguishing features • Is a free academic access to services • Places strong emphasis on concept searching of full

The knowledge cycle (traditional)

Hypothesis

Idea!

Experiment

Data

Publication

Literature

Page 7: E-BioScieprints.rclis.org/4530/2/grivell.pdf · E-BioSci – distinguishing features • Is a free academic access to services • Places strong emphasis on concept searching of full

The knowledge cycle (extended)

Hypothesis

Idea!

Experiment

Data

Publication

(e-)Literature

Databases

Page 8: E-BioScieprints.rclis.org/4530/2/grivell.pdf · E-BioSci – distinguishing features • Is a free academic access to services • Places strong emphasis on concept searching of full

Biological information: current reality …..

• Hundreds of different databases, many in flat-file format

• Non-uniform or lack of external identifiers• Lack of interoperability at the level of syntax and

semantics• Knowledge scattered across the literature in many

thousands of non-computer readable journal articles

Page 9: E-BioScieprints.rclis.org/4530/2/grivell.pdf · E-BioSci – distinguishing features • Is a free academic access to services • Places strong emphasis on concept searching of full

Information retrieval from text

• There is increasing need for the use of literature as a computer-readable resource (intelligent search / retrieval)

• There is a need to apply computational methods to text as well as data analysis (mining / analysis; literature as discovery resource)

• There is a need for new methods of integration and visualization of this information

• There is a need for a scale-up in the rate of curated database growth through integration with the literature

Page 10: E-BioScieprints.rclis.org/4530/2/grivell.pdf · E-BioSci – distinguishing features • Is a free academic access to services • Places strong emphasis on concept searching of full

Dealing with the data deluge: what do researchers need?

• Intuitive interfaces to web resources• Advanced search / retrieval facilities• Smooth navigation from one resource to another• Immediate availability of authoritative

information; free of charge at the point-of-use• Information that can be integrated / manipulated /

visualized /output in another form

Page 11: E-BioScieprints.rclis.org/4530/2/grivell.pdf · E-BioSci – distinguishing features • Is a free academic access to services • Places strong emphasis on concept searching of full

A new information service for the life sciences that will interlink factual and image data repositories with the research literature

EU Quality of Life research infrastructure:platform under construction

Closely linked to ORIEL, an EU-funded research project in information technology

E-BioSci

Page 12: E-BioScieprints.rclis.org/4530/2/grivell.pdf · E-BioSci – distinguishing features • Is a free academic access to services • Places strong emphasis on concept searching of full

• Distributed network of information resources• Europe-based; world-wide role

The current E-BioSci - ORIEL partnership

The partnership

Page 13: E-BioScieprints.rclis.org/4530/2/grivell.pdf · E-BioSci – distinguishing features • Is a free academic access to services • Places strong emphasis on concept searching of full

The platform

• Services freely accessible to the academic research community

• Set of distributed resources (literature, sequence- and image- databases)

• Full-text search– across document repositories

– using cross-language queries (e.g. English – French, - German etc)

– 2-way navigation links between literature and molecular datasets via gene symbol recognition

Features implemented via conceptual fingerprinting

A discovery tool

Page 14: E-BioScieprints.rclis.org/4530/2/grivell.pdf · E-BioSci – distinguishing features • Is a free academic access to services • Places strong emphasis on concept searching of full

Conceptual fingerprints

Fingerprint database

C19881 0.99C92992 0.67C02002 0.66C99229 0.44C00392 0.33C93939 0.21

Index and link index terms to thesaurus

Full text document

•1 CFP = 400 bytes•Abstraction: 250.000 pages/PC/day•Matching: 500.000 CFP’s: 40 millisec.

Page 15: E-BioScieprints.rclis.org/4530/2/grivell.pdf · E-BioSci – distinguishing features • Is a free academic access to services • Places strong emphasis on concept searching of full

E-BioSci – distinguishing features

• Is a free academic access to services• Places strong emphasis on concept searching of

full text – a discovery tool• Uses conceptual fingerprints to semantically link

text with different data types (in particular genomic and image data)

• Links only to refereed material that meets criteria of editorial control

• Welcomes principles of free access, but respects existing restrictions of (commercial) content providers

Page 16: E-BioScieprints.rclis.org/4530/2/grivell.pdf · E-BioSci – distinguishing features • Is a free academic access to services • Places strong emphasis on concept searching of full

Community-driven e-science (1)

Hypothesis

Idea!

Experiment

Data

Publication

Literature

Traditional

Hypothesis

Idea!

Experiment

Data

Publication

(e-)Literature

Databases

Extended

•The growing importance of interactive databases

Page 17: E-BioScieprints.rclis.org/4530/2/grivell.pdf · E-BioSci – distinguishing features • Is a free academic access to services • Places strong emphasis on concept searching of full

Community-driven e-science (2)

Hypotheses

Community ‘data-centric’

Community;quality control

• Large-scale projects will drive further changes in communication and publishing practice

Hypothesis

Idea!

Experiment

Data

Publication

(e-)Literature

Databases

‘Traditional’

Experiments

Shared dataresources

Ideas!

Page 18: E-BioScieprints.rclis.org/4530/2/grivell.pdf · E-BioSci – distinguishing features • Is a free academic access to services • Places strong emphasis on concept searching of full

Community-driven e-science (3)•Proliferation of community knowledge networks raises new challenges of semantic interconnectivity

Shared dataresources

Shared dataresources

Shared dataresources

Shared dataresources

Shared dataresources

Page 19: E-BioScieprints.rclis.org/4530/2/grivell.pdf · E-BioSci – distinguishing features • Is a free academic access to services • Places strong emphasis on concept searching of full

E-BioSci and semantic interconnectionof searchable resources

LiteratureDatabase

annotations(sequences,images etc)

Scientistprofiles

FingerprintDatabase(s)

Fingerprint collections

Openarchive

repositories

Communityresources

Page 20: E-BioScieprints.rclis.org/4530/2/grivell.pdf · E-BioSci – distinguishing features • Is a free academic access to services • Places strong emphasis on concept searching of full

Acknowledgements

• Frank Gannon, Executive Director EMBO• … and many others who contributed ideas to the

concept of E-BioSci• The E-BioSci partners• European Commission

(contracts QLRI-2001-30266 and IST-2001-32688)