
Upload: corey-fisher

Post on 28-Dec-2015


Network Services for Biologists in the Genome Era

The Work of the European Bioinformatics Institute

Our Genome

tcaattctga tcgaataaac gaatttacat atttggtaag ttttggccaa tttcgtagca 60
atatgatgaa attgcgctct tttttaggaa tatcaaattg gaatataaca aaaaaaaaac 120
tgaaactaac caactgaatc taatgtgcat tttaaataat aaaaatggat cattttatac 180
atcatattaa aattaaaaaa atttcataaa aataatacgt agtaaaaaat aaaaattttt 240
aacataaata aannnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 300

MTERENNVYK AKLAEQAERY DEMVEAMKKV ASMDVELTVE ERNLLSVAYK NVIGARRASW
RIITSIEQKE ENKGAEEKLE MIKTYRGQVE KELRDICSDI LNVLEKHLIP CATSGESKVF
YYKMKGDYHR YLAEFATGSD RKDAAENSLI AYKAASDIAM NDLPPTHPIR LGLALNFSVF
YYEILNSPDR ACRLAKAAFD DAIAELDTLS EESYKDSTLI MQLLRDNLTL WTSDMQAEDP
NAGDGEPKEQ IQDVEDQDVS

Chr.22: 34566830 base pairs

Estimated 50,000-100,000 genes (3286 Mbases), 2/3 of which are completed and in the public domain.

...others...

From: Genome MOT at the EBI (April 2000)

Data growth (EMBL DB)

Activity Areas at EBI

• EMBL
– Archiving, development and distribution of DNA sequence data.

• Swiss-Prot
– Archiving, production, development and distribution of protein sequence data.

• MSD
– Archiving and distribution of macromolecular structural data and structure prediction applications.

• DALI
– Archiving and distribution of 2D/3D prediction databases and tools for their usage.

• ENSEMBL
– Archiving, automatic analysis and distribution of Human genome data.

• CGG
– Genome annotation, data mining, metabolic pathways research.

• CORBA
– Design and implementation of CORBA-based tools for database querying.

• SRS
– Development and maintenance of SRS in collaboration with Lion Bioscience.

• Industry
– Links to industry and customised R&D (e.g. Gene Expression).

• External Services
– Development and deployment of on-line interactive and non-interactive tools for sequence analysis.

EBI’s Network Services

http://www.ebi.ac.uk/Tools/

Services by type (interactive and non-interactive):

• Search and Retrieve
– Interactive: SRS (over 100 databases), CORBA Tools (various frontends)
– Non-interactive: [email protected]

• Comparison: nucleotide and protein sequence searches
– Interactive: Fasta3, WU-Blast2, NCBI Blast2; Blitz (S&W): Bic_sw, MPsrch, Scanps
– Non-interactive: Fasta3, Blitz, Blast

• On-line analysis
– Interactive: Ensembl, InterPro and advanced Motif and Fingerprint searches; Structure and Gene Prediction; Sequence Aligning (clustalw_mp); CORBA Tools
– Non-interactive: Interproscan@ebi.ac.uk

• Archive distribution
– Interactive: ftp
– Non-interactive: CD

• Data Submission
– Interactive: WEBIN
– Non-interactive: [email protected], [email protected]

Our common user interface

srs.ebi.ac.uk
Sequence Retrieval System

Core text search and retrieval engine for most services offered from EBI.

Updates and links together more than 100 databanks.

Biggest SRS server in the world (over 130 databases).

Genome & Proteomes

Currently more than 30 complete genomes and proteomes are available interactively to the user community, and demand for data from the Human genome is being met by providing access to all the material available in the EBI databases.

GPCRDB

A recent initiative: Ensembl

Bringing discovery to the scientific community ...

The Community

• EMBnet - European Molecular Biology network.

• Formed (officially) in 1988 to disseminate up-to-date molecular biology databases within member states.

• The initiative for the creation of EMBnet was started by EMBO council members in collaboration with EMBL staff in 1986.

Dissemination of EBI data resources to the world through the EMBnet

An EMBnet node (1/2)

• Hosted by a national academic centre.

• Has national coverage over the Internet.

• Provides services to academics as well as industry (ca. 2000 users per node).

• Maintains local copies of the major biological databases and sequence analysis packages.

An EMBnet node (2/2)

• Provides training and education in the national language.

• Each node typically employs 2-3 staff.

• Each node has at least one major interactive login server and a WWW and ftp server (ca. 300 hosts today).

EMBnet organisation and main tasks anno 2K

[Organisation chart: Executive Board; Education & Training; Research & Development; Technical Manager; Public Relations; EMBnet news]

EMBnet membership

EMBnet Milestones (1/2)

• Development of network based tools for database updates:
– 1987 - First transaction of sequence data between EMBL in Heidelberg and InfoBiogen in Paris.
– 1991 - First implementation of the HASSLE protocol between Norway, Switzerland and Italy. Asynchronous sequence database updates were then possible.
– 1993 - First implementation of xNDT between Norway and Sweden: another solution for asynchronous sequence database updates.
– 1994 - First implementation of SynChron, which runs from EBI today on several industrial sites.

EMBnet Milestones (2/2)

• Financial
– 1990 - The European Union grants support for EMBnet for the first time.

• Organisation
– 1994 - The Stichting EMBnet is formed, granting EMBnet independence from any major member (e.g. EMBL).

• Software
– 1996 - SRS development was partly financed by EMBnet.
– 1998 - EMBOSS (under the GPL) is developed by EMBnet members.

Latest News

• New MPsrch - Fastest Smith & Waterman searches in the world (1.6 billion cell updates/sec) …available soon.

• Ensembl - fast delivery of newly predicted human genes and gene products into the public domain and access to similarity and homology searches on up-to-date data sets.

• Pre-calculated proteomic comparisons of genomes through InterPro.

• EST clustering, clean-up and redundancy reduction via the EuroGene Index.
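The "cell updates/sec" figure quoted for MPsrch counts how many entries of the Smith & Waterman dynamic programming matrix are computed per second. As a rough illustration of what one cell update involves, here is a minimal sketch of the score-only recurrence with a linear gap penalty. It is not MPsrch's implementation; the scoring values (match +2, mismatch -1, gap -1), the function names, and the database size in the timing estimate are all assumptions chosen for the example.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best local alignment score, number of cell updates).

    Score-only Smith & Waterman with a linear gap penalty.
    The scoring parameters are illustrative assumptions.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    cells = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # One "cell update": the best of restart, diagonal, up, left.
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
            cells += 1
        prev = curr
    return best, cells


def seconds_for_search(query_len, db_residues, cells_per_sec=1.6e9):
    """Estimate search time: one cell per (query residue, database residue) pair."""
    return query_len * db_residues / cells_per_sec


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "ACGT"))
    # Hypothetical workload: a 300-residue query against 100 million residues.
    print(seconds_for_search(300, 100_000_000))
```

Under these assumptions, comparing a 300-residue query with a hypothetical database of 100 million residues takes 3 × 10^10 cell updates, or just under 19 seconds at the quoted rate.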

Some facts and figures… (1/2)

• EMBL-EBI is the main provider of biology related sequence databases in Europe.
– Sequence Databases (EMBL, TrEMBL, Ensembl (The Human Genome), etc.)
– Cartographic Databases (RHdb)
– Mutation Databases (HGBASE)
– 3D/2D Structure Databases (PDB, DSSP, etc.)

Some facts and figures… (2/2)

• EMBL-EBI produces more than 50 biological databases.

• EMBL-EBI handles ca. 100K requests/day on www.ebi.ac.uk and 170K requests/day on srs.ebi.ac.uk (8M requests/month), increasing at a rate of 15%/month.

• EMBL-EBI is moving more than 200Gb of data across the European networks each month.
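The traffic figures above can be cross-checked with a little arithmetic. The sketch below assumes a 30-day month, steady compound growth at the quoted 15% per month, and reads "200Gb" as 200 gigabytes (10^9 bytes each). None of these assumptions is stated on the slide, and the function names are illustrative.

```python
import math

WWW_REQUESTS_PER_DAY = 100_000  # www.ebi.ac.uk, from the slide
SRS_REQUESTS_PER_DAY = 170_000  # srs.ebi.ac.uk, from the slide
MONTHLY_GROWTH = 0.15           # 15% per month, from the slide
DATA_PER_MONTH_GB = 200         # "200Gb", read here as gigabytes (assumption)
DAYS_PER_MONTH = 30             # assumption


def requests_per_month():
    """Total requests per month across both servers."""
    return (WWW_REQUESTS_PER_DAY + SRS_REQUESTS_PER_DAY) * DAYS_PER_MONTH


def doubling_time_months(growth=MONTHLY_GROWTH):
    """Months needed for traffic to double under compound monthly growth."""
    return math.log(2) / math.log(1 + growth)


def average_rate_mbit_per_s(gb_per_month=DATA_PER_MONTH_GB, days=DAYS_PER_MONTH):
    """Average sustained transfer rate implied by a monthly data volume."""
    bits = gb_per_month * 1e9 * 8
    seconds = days * 24 * 60 * 60
    return bits / seconds / 1e6


if __name__ == "__main__":
    print(f"{requests_per_month():,} requests/month")
    print(f"doubling time: {doubling_time_months():.2f} months")
    print(f"average rate: {average_rate_mbit_per_s():.2f} Mbit/s")
```

With these assumptions the two daily figures add up to 8.1 million requests per month, consistent with the quoted 8M. Growth of 15% per month means the request load doubles roughly every five months, and 200 GB per month corresponds to an average sustained rate of about 0.6 Mbit/s.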

Main usage is...

• Sequence querying and retrieval.

• Sequence comparison and searching.

• File distribution through ftp.ebi.ac.uk.

• Replication of data at many international sites (e.g. EMBnet nodes).

• Systematic use of e-mail based services.

Contacts

• EMBnet: www.embnet.org
• EBI: www.ebi.ac.uk, corba.ebi.ac.uk, msd.ebi.ac.uk, fly.ebi.ac.uk, industry.ebi.ac.uk, interpro.ebi.ac.uk, etc.
• Ensembl: www.ensembl.org
• EMBL: www.embl-heidelberg.de
• [email protected]@ebi.ac.uk