HUSAR
The Internet
Search engines
Biosci Newsgroups
Important entry sites
Link collections
Introduction
Comparison HUSAR / Internet
Introduction
All the indicated links can be found at: genome.dkfz-heidelberg.de
It is impossible to present all the resources on the entire WWW
Interesting sites are added every week
Our object for this session
• Not primarily where to get information, but:
• How to get to information (search strategies)
The ‘best’ method doesn’t exist; therefore we present a personal view!
As the internet changes and grows, many sites that are interesting today may be boring tomorrow
Several basic sources of information
• Search engines / Metasearch engines
• Homepages / portals
• Medline / Electronic journals
• Newsgroups (BIOSCI)
• White lists
The internet is a rich source of information
Sources:
• Search engines / Metasearch engines
• Medline / Electronic journals
• Newsgroups (BIOSCI)
• Homepages / portals
• White lists
Example questions:
• I need info on Dr. Complexname.
• I need info on baldness.
• How do I purify DNA from hair?
• Is there a database of hair growth related proteins?
• I need info on Dr. Smith, member of National Baldness Society.
• I need info on baldness related diseases/syndromes.
• I need info on Dr. Smith, working at Baldness University.
... But you have to combine the right question with the right source!
Search engines: shapes and sizes
• AltaVista and Northern Light are two of the largest search engines on the web.
• FAST Search aims to index the entire web.
• Excite is a medium-sized index but uses concept searching.
• Companies can pay money to GoTo to be placed higher in the search results.
• Google is a search engine that makes use of link popularity to rank web sites.
• Yahoo is the largest human-compiled directory to the web and employs 150 editors.
• Specialized search engines: Biofinder, www.biologie.de, BioHunt, Pasteur NetBook
Multiple search engines (metasearch engines) query several other search engines in parallel
• Examples: Metacrawler, DogPile, MetaFind, Cyber 411, Savvysearch
cuiwww.unige.ch/meta-index.html
www.monash.com/spidap4.html
www.library.carleton.edu/staff/terry/websearch/
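The idea of querying several search engines in parallel and merging their answers can be sketched in a few lines of Python. This is only an illustration, not how Metacrawler or any other named service works: the three `engine_*` functions are hypothetical stand-ins that return fixed lists, the example.org URLs are placeholders, and ranking by "number of engines that returned a URL" is an assumed merging rule.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for real search engines: each takes a query
# string and returns a list of result URLs (placeholders, no network).
def engine_a(query):
    return ["http://example.org/x", "http://example.org/y"]


def engine_b(query):
    return ["http://example.org/y", "http://example.org/z"]


def engine_c(query):
    return ["http://example.org/y", "http://example.org/x"]


def metasearch(query, engines):
    """Send the query to all engines in parallel and merge the results.

    URLs are ranked by how many engines returned them (assumed rule),
    with ties broken alphabetically so the output is deterministic.
    """
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))

    votes = Counter()
    for results in result_lists:
        votes.update(set(results))  # count each URL once per engine

    return sorted(votes, key=lambda url: (-votes[url], url))


ranked = metasearch("TMHMM", [engine_a, engine_b, engine_c])
print(ranked)
```

Duplicates across engines collapse into one entry, which is the main practical benefit of a metasearch front end over querying each engine by hand.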
Do not rely on just one search engine
Search engine   TMHMM    Krogh      TMHMM & Krogh   HUSAR      Venter
Yahoo           15       2/0        4               5/0        4/0
FAST            99       23870/1    23              3965/0     56000/>1
AltaVista       41       19186/?    19257/15        3946/>1    26000/>1
Excite          40/30    90/2       50/2            60/2       >150/3
Metacrawler     20       44/1       11              21/1       30/3

Total hits: blue; relevant hits: red (shown here as total/relevant).
Search engines may employ different AND / OR rules.
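Why a two-word query can return more hits than either word alone becomes clear when the AND and OR rules are compared side by side. The sketch below uses a toy inverted index over three made-up documents; the document texts and the function names are assumptions for illustration and do not describe any particular engine.

```python
from collections import defaultdict

# Three made-up documents (hypothetical content, for illustration only).
docs = {
    1: "TMHMM predicts transmembrane helices",
    2: "Krogh wrote about hidden Markov models",
    3: "TMHMM was described by Krogh and colleagues",
}


def build_index(documents):
    """Map each lower-cased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, terms, rule="AND"):
    """Combine the posting sets of all terms with an AND or OR rule."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return set()
    if rule == "AND":
        return set.intersection(*postings)  # every term must occur
    if rule == "OR":
        return set.union(*postings)         # any term may occur
    raise ValueError("rule must be 'AND' or 'OR'")


index = build_index(docs)
and_hits = search(index, ["TMHMM", "Krogh"], rule="AND")
or_hits = search(index, ["TMHMM", "Krogh"], rule="OR")
print(sorted(and_hits), sorted(or_hits))
```

An engine that silently applies the OR rule returns every document mentioning either word, so the hit count grows as words are added instead of shrinking.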
Comparison of search engines
Engines compared: Northern Light, AltaVista, Excite, Inktomi, Google, InfoSeek, Lycos, Yahoo, Microsoft, Netscape
Source: http://searchenginewatch.com
Usenet Biosci Newsgroups
Object
• to organize discussions on a large variety of topics
Advantages
• simple to complex questions
• resources are scientists
• Netscape and newslist format
Disadvantages
• traffic can be too high or too low
• resources are scientists
• spam!
Usenet Biosci Newsgroups
Access via a newsreader (e.g. Pine) is also very convenient. Mailing lists or reading by Deja Vu are also possible.
Instructions on how to install Usenet newsgroups are provided by www.bio.net
Useful link collections
• Many, many links for molecular biology. Internet problem: last update 1996.
• Many, many links. Highly scientific.
• Many links to a wide variety of databases.
EMBL / EBI
Keywords: molecular biology
• Home of databases EMBL, TREMBL and Swissprot
• Mitochondrial database
• Ligand/Receptor database
• Home of European Drosophila Genome Project and Flybase
• Original home of SRS
• Macromolecule structure
• Large array of downloadable software
• Proteomics
• Regular newsletter about the EBI and Bioinformatics
• http://industry.ebi.ac.uk/ Datamining, ESTs, gene prediction, Java, microarrays, sequence analysis, visualisation and Web technology.
Database of databases
EBI's Biocatalog
http://www.sanger.ac.uk
Keywords: large scale sequencing and analysis
This includes major databases and analysis software
Home of Pfam (Protein Families Database of alignments and HMMs)
Home of AceDB (management of genome project data)
and EMBOSS (The European Molecular Biology Open Software Suite)
An abundance of tools, e.g. Victor Solovyev's gene prediction software
NCBI
Keyword: major US site for sequence analysis
• ENTREZ search and retrieval system
• Pubmed
• Home of BLAST
• Home of GENBANK
• Unigene database
• COGs (Clusters of Orthologous Groups)
• dbSNP Single Nucleotide Polymorphisms database
• 600+ genome maps
• Tools such as ORF Finder and e-PCR
NIH
Keywords: research, funding, USA science politics
• 25 separate institutes
• Huge amount of data
• Local search engine
• Still difficult to get to relevant data
The Institute for Genomic Research
Keyword: genome projects
Genome databases, e.g.:
• >20 microbes
• Parasites: Trypanosoma brucei and Plasmodium falciparum
• Human
• Arabidopsis
http://www.tigr.org/tdb/
Abundant software, including:
• A system for finding genes in microbial DNA
• MUMmer for aligning whole genome sequences
• Sequence clean-up program
Institut Pasteur: Bio Netbook
• The Bio Netbook is a search engine especially designed for biologists
• Its index contains only biological expressions (2945)
• Growing database
• The homepage www.pasteur.fr contains a large amount of additional information
GenomeNet
Keywords: metabolic pathways / proteomics / metabolomics
KEGG
• Metabolic pathways
• Regulatory pathways
• Disease Catalogs, Cell Catalogs
• Molecule Catalogs: compounds and enzymes
• Gene Catalogs
• Genome Maps
• Gene Expression Profiles
• Computational Tools
• Links to other pathway and compound sites
GenomeNet / KEGG
Metabolic Pathways
• Graphical pathway maps and ortholog group tables
• Maps are fully interactive
Regulatory Pathways
Gene Expression Profiles
• Still preliminary in character
• Clickable signals allow identification of the enzyme
More about proteomics
Gene Expression Profiles
• Still preliminary in character
• http://bodymap.ims.u-tokyo.ac.jp
ExPASy
Keywords: Proteins / proteomics / applications
• Many applications
• High quality search engine for biologists
• Largest collection of biology links on the WWW (few outdated)
Software for 2D analysis
Swiss-PdbViewer is an application that provides a user-friendly interface for analysing several proteins at the same time.
SWISS-MODEL, an automated comparative protein modelling server
PubCrawler
• It goes to the library. You go to the pub.
• Automatic system that searches PubMed or other databases as often as you want with your keywords or sequences
• Similar systems exist as well; links are indicated on the PubCrawler homepage
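A recurring keyword search against PubMed can be approximated with NCBI's public E-utilities. The sketch below is not PubCrawler's own implementation; it only builds the URL for an `esearch` request restricted to recently added records and sends nothing over the network. The function name, the example keywords and the 7-day window are assumptions chosen for illustration.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities esearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_alert_url(keywords, days_back=7, max_results=20):
    """Build an esearch URL for PubMed records added in the last days_back days.

    All keywords must match (they are joined with AND).
    """
    params = {
        "db": "pubmed",
        "term": " AND ".join(keywords),
        "reldate": days_back,   # only records from the last N days
        "datetype": "edat",     # ... counted by Entrez entry date
        "retmax": max_results,  # cap on the number of returned ids
    }
    return ESEARCH_URL + "?" + urlencode(params)


# Hypothetical example keywords.
url = build_pubmed_alert_url(["baldness", "hair follicle"])
print(url)
```

Running such a query from a scheduled job once a week and mailing the new record ids would give a simple alert service of the kind the slide describes.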
MIPS
Keyword: Proteins and more...
• Databases of proteins (Protfam), RNAs, mitochondrial sequences
• Genome projects of human, yeast and Arabidopsis
• Pathways, Proteomics
• Yeast ORFs and genes
• Small but comprehensive link list
• An alert utility sends you new database entries related to your field of study once per week via email.
• ORPHEUS is a software system for gene prediction in complete bacterial genomes and large genomic fragments.
IMB Jena
Keywords: biotech and molecular biology
• Many useful links, up to date
• Tools, databases, services
HUSAR
Keyword: Sequence analysis
• Sequence Retrieval System
• GDB
• OMIM
• AceDB
• Genecards is mirrored at the DKFZ
• FAQ, Bioinformatics information
• Link list (>200 links)
• Several free tools (e.g. Genscan)
• ... and the HUSAR package
Sequence analysis: Internet vs. HUSAR
                                      The Internet        HUSAR
Number of applications                many                >250
Up-to-date                            few                 all
Comprehensive                         no                  yes
Databases                             many                >90
Speed                                 low                 high
Security                              low                 high
Data storage                          none                40 MB
Handling of multiple or large files   bad (copy/paste)    good
Batch job utilities                   mostly absent       yes
User support                          low                 high
Development of customized tools       no                  yes
Training                              low                 high
Bug removal                           slow                fast
Costs                                 no                  low
Conclusion
You can always contact us at genome@dkfz-heidelberg.de if you have difficulty locating the information that you need.
The internet contains abundant information; the important thing is to use clever strategies to find it.