Commercial Bioinformatics

Upload: deepeshkumarpal5194

Post on 07-Apr-2018


  • 8/6/2019 Commercial Bio in for Ma Tics

    1/20

    1

    BIOTECHNOLOGY REVIEW

    13 MARCH 2000

    JASON REED, PHD. (212) 514-2341

    JREED@OSCARGRUSS.COM

    Trends in Commercial Bioinformatics

    Genome of H. influenzae (TIGR)

    KEY POINTS:

    Roughly defined, bioinformatics technology comprises the backbone computational tools and databases that support genomic and related research.

    The spectacular rise of the commercial genomics industry and the broadening application of genomic techniques in biology and medicine have created a commercial market for bioinformatics software, hardware and services.

    By some estimates, the total market for bioinformatics tools and services, including custom databases, could exceed $2.0 billion within five years.

    In our opinion, bioinformatics technology will become an increasingly important competitive differentiator for public and private life science companies going forward.

    Bioinformatics is becoming a directly investable theme. By our estimation, there are now more than 50 companies that offer bioinformatics products and services of various kinds to external customers. Most of these are private companies, but we would not be surprised to see a number of the more mature players go public in the next 12 months.


    COMMERCIAL BIOINFORMATICS?

    Introduction. The purpose of this document is to provide an overview of the rapidly emerging field of commercial bioinformatics. We assume the reader has at least a baseline understanding of genomic technologies and of how they are now being implemented in commercial drug discovery. For purposes of this review, we define bioinformatics as the backbone computational tools and databases that support genomic and related research, which broadly encompasses the study of DNA structure/function, gene expression and protein production/structure/function. The spectacular rise of the commercial genomics industry and the broadening application of genomic techniques in biology and medicine have created a commercial market for bioinformatics software, hardware and services.

    A Flood of DNA Sequence Data. The initiation of large-scale genomic research projects roughly a decade ago engendered an intensive effort to create related information management and analysis tools, largely driven by academic computer scientists associated with the institutions involved. One of the first and most important problems encountered was how to acquire, store and analyze massive amounts of DNA sequence information. Reliable, high-throughput sequencing methods perfected in the past few years are now churning out vast quantities of information, from complete genomes of several bacteria and archaea (bacteria-like organisms that live in extreme conditions: a third kingdom of life) up to a mostly complete sequence of human chromosome 22, completed in late 1999.

    Partial List of Completely Sequenced Genomes

    Genome                                Size (MM bp)  Est. Genes*  Completed  Relevance

    Archaea
    Aeropyrum pernix K1                   1.67          2,694        1999       Potential source of novel enzymes, etc.
    Archaeoglobus fulgidus                2.18          2,407        1997       Potential source of novel enzymes, etc.
    Methanobacterium thermoautotrophicum  1.75          1,869        1997       Potential source of novel enzymes, etc.
    Pyrococcus abyssi                     1.77          1,765        1999       Potential source of novel enzymes, etc.
    Pyrococcus horikoshii                 1.74          2,064        1998       Potential source of novel enzymes, etc.

    Bacteria
    Aquifex aeolicus                      1.55          1,522        1997       Potential source of novel enzymes, etc.
    Bacillus subtilis                     4.21          4,100        1997       Represents sporulating Gram-positive bacteria
    Campylobacter jejuni                  1.64          1,654        2000       Food-borne pathogen
    Chlamydia trachomatis                 1.04          894          1998       Human pathogen
    Chlamydia pneumoniae                  1.23          1,052        1998       Human pathogen
    Escherichia coli                      4.64          4,289        1998       Key model organism; human pathogen
    Haemophilus influenzae                1.83          1,709        1995       Human pathogen; first free-living organism to have genome completely sequenced
    Helicobacter pylori                   1.67          1,553        1997       Major cause of stomach ulcers
    Helicobacter pylori J99               1.64          1,491        1999       Another H. pylori strain
    Mycobacterium tuberculosis            4.41          3,918        1998       Causes tuberculosis
    Mycoplasma genitalium                 0.58          480          1995       Genome is interesting because it is very small
    Mycoplasma pneumoniae                 0.82          677          1996       Leading cause of walking pneumonia
    Rickettsia prowazekii                 1.11          834          1998       Causes epidemic typhus
    Synechocystis PCC6803                 3.57          3,169        1996       Should help us understand photosynthesis
    Treponema pallidum                    1.14          1,031        1998       Causes venereal syphilis
    Thermotoga maritima                   1.86          1,846        1999       Potential source of novel enzymes, etc.
    Ureaplasma urealyticum                0.75          611          2000       Sexually transmitted pathogen

    Eukaryota
    Caenorhabditis elegans                ~97.0         ~19,000      1998       Worm, a key model organism
    Saccharomyces cerevisiae              12.07         5,885        1996       Yeast, a key model organism
    Human Chromosome 22**                 33.46         600+         1999       First human chromosome to be fully sequenced

    Source: NCBI; *excludes tRNA and rRNA genes; **euchromatic region


    Growth in GenBank. GenBank, a major public repository of DNA sequence data, has grown to include roughly 4.86 million individual sequence records (representing about 3.86 billion base pairs), up from 0.56 million records in 1995 (0.38 billion base pairs). At the time of this writing, GenBank contained the full and partial genome sequences of over 670 different organisms, including 27 complete genomes (6 archaea, 19 bacteria and 2 eukaryotes).

    [Figure: Growth of GenBank, 1982 to 1998. Series: Sequences (millions) and Base Pairs of DNA (millions). Source: NCBI]
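The record and base-pair counts quoted above imply a steep compound growth rate. The short calculation below is only a back-of-the-envelope sketch: the function name is ours, and the five-year span (1995 to the report date in early 2000) is an assumption, since the text does not state exactly when each count was taken.

```python
def annual_growth_rate(start_value, end_value, years):
    """Compound annual growth rate implied by two endpoint values."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Counts quoted in the text. YEARS is an assumption (1995 to early 2000).
RECORDS_1995, RECORDS_NOW = 0.56e6, 4.86e6
BASES_1995, BASES_NOW = 0.38e9, 3.86e9
YEARS = 5

record_growth = annual_growth_rate(RECORDS_1995, RECORDS_NOW, YEARS)
base_growth = annual_growth_rate(BASES_1995, BASES_NOW, YEARS)

# Average length of a GenBank record implied by the current totals.
mean_record_length = BASES_NOW / RECORDS_NOW

print(f"sequence records: {record_growth:.0%} per year")
print(f"base pairs:       {base_growth:.0%} per year")
print(f"mean record length: {mean_record_length:.0f} base pairs")
```

Under that assumption both series grow by more than half every year, which is the quantitative content behind the "flood of data" description.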

    Organisms Represented in GenBank

    Source: NCBI

    This DNA sequence has been deposited in GenBank by a whole host of international academic and government research groups, as well as by commercial concerns. Almost all companies conducting genomic research, such as Incyte, Human Genome Sciences, Millennium Pharmaceuticals, Myriad Genetics and Genome Therapeutics, have sequenced stretches of human and other organisms' DNA. Some of this privately generated sequence data has been submitted to public databases like GenBank, while some remains proprietary.

    Human Genome Sequence. A public consortium now plans to produce a draft version of the human genome sequence by mid-2000, and to completely sequence the genome by 2003. This effort will be spearheaded by research groups at Washington University (St. Louis), Baylor College of Medicine, the Whitehead Institute and the Sanger Center in England. This Human Genome Project consortium has greatly accelerated its timeline from the original genome completion date of 2005 due to: (1) the development of robust high-throughput sequencing techniques, and (2) competition from Celera, a division of PE Corp that intends to complete a lower fidelity, but still very useful, copy of the human genome by 2001 (with a draft version expected out this year). In addition, both the public consortium and private concerns including Celera are sequencing all or parts of the genomes of model organisms, like the mouse, hoping to gain additional insights into genomic structure and function.

    Milestones in Human Genome (~3,000 mm bp) Sequencing

    Genetic Map                                Complete
    Physical Map                               Complete
    High-Throughput Sequencing Technology      Largely Perfected
    Chromosome 22 Sequence                     Finished Late 1999
    Celera Draft Sequence                      Expected Mid 2000
    Public Consortium Draft Sequence           90% by Mid 2000
    Celera Final Sequence                      Expected 2001
    Public Consortium Final Sequence           Expected 2003

    Other:
    Human Sequence Variation Data              In Progress
    Gene Identification                        In Progress
    Functional Analysis                        In Progress

    Sequence of Key Model Organisms:
    E. coli (4.6 mm bp)                        Complete
    Yeast (12 mm bp)                           Complete
    C. elegans (97 mm bp)                      Complete
    Drosophila (160 mm bp)                     Raw Sequence Finished Late 1999 (Celera)
    Mouse (2,600 mm bp)                        Expected 2002 (Celera)
    Rice (400 mm bp)                           Expected 2001 (Celera)

    Source: NIH, NCBI, Celera

    [Figure: Organisms Represented in GenBank. Values as extracted: 485, 135, 54, 10. Categories as extracted: Viruses, Eukaryota, Bacteria, Archaea.]


    Data Generation is Accelerating. Data generation is accelerating because: (1) many genomes besides human are being completely sequenced, and (2) high-throughput methods are being perfected in other areas like gene expression assays, protein-protein interaction assays, in vitro or cell-based assays used in drug development and a host of clinically related genetic tests.

    Data Source              Drivers

    DNA Sequence             High-throughput techniques:
                               --shotgun sequencing
                               --hybrid shotgun/map-based methods
                               --automated capillary electrophoresis
                               --genome maps of various kinds
                             Lots of medically/biologically interesting organisms

    Gene Expression Data     Researchers have now found lots of genes:
                               --cDNA sequencing, SAGE, etc.
                               --genomic sequencing w/ gene ID techniques
                             Microarrays can assay thousands of genes at one time (10,000+)
                             Very important for finding/validating drug targets
                             Often involves model organisms

    Protein Data             High-throughput techniques:
                               --2-D gels
                               --mass spectrometry
                               --protein-protein interaction assays (yeast 2 & 3 hybrid assays; also can be on chips)
                               --various new in vitro and cell-based assays
                             Structure determination/prediction methods becoming more powerful
                             Very important for finding/validating drug targets
                             Often involves model organisms

    Medical Genetics Data    High-throughput techniques:
                               --SNP/polymorphism chip-based assays
                               --SNP/polymorphism mass spec assays
                               --SNP/polymorphism electrophoresis assays
                             Enables tailoring drugs to patients via genetic profile (pharmacogenomics)
                             Enables more efficient patient selection for clinical trials
                             Enables disease predisposition testing

    Source: Oscar Gruss Research

    Bioinformatics is Becoming Critical to Life Science R&D. If we take the massive generation of biological data as a starting place, bioinformatics technology enables the extraction of information that can be used in commercial drug discovery, clinical diagnostics, agricultural biotechnology and other applications. Currently, this includes three areas: (1) tools that support laboratory experiments; (2) the design, implementation and integration of biological databases; and (3) various analytical tools that determine by computation rather than by experiment things like the location of genes within a chromosome, similar genes or proteins from other species, and the 3-D structure and function of different proteins. These analyses can enable or greatly accelerate drug target identification efforts, drug lead validation and optimization, pharmacogenomic studies and many other biotech applications.

    Bioinformatics Technology Involves:

    Design, Implementation and Integration of Biological Databases

    Aligning Protein and DNA Sequences

    Tools That Support Laboratory Experiments

    Assembling DNA Sequence Fragments and

    Creating Genomic Maps

    Recognizing and Annotating DNA Sequence Features

    Phylogenetic Comparisons

    Predicting RNA Secondary Structure

    Modeling Protein Structure and Dynamics

    These Techniques are Very Useful in R&D Related to:

    Commercial Drug Discovery

    Improving Clinical Trials

    Medical Diagnostics

    Pharmacogenomics: Tailoring Medicines to Individuals

    Industrial Biotech

    Agbiotech

    Source: Oscar Gruss Research; Durbin, et al; Altman

    A Comment About the Current Limitations of Bioinformatics Technology: A comprehensive assessment of the strengths and weaknesses of the different bioinformatics technologies is beyond the scope of this report. Suffice it to say that many of these methods are still very much in development. However, these tools can be very powerful when applied in the correct analytical context, and in conjunction with the appropriate experimental validation. To further clarify how these tools are used, we provide a simplified example below.

    BIOINFORMATICS TECHNOLOGY: AN EXAMPLE

    Sequence an Interesting Genomic Region. You might start by finding the DNA sequence of a chromosomal region, speculating that it contains genes from an interesting biological pathway. Your ultimate goal might be to find an undiscovered drug target. To rapidly assemble a contiguous DNA sequence that might have one or more complete genes, you might use the shotgun technique. This technique relies on piecing together many small, electrophoretically determined stretches of DNA sequence, each say 500 base pairs in length, into a much larger continuous stretch, say 2 million base pairs in length. To do this in a (mostly) automated fashion, you will need special programs like PHRED to read the raw DNA sequence, and PHRAP to assemble the small pieces into a large stretch of sequence. You will probably also need to use a laboratory information management system (LIMS) to track your sequencing project, as the process involves many individual samples and pieces of data that need to be stored and organized.
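The idea of piecing overlapping reads into a longer stretch can be illustrated with a toy greedy assembler. This is a minimal sketch and not how PHRAP works: it assumes error-free reads from a single strand with no repeats, and every name in it (`overlap`, `greedy_assemble`, the example sequence) is hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the longest such overlap is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are wholly contained in another read.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # nothing overlaps: the remaining pieces are separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Toy example: four overlapping "reads" cut from one short made-up sequence.
genome = "ATGGCGTACGTTAGCCTAGGCATCGA"
reads = [genome[0:10], genome[6:16], genome[12:22], genome[18:26]]
contigs = greedy_assemble(reads)
print(contigs)
```

When the reads do not all overlap, the function returns several contigs rather than one, which mirrors the "few strings if you cannot put it all together" situation. Real assemblers additionally handle sequencing errors, both strands and repeated regions, using the base quality values a program like PHRED produces.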

    Find Genes and Other Interesting Features in Your Genomic Sequence. You now have a DNA sequence that is a string of several million symbols (like ...AAGGCTGAGTGCTAAGCGCGCG), or a few strings of several hundred thousand symbols if you cannot put it all together (a common problem). You want to find regions that correspond to genes and perhaps regulatory sequences that control when the genes are turned on and off. You might start by using a program called BLAST (Basic Local Alignment Search Tool) to search the public or commercial DNA sequence databases to see if any stretches of your 2 million base pair sequence match previously identified gene sequences. To do this faster you might use special computer hardware known as an accelerator, such as the DeCypher system from TimeLogic. You might use a more sophisticated software package like GENIE, GENSCAN or GRAIL to better identify where in your sequence the gene starts and stops, and where regulatory regions might be. These gene finding programs are not completely reliable in most cases, but are useful when used in conjunction with other methods. In this regard, you would also want to search public and private expressed sequence tag (EST) databases. ESTs are short sequences (several hundred base pairs) experimentally determined to correspond to real genes. If an EST matches part of your sequence, it is likely that that part contains a real gene. This is a very powerful technique, as you don't actually have to know what the gene does, and because available EST databases are now very comprehensive.
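The EST lookup described above amounts to asking where each EST occurs in the genomic sequence, on either strand. The sketch below uses exact string matching purely for illustration; real ESTs contain sequencing errors and skip introns, so in practice an alignment tool such as BLAST is used instead. All sequences and names here are made up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite DNA strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_est_hits(genomic, ests):
    """Return (est_name, strand, start) for every exact occurrence.

    start is a 0-based offset into genomic. An EST equal to its own
    reverse complement would be reported on both strands.
    """
    hits = []
    for name, est in ests.items():
        for strand, query in (("+", est), ("-", reverse_complement(est))):
            start = genomic.find(query)
            while start != -1:
                hits.append((name, strand, start))
                start = genomic.find(query, start + 1)
    return hits


# Hypothetical data: one forward-strand hit, one reverse-strand hit, one miss.
genomic = "TTGACCATGGCTAAGCGCGCGTTAACC"
ests = {
    "est_1": "ATGGCTAAGC",
    "est_2": "GGTTAACGC",
    "est_3": "CCCCCCCC",
}
hits = find_est_hits(genomic, ests)
for name, strand, start in hits:
    print(name, strand, start)
```

A region covered by one or more hits is a candidate gene even when nothing is known about its function, which is the point the paragraph makes.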

    Compare Your Gene to Other Known Genes to Find More Information. As an extension of the process described above, you might want to keep comparing regions of your sequence that you now think correspond to a gene with other known genes from human and with genes from other organisms. If your gene is similar to a human gene of known function, say an enzyme of some kind, your gene might perform the same function and be structurally similar. To do this, you want to continue to use pairwise alignment algorithms (like BLAST and Smith-Waterman) to search both public and private databases. Comparing your gene to similar genes in other organisms (say genes from a mouse and a fish, if they are known) can help you find important regulatory and functional regions, among other things, because these tend to be evolutionarily conserved. This is one reason that the Human Genome Project Consortium and Celera are sequencing model organisms like mouse and fruit fly, in addition to human. To compare the sequences of several genes, you can use multiple alignment algorithms such as CLUSTAL W or MSA. There are many other tools to compare the DNA and protein sequence of your new gene with other known genes and protein motifs.
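For readers unfamiliar with pairwise alignment, the following is a compact sketch of Smith-Waterman local alignment with a linear gap penalty. The scoring values (match 2, mismatch -1, gap -2) are arbitrary assumptions chosen for illustration; production tools use substitution matrices and affine gap penalties, and BLAST uses a faster heuristic rather than this exhaustive method.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b: (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the dynamic programming table; scores never drop below zero.
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + step,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score reaches zero.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + step:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


# Two made-up sequences that share only a short conserved core.
result = smith_waterman("TTTACGTACGTTT", "GGGACGTACGGGG")
print(result)
```

Because the table is clamped at zero, the algorithm finds the best-matching region wherever it sits inside the two sequences, which is what makes it suitable for spotting a conserved domain in otherwise dissimilar genes.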

    Displaying this Information is Critical to Understanding It. As you have collected all of this information about your DNA sequence, where the genes are, where the regulatory sequences might be, what corresponds to an EST sequence, etc., you will need to display it graphically. One of the best ways to do this is in a browser-like fashion that lets you easily investigate each piece of information via mouse click or something similar. A good display can tell you what information might be lacking and where the different sources of information agree or disagree.
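Even a plain-text rendering shows why a track-style display helps: stacking the evidence on a common coordinate scale makes agreement and gaps visible at a glance. The sketch below is a hypothetical illustration only; the function name, the track names and the coordinates are all invented, and real genome browsers are interactive graphical tools.

```python
def render_tracks(seq_len, features, width=60):
    """Draw one text line per track.

    features is a list of (track, start, end) with 0-based start and
    exclusive end, in base pairs. Each line is scaled to width columns.
    """
    scale = seq_len / width
    tracks = {}
    for track, start, end in features:
        row = tracks.setdefault(track, ["."] * width)
        first = int(start / scale)
        last = max(first + 1, int(end / scale))  # always draw at least one column
        for col in range(first, min(last, width)):
            row[col] = "#"
    label_width = max((len(name) for name in tracks), default=0)
    return [f"{name:<{label_width}} |{''.join(row)}|" for name, row in tracks.items()]


# Invented annotation for a 2,000 bp region: a predicted gene, two ESTs
# and a database match that only partly agrees with the prediction.
features = [
    ("gene prediction", 300, 1500),
    ("EST match", 350, 700),
    ("EST match", 1200, 1450),
    ("database hit", 600, 1700),
]
for line in render_tracks(2000, features, width=40):
    print(line)
```

In this made-up example the database hit extends past the end of the predicted gene, the kind of disagreement between sources that a good display is meant to expose.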

    Next Steps. In the example above, we might have been able to find out a great deal about the function, structure and pathway of action of our gene via computer tools. This might tell us that the gene produces a protein that could be important in a disease process. Therefore we have gone from information about a chromosomal region to a potential drug target. This information can help us more efficiently design future experiments, or make some experiments unnecessary.

    Going forward, we might want to use microarrays to investigate the expression of our gene under different conditions (in response to different chemicals, etc.). There is now great interest in making databases of this type of gene expression information, so you might not have to conduct the experiment yourself. Examples include the gene expression data available commercially from GeneLogic and Incyte, and the free gene expression databases now being developed by a host of academic and government research groups, such as Stanford University, the Whitehead Institute and the U.S. National Center for Biotechnology Information (NCBI, part of the NIH).

    For the purposes of drug development, you'd definitely be interested in which other proteins interact with the protein from your gene. It may turn out that another, structurally dissimilar protein in the pathway would be a better drug target for some reason. (Myriad's ProNet technology and CuraGen's protein-protein interaction databases are powerful, commercially available tools for finding this type of information.)

    Finally, you might attempt to model the structure of your protein, and how it interacts with a drug molecule. This might tell you which chemical class of molecules would be the most promising drug candidates. It should be noted that modeling 3-D protein structures and protein-small molecule interactions are some of the toughest problems in computational biology. Companies like Structural Bioinformatics and IBM are working on these kinds of problems, as are many other commercial and academic groups.

    With this admittedly oversimplified example, we've hoped to demonstrate the following points about bioinformatics technology:

    1. In one form or another, it is ubiquitous in genomics research.

    2. It can involve lots of database searching. The more high-quality information available, the more powerful bioinformatics can become.

    3. It can require many different algorithms and analyses.

    4. Integrating and displaying the information is key.

    5. Bioinformatics won't replace experiments, but can greatly streamline and enable the discovery process.

    An integrated system comprising databases, analysis algorithms and display tools is described in the schematic below.

    [Schematic: Data Viewer; Administration Interface; Data Analysis Functions; Internal Control Functions; Old Data, New Data, Public Data]


    Integrating these elements in an easily navigable system, whether it be a desktop program or an enterprise-wide IT system, is highly desirable in most commercial and in many large-scale academic research efforts.

    Commercial Applications of Bioinformatics Are Numerous. To understand the commercial relevance of these technologies, one need only consider that all of the public and private sector genomics research now being conducted relies heavily on bioinformatics tools of the kind described above. In the future, bioinformatics tools should see extensive use in all the key life science R&D markets, including the pharmaceutical and biotechnology industries, agricultural biotechnology, and in government and academic research. Penetration of bioinformatics techniques into these markets should be driven by the following factors:

    1. The pressure to rapidly organize, integrate and mine data is enormous: data costs a lot to produce, and competitive and patent concerns are an issue.

    2. Maturation of tools should make them easier to use.

    3. Life sciences R&D organizations are becoming more receptive to a paradigm shift in research techniques (i.e. genomics, R&D outsourcing, etc.), due in large measure to the insufficiency of current methods (product output per research $).

    But partly offset by:

    1. The fact that experienced bioinformatics people are relatively scarce.

    2. Lack of universal compatibility standards for tools and databases.

    3. Applications can be very complex and heterogeneous, thus the development time/cost is often high.

    4. In some cases, the capital expenditure to support in-house capability is quite large, plus constant service expenditures of some type will probably be required.

    THE COMMERCIAL BIOINFORMATICS MARKET

    Market Structure

    By our estimation, there are now more than 50 companies that offer bioinformatics products and services of various kinds to external customers. In surveying the industry, we find surging volume growth, particularly among industry leaders, and an acceleration in the number of corporate deals and other collaborations. We believe this reflects the explosive growth in genomic and related research techniques, plus the weaknesses of available analysis tools and databases. By some estimates, the total market for bioinformatics tools and services, including custom databases, could exceed $2.0 billion within five years.

    There remain a number of significant challenges in this market, however. Over the past few years, the customer base willing to pay big dollars for a customized bioinformatics solution, large biopharma, has been relatively concentrated (perhaps fewer than 50 customers), and the largest players have mostly satisfied their own needs with in-house bioinformatics expertise. Further, publicly available tools and databases are ubiquitous and are becoming easier to use and more integrated. Commercial solutions that add substantial value tend to be complex, with longer development cycles than traditional software products. At the same time, the individual applications can be very heterogeneous, so it can be hard to leverage a specific product across many applications. The net result is that development time/cost can be high, but each individual market can be relatively narrow. Notably, the recent dissolution of the high-profile bioinformatics startup Molecular Applications Group (MAG) can be traced to these issues. Some bioinformatics companies have responded to these hurdles by reorienting their business models. For example, Pangea Systems recently changed from being an enterprise IT solution provider (a low-volume, high-price business) to being an e-bioinformatics portal (now called DoubleTwist.com), which is targeted mostly toward small- and mid-size customers (high-volume, low-price). Compugen, which originally produced special computer hardware for DNA and protein sequence analysis, now offers expanded services such as DNA microarray design and an e-bioinformatics portal (called LabOnWeb.com).

    Because bioinformatics is becoming such a critical enabling technology in modern biological research, we strongly feel that commercial solutions will ultimately reach their multi-billion dollar sales potential. It is an open question as to how the industry will respond to the current problems of market heterogeneity and small customer base. It is our feeling that consolidation, driven by the larger players, and cross-platform standardization will be major themes going forward.

    Below we outline the bioinformatics market structure and growth outlook in further detail:

    Product Categories

    There are several identifiable bioinformatics product categories: proprietary databases of various kinds, software and hardware analysis tools of varying comprehensiveness, complete enterprise IT systems that manage and integrate databases and analysis tools, and, finally, custom services. In time these distinctions should become blurred as tools, databases and information management systems become more integrated.

    We see the following technical hurdles as important to bioinformatics product design, and the solutions which most effectively address them should have a competitive advantage:

    1. The data to be organized/analyzed is very heterogeneous.

    2. Analysis tools are rapidly evolving.

    3. Seamlessly integrating public, legacy and new data is a must.

    4. Many users are not software/computer experts.

    Customer Base

    1. Pharmaceutical and biotechnology companies will use bioinformatics technology in all stages of the drug discovery process, from drug target identification through lead validation and optimization to drug response profiling and clinical diagnostics.

    Key driver: This is the most important customer base in terms of dollar value, due to competitive and patent expiry pressures and the fact that biopharma has traditionally spent heavily on R&D. Large pharmaceutical companies are already prodigious customers of outsourced genomics R&D that includes a lot of bioinformatics content. This includes partnerships like those between Millennium Pharmaceuticals and AstraZeneca, Bayer, Pfizer, and Wyeth-Ayerst, for example, or Human Genome Sciences' deals with SmithKline Beecham, Schering-Plough, Merck KGaA, etc. There are many more examples. We believe that the middle market of smaller pharmas and mid- to small-size biotechs (perhaps 300+ companies, excluding genomics companies) is relatively underpenetrated for a variety of reasons, including smaller R&D budgets and a historical emphasis on more traditional drug discovery technologies.

    Key constraint: As discussed above, leading pharma companies that have made a substantial commitment to genomics research have already developed a substantial bioinformatics infrastructure. This includes companies like SmithKline, Glaxo, Merck, Novartis and others. These types of customers are potentially the highest value segment, but displacing a big pharma's custom-tailored bioinformatics group with an external product is only practical in the case of niche or especially high-value applications. Lion Biosciences of Heidelberg, Germany, has been perhaps the most successful bioinformatics company to date in penetrating big pharma with a high-value infrastructure deal. In 1999, Lion entered into a five-year alliance with Bayer AG worth up to $100 million, in which Lion will provide and support bioinformatics IT systems to speed Bayer's drug discovery programs. The deal included the establishment of Lion's U.S. subsidiary, Lion Bioscience Research, in Cambridge, MA.

    2. Agbiotech/Industrial Biotech companies have already started to use genomics research methods extensively in the study of crops and livestock, with the hope of improving crop/livestock yields, increasing pesticide/herbicide resistance, improving taste/nutritional content, etc.

    Key driver: We expect that the widening use of gene expression assays and proteomics assays of various kinds in ag/industrial biotech will sharply increase the need for bioinformatics technology in this market. The increased pace of whole genome sequencing of thermophilic organisms and other extremophiles (like that of M. thermoautotrophicum by Genome Therapeutics), which may provide a novel source of enzymes for industrial processes, should support this trend.

    Key constraint: This market segment has traditionally been slower than biopharma to embrace genomic techniques. We believe that the current negative public perception of genetically modified organisms (GMOs) will remain a factor, at least in the near future.

    3. Academic research groups, particularly those associated with the international effort to sequence the human genome, have pioneered most of the genomic and bioinformatics techniques in use today and should continue to be heavy users.

    Key driver: Cutting edge research into gene expression, proteomics and medical genetics will increasingly rely on the use of bioinformatics tools, in our opinion.

    Key constraint: Outside of the large, government coordinated projects like the Human Genome Initiative, individual researchers tend to be less intensive data generators/users than commercial concerns. As a result, their bioinformatics needs are often satisfied by a combination of publicly available tools, commercial desktop solutions (like those available from InforMax or GCG) and home grown systems.

    4. Other markets include government agencies like the U.S. Patent and Trademark Office, which recently purchased a Compugen DNA and protein sequence analysis computer system to aid in patent searches. We expect law enforcement agencies like the FBI and the armed services to compile and make increasing use of genetic profile databases in the future. However, in the near term, these non-commercial markets will probably remain small in terms of total dollar value.

    Participants in the Field

    1. Academic and government groups which produce publicly available tools and databases, some of which are quite comprehensive and sophisticated. Examples are the many tools and databases maintained by the NCBI, including GenBank. Appendix B at the end of this report contains a partial list of available biological databases, many of which are public free-access databases. Below is a schematic of NCBI's Entrez database browser system:

    Source: NCBI


    2. Genomic and pharmacogenomic companies that offer databases and services to outside customers, as well as for their own internal use. This includes companies like Incyte, Celera, CuraGen and GeneLogic. We would also include biotech instrumentation companies like PE Biosystems in this category. Instrumentation products usually include data management and analysis tools of varying utility.

    3. Large pharma, biotech and agbio companies which develop their own in-house databases and bioinformatics expertise. As discussed above, some of the largest pharmaceutical companies have well-developed bioinformatics infrastructures, and thus are difficult for outside providers to penetrate. The situation is much more favorable in mid-size to smaller companies; however, these firms often cannot justify extremely large expenditures on infrastructure unless it addresses a core research focus.

    4. Traditional computer, electronic technology and IT services companies that offer products and services for the bioinformatics market. This includes companies like Compaq, Sun Microsystems, Silicon Graphics, IBM and Agilent Technologies. For the most part, these companies have taken the complementary approach of providing infrastructure that supports various solutions by specialized bioinformatics providers. We expect these companies to be an increasingly important competitive force in genomics and bioinformatics. For instance, Compaq has a major strategic alliance with Celera to provide integrated bioinformatics hardware, software, networking and service solutions. IBM is conducting research into high value data mining and protein structure determination methods. IBM offers a variety of enterprise-wide IT solutions for the life science market, and recently initiated a collaboration with NetGenics. Through its partnership with Rosetta Inpharmatics, Agilent offers an enterprise-wide gene expression analysis solution that includes software and hardware and is a rival to Affymetrix's GeneChip system.

    5. More or less pure-play bioinformatics companies that offer products and services to external customers. Some of these companies are trying to leverage their bioinformatics expertise toward in-house efforts like drug discovery, and are thus somewhat like traditional genomics companies (see category #2 in this list). Most of these are private companies, but we would not be surprised to see a number of the more mature players go public in the next 12 months. Some, but by no means all, of the prominent companies in this space are listed in Appendix A of this report, and the market outlook for this segment is discussed in more detail below.

    More on Market Size and Growth Outlook

    Given the nascent nature of this industry and the large number of private players in the field, the current market for external products and services is hard to determine. Surveys of the 50 or so bioinformatics tool and database companies by market research groups like Frontline and Frost & Sullivan, for example, put the current market for bioinformatics databases, products and services at roughly $300 million, with about half of the annual sales by data suppliers and half by tool/IT providers of various kinds. These groups and other industry observers believe that this market could grow to $1.5-2.0 billion over the next five years. These estimates exclude some significant bioinformatics-related internal spending on IT infrastructure by pharmaceutical and biotechnology companies, which could be as large as $2.0+ billion annually. As discussed above, also excluded are most of the project-based R&D collaborations between pharma/agbio companies and genomics companies that include bioinformatics content, and which total well over $1.0 billion on a cumulative basis over the past 3-5 years.

    Without more publicly disclosed financials these market size estimates are hard to pin down. However, we find them reasonable if not conservative, in that they imply visible 25%-35% top-line growth over the next few years, which is consistent with our own survey of key industry players. Conceptually, as discussed above, we believe bioinformatics will become essential to many if not all life science R&D activities, and the market for commercial solutions of various kinds should increase in proportion.
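
As a rough cross-check on the market figures quoted above, the growth rate implied by two end points can be computed directly. The sketch below is illustrative only: it takes the roughly $300 million current market and the $1.5-2.0 billion five-year range at face value, and the function names, the full five-year horizon and the assumption of smooth annual compounding are ours, not those of the market research groups.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate that takes start_value to end_value in `years` years."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value, annual_growth, years):
    """Value reached by compounding start_value at annual_growth for `years` years."""
    return start_value * (1.0 + annual_growth) ** years


# Figures quoted in the text, in $ millions. The five-year horizon is an assumption.
CURRENT_MARKET = 300.0
FORECAST_LOW, FORECAST_HIGH = 1500.0, 2000.0
YEARS = 5

if __name__ == "__main__":
    # Growth rate implied by each end of the five-year forecast range.
    for target in (FORECAST_LOW, FORECAST_HIGH):
        rate = implied_cagr(CURRENT_MARKET, target, YEARS)
        print(f"${target:,.0f}M in {YEARS} years implies {rate:.1%} annual growth")
    # Market size reached if the near-term growth range were sustained for five years.
    for growth in (0.25, 0.35):
        value = project(CURRENT_MARKET, growth, YEARS)
        print(f"{growth:.0%} annual growth for {YEARS} years gives ${value:,.0f}M")
```

Under these assumptions the forecast end points correspond to compound growth of roughly 38% to 46% a year, while 25%-35% annual growth sustained for five years would carry a $300 million market to roughly $0.9-1.3 billion. The near-term growth figure and the five-year range therefore need not describe the same period or the same revenue base.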

    Most of today's sales come from the database providers and the software/hardware tool suppliers, with complete enterprise IT solutions just emerging (perhaps 20% or less of the current market). Over the next few years, enterprise IT solutions should garner a larger proportion of the industry's total sales, driven by a great need for integration of the various databases/tools with R&D efforts. Also, we expect growth in the sales of data providers to be supported by the emergence of new types of data, namely gene expression and proteomics data. However, commercial database sales are likely to be constrained by the increasing public availability of well-annotated genome sequence from human and other organisms, and by the increasing public availability of other types of data.

    Appendix A -- Representative Bioinformatics Database, Software, Hardware and Service Providers

    Concentrated Bioinformatics Plays (Ticker): Description

    Compugen (Private): Originally specialized in computer hardware/software designed to accelerate bioinformatics algorithms. Business model now moving more toward an internet portal concept, plus proprietary and collaborative gene discovery.
    DoubleTwist.com (Private): An internet portal business model, which includes on-line access to a variety of bioinformatics/biotech tools, databases and other products. DoubleTwist changed its name from Pangea Systems in 1999.
    eBioinformatics (Private): Originally a spin-off from the Australian National Genomic Information Service. eBioinformatics provides a variety of web-based bioinformatics tools and databases.
    Genomica (Private): Provides enterprise-wide bioinformatics systems and services. Relationships include AstraZeneca, Glaxo Wellcome, Parke Davis and PE Biosystems.
    Informax (Private): Desktop and enterprise-wide bioinformatics products. Customer base of over 60 pharma companies, 250 biotechs and 500 universities.
    Lion Bioscience (Private): Provides enterprise-wide bioinformatics systems and services. Lion has interest in leveraging technology for proprietary R&D. Lion's $100 MM alliance with Bayer AG is the largest bioinformatics deal to date.
    Molecular Mining (Private): Molecular Mining produces high value-added data mining algorithms that can be used to filter gene expression and other types of data.
    Neomorphic (Private): Bioinformatics tools to mine and visualize genomic information. Collaborations with key academic and commercial genomic technology leaders.
    NetGenics (Private): Provides enterprise-wide bioinformatics systems and services. Relationships include Pfizer, Abbott, Wyeth Ayerst and IBM.
    Oxford Molecular (OMG.LN): Comprehensive business model that includes bioinformatics and related fields of cheminformatics and computational chemistry. In 1997, acquired Genetics Computer Group, maker of the popular Wisconsin desktop bioinformatics product.
    Paracel (Private): Specialized computer hardware/software designed to accelerate bioinformatics algorithms. Relationships with many academic and commercial research groups, including PE Corp.
    Silicon Genetics (Private): Tools for gene expression analysis and visualization, plus other data-mining applications.
    SpotFire (Private): SpotFire offers data visualization software for gene expression as well as products for non-life sciences industries.
    Structural Bioinformatics (Private): Bioinformatics tools and databases with a special focus on protein structural information, a critical component of rational drug design.
    TimeLogic (Private): Specialized computer hardware/software designed to accelerate bioinformatics algorithms. Configurable hardware architecture offers competitive advantage in some cases. Relationships with key academic and commercial research groups, including Stanford University, Roche, Bristol-Myers and Novartis.

    Genomic/Biotechnology Companies with Bioinformatics Products (Ticker): Description

    Celera (CRA): A division of PE Corp founded to rapidly sequence the human and other genomes, with the intent to supply high value-added genomic data to life science collaborators. Celera has the world's most powerful high-throughput DNA sequencing capability.
    CuraGen (CRGN): CuraGen conducts project-driven genomic R&D for proprietary use and in collaboration with life science partners. CuraGen offers collaborators a variety of well-integrated databases, bioinformatics tools and services.
    GeneLogic (GLGC): Offers GeneExpress gene expression database products, and other services to the life sciences industry.
    Human Genome Sciences (HGSI): HGSI practically founded the commercial genomics industry with its landmark 1993 gene database deal with SmithKline Beecham. HGSI now has collaborations with more than ten commercial partners in areas including gene databases, antibodies, gene therapy and microbial genomics.
    Incyte (INCY): A pioneer commercial bioinformatics database company. Provides high-value gene expression, proteomics and other data/analysis tools to pharmaceutical and academic subscribers.
    Myriad Genetics (MYGN): Myriad's core competence is therapeutic and diagnostic product development via genomic and proteomic methods. Myriad offers a public version of its high-quality protein interaction database, ProNet, through DoubleTwist.com and through its own Myriad-ProNet.com website.
    PE Biosystems (PEB): A division of PE Corp, the premier provider of DNA sequencers and other life science instrumentation. The PE Informatics division offers a variety of software products to life science and other customers.
    Rosetta Inpharmatics (Private): Rosetta's core competence is obtaining gene expression and other data in a setting relevant to drug/product discovery for proprietary use and in collaboration with life science partners. Through its commercialization partner Agilent, Rosetta offers an enterprise-wide gene expression analysis solution that includes software and hardware.

    Computer, Electronic Technology and IT Services Companies Offering Bioinformatics Products (Ticker): Description

    Agilent Technologies (A): In 1999, Agilent entered into a strategic collaboration with Rosetta Inpharmatics to make and sell gene expression analysis systems, including hardware and software.
    Compaq (CPQ): Compaq has a major strategic alliance with Celera to provide integrated bioinformatics hardware, software, networking and service solutions.
    IBM (IBM): IBM is conducting research into high value-added data mining and protein structure determination methods. IBM offers a variety of enterprise-wide IT solutions for the life science market, and recently initiated a collaboration with NetGenics.
    Silicon Graphics (SGI): SGI offers visual computing and high-performance computer systems. SGI systems support a wide variety of bioinformatics software applications.
    Sun Microsystems (SUNW): Sun systems support a wide variety of bioinformatics software applications.

    Appendix B -- Representative Molecular Biology Databases

    (from A. Baxevanis in Nucleic Acids Research, 2000, V.28, No.1)

    Major Sequence Repositories

    GenBank: All known nucleotide and protein sequences; International Nucleotide Sequence Database Collaboration
    EMBL Nucleotide Sequence Database: All known nucleotide and protein sequences; International Nucleotide Sequence Database Collaboration
    DNA Data Bank of Japan (DDBJ): All known nucleotide and protein sequences; International Nucleotide Sequence Database Collaboration
    Genome Sequence Database (GSDB): All known nucleotide and protein sequences
    TIGR Gene Indices: Non-redundant, gene-oriented clusters
    UniGene: Non-redundant, gene-oriented clusters

    Comparative Genomics

    Clusters of Orthologous Groups (COG): Phylogenetic classification of proteins from 21 complete genomes
    XREFdb: Cross-referencing of model organism genetics with mammalian phenotypes

    Gene Expression

    ASDB: Protein products and expression patterns of alternatively-spliced genes
    Axeldb: Gene expression in Xenopus
    BodyMap: Human and mouse gene expression data
    EpoDB: Genes expressed in vertebrate RBC
    FlyView: Drosophila development and genetics
    Gene Expression Database (GXD): Mouse gene expression and genomics
    Kidney Development Database: Kidney development and gene expression
    MAGEST: Ascidian (Halocynthia roretzi) gene expression patterns
    Mouse Atlas and Gene Expression Database: Spatially-mapped gene expression data
    PEDB: Normal and aberrant prostate gene expression
    Tooth Development Database: Gene expression in dental tissue
    TRIPLES: TRansposon-Insertion Phenotypes, Localization and Expression in Saccharomyces

    Gene Identification and Structure

    Ares Lab Intron Site: Yeast spliceosomal introns
    COMPEL: Composite regulatory elements
    CUTG: Codon usage tables
    EID: Protein-coding, intron-containing genes
    EPD: Eukaryotic POL II promoters
    ExInt: Exon-intron structure of eukaryotic genes
    IDB/IEDB: Intron sequence and evolution
    PLACE: Plant cis-acting regulatory elements
    PlantCARE: Plant cis-acting regulatory elements
    TransTerm: Codon usage, start and stop signals
    TRRD: Regulatory regions of eukaryotic genes
    YIDB: Yeast nuclear and mitochondrial intron sequences

    Genetic Maps

    GeneMap '99: International Radiation Mapping Consortium human gene map
    G3-RH: Stanford G3 and TNG radiation hybrid maps
    GB4-RH: Genebridge4 (GB4) human radiation hybrid maps
    GDB: Human genes and genomic maps
    DRESH: Human cDNA clones homologous to Drosophila mutant genes
    GenAtlas: Human genes, markers and phenotypes

    HuGeMap: Human genome genetic and physical map data
    IXDB: Physical maps of human chromosome X
    Radiation Hybrid Database: Radiation hybrid map data

    Genomic Databases

    AceDB: Caenorhabditis elegans, Schizosaccharomyces pombe and human sequences and genomic information
    FlyBase: Drosophila sequences and genomic information
    Mouse Genome Database (MGD): Mouse genetics and genomics
    Saccharomyces Genome Database (SGD): Saccharomyces cerevisiae genome
    AmmtDB: Metazoan mitochondrial DNA sequences
    Arabidopsis Database (AtDB): Arabidopsis thaliana genome
    CropNet: Genome mapping in crop plants
    CyanoBase: Synechocystis sp. genome
    EcoGene: Escherichia coli K-12 sequences
    EMGlib: Completely sequenced bacterial genomes and the yeast genome
    GOBASE: Organelle genome database
    HIV Sequence Database: HIV RNA sequences
    Human BAC Ends Database: Non-redundant human BAC end sequences
    INE: Rice genetic and physical maps and sequence data
    Mendel Database: Database of plant EST and STS sequences annotated with gene family information
    MitBASE: Mitochondrial genomes, intra-species variants, and mutants
    MitoDat: Mitochondrial proteins (predominantly human)
    MITOMAP: Human mitochondrial genome
    MITONUC/MITOALN: Nuclear genes coding for mitochondrial proteins
    MITOP: Mitochondrial proteins, genes and diseases
    Munich Information Center for Protein Sequences (MIPS): Protein and genomic sequences
    NRSub: Bacillus subtilis genome
    Phytophthora Genome Initiative Database: Oomycete sequences and genetic maps
    RsGDB: Rhodobacter sphaeroides genome
    TIGR Microbial Database: Microbial genomes and chromosomes
    ZFIN: Zebrafish genetics and development; mutant and wild-type lines
    ZmDB: Maize genome database

    Intermolecular Interactions

    Database of Ribosomal Crosslinks (DRC): Ribosomal crosslinking data
    DIP: Catalog of protein-protein interactions
    DPInteract: Binding sites for Escherichia coli DNA-binding proteins

    Metabolic Pathways and Cellular Regulation

    Kyoto Encyclopedia of Genes and Genomes (KEGG): Metabolic and regulatory pathways
    EcoCyc: Escherichia coli K-12 genome, gene products and metabolic pathways
    ENZYME: Enzyme nomenclature
    EpoDB: Genes expressed during human erythropoiesis
    FlyNets: Drosophila melanogaster molecular interactions
    Klotho: Collection and categorization of biological compounds
    LIGAND: Enzymatic ligands, substrates and reactions
    RegulonDB: Escherichia coli pathways and regulation
    UM-BBD: Microbial biocatalytic reactions and biodegradation pathways primarily for xenobiotic, chemical compounds
    WIT2: Integrated system for functional curation and development of metabolic models

    Mutation Databases

    Online Mendelian Inheritance in Man (OMIM): Catalog of human genetic and genomic disorders
    ALFRED: Allele frequencies and DNA polymorphisms
    Androgen Receptor Gene Mutations Database: Mutations in the androgen receptor gene
    Asthma and Allergy Database: Genetics of allergy and asthma, including linkage studies and mutation data
    Asthma Gene Database: Linkage and mutation studies on the genetics of asthma and allergy
    Atlas of Genetics and Cytogenetics in Oncology and Hematology: Chromosomal abnormalities in cancer
    BTKbase: Mutation registry for X-linked agammaglobulinemia
    Cytokine Gene Polymorphism Database: Cytokine gene polymorphisms, in vitro expression and disease-association studies
    Database of Germline p53 Mutations: Mutations in human tumor and cell line p53 gene
    DbSNP: Single nucleotide polymorphisms
    GRAP Mutant Databases: Mutants of family A G-Protein Coupled Receptors (GRAP)
    Haemophilia B Mutation Database: Point mutations, short additions and deletions in the Factor IX gene
    HAMSTeRS: Hemophilia A mutation database
    HGBASE: Intragenic sequence polymorphisms
    HIV-RT: HIV reverse transcriptase and protease sequence variation
    Human Gene Mutation Database (HGMD): Known (published) gene lesions responsible for human inherited disease
    Human PAX2 Allelic Variant Database: Mutations in human PAX2 gene
    Human PAX6 Allelic Variant Database: Mutations in human PAX6 gene
    Human Type I and Type III Collagen Mutation Database: Human type I and type III collagen gene mutations
    HvrBase: Primate mtDNA control region sequences
    iARC p53 Database: Missense mutations and small deletions in human p53 reported in peer-reviewed literature
    KinMutBase: Disease-causing protein kinase mutations
    KMDB: Mutations in human eye disease genes
    MmtDB: Mutations and polymorphisms in metazoan mitochondrial DNA sequences
    Mutation Spectra Database: Mutations in viral, bacterial, yeast and mammalian genes
    NCL Mutations: Mutations and polymorphisms in neuronal ceroid lipofuscinoses (NCL) genes
    p53 Databases: Human p53 and hprt mutations; transgenic lacZ and transgenic/bacterial lacI mutations
    PAHdb: Mutations at the phenylalanine hydroxylase locus
    PMD: Compilation of protein mutant data
    RB1 Gene Mutation Database: Mutations in the human retinoblastoma (RB1) gene
    Ribosomal RNA Mutational Database: 16S and 23S ribosomal RNA mutation database
    SV40 Large T-Antigen Mutant Database: Mutations in SV40 large tumor antigen gene

    Pathology

    FIMM: Functional molecular immunology data (diseases, antigens, peptides and HLA binding sites)
    Mouse Tumor Biology Database (MTB): Mouse tumor names, classification, incidence, pathology, genetic factors
    PEDB: Sequences from prostate tissue and cell type-specific cDNA libraries

    Protein Databases

    AARSDB: Aminoacyl-tRNA synthetase sequences
    DatA: Annotated coding sequences from Arabidopsis
    DExH/D Family Database: DEAD-box, DEAH-box and DExH-box proteins
    Endogenous GPCR List: G protein-coupled receptors; expression in cell lines
    ESTHER: Esterases and α/β hydrolase enzymes and relatives
    FUNPEP: Low-complexity or compositionally-biased protein sequences
    GenProtEC: Escherichia coli genes, gene products and homologs

    GPCRDB: G protein-coupled receptors
    Histone Sequence Database: Histone and histone-fold sequences and structures
    HIV Molecular Immunology Database: HIV epitopes
    Homeobox Page: Information relevant to homeobox proteins, classification and evolution
    Homeodomain Resource: Homeodomain sequences, structures, and related genetic and genomic information
    HUGE: Large (>50 kDa) human proteins and cDNA sequences
    IMGT: Immunoglobulin, T cell receptor and MHC sequences
    InBase: Intervening protein sequences (inteins) and motifs
    Kabat Database: Sequences of proteins of immunological interest
    LGIC: Ligand-gated ion channel sequences, alignments and phylogeny
    Membrane Protein Database: Membrane protein sequences, transmembrane regions and structures
    MEROPS: Peptidase sequences and structures
    MHCPEP: MHC-binding peptides
    NRR: Steroid and thyroid hormone receptor superfamily
    Olfactory Receptor Database: Sequences for olfactory receptor-like molecules
    OoTFD: Transcription factors and gene expression
    Peptaibol: Peptaibol (antibiotic peptide) sequences
    PhosphoBase: Protein phosphorylation sites
    PKR: Protein kinase sequences, enzymology, genetics, and molecular and structural properties
    PPMdb: Arabidopsis plasma membrane protein sequence and expression data
    Prolysis: Proteases and natural and synthetic protease inhibitors
    PROMISE: Prosthetic centers and metal ions in protein active sites
    Protein Information Resource (PIR): Non-redundant protein sequence database
    Receptor Database (RDP): Receptor protein sequences
    Ribonuclease P Database: RNase P sequences, alignments and structures
    SENTRA: Sensory signal transduction proteins
    SWISS-PROT/TrEMBL: Curated protein sequences
    TRANSFAC: Transcription factors and binding sites
    Wnt Database: Wnt proteins and phenotypes

    Protein Sequence Motifs

    BLOCKS: Protein sequence motifs and alignments
    PROSITE: Biologically-significant protein patterns and profiles
    Pfam: Multiple sequence alignments and hidden Markov models of common protein domains
    O-GLYCBASE: Glycoproteins and O-linked glycosylation sites
    PIR-ALN: Protein sequence alignments
    PRINTS: Protein sequence motifs and signatures
    ProClass: Families defined by PROSITE patterns and PIR superfamilies
    ProDom: Protein domain families
    ProtoMap: Automated hierarchical classification of SWISS-PROT proteins
    SBASE: Annotated protein domain sequences
    SMART: Signalling domain sequences
    SYSTERS: Protein clusters

    Proteome Resources

    Aaindex: Physicochemical properties of peptides
    REBASE: Restriction enzymes and associated methylases
    SWISS-2DPAGE: 2D-PAGE images and reference maps
    Yeast Proteome Database (YPD): Saccharomyces cerevisiae proteome

    Retrieval Systems and Database Structure

    KEYnet: Keywords extracted from EMBL and GenBank
    Virgil: Database interconnectivity

    RNA Sequences

    5S Ribosomal RNA Databank: 5S rRNA sequences
    ACTIVITY: Functional DNA/RNA site sequences
    Collection of mRNA-like non-coding RNAs: Non-protein-coding RNA transcripts
    Database on the Structure of Large Subunit Ribosomal RNA: Alignment of large subunit ribosomal RNA sequences
    Database on the Structure of Small Subunit Ribosomal RNA: Alignment of small subunit ribosomal RNA sequences
    Guide RNA Database: Guide RNA sequences
    Intronerator: RNA splicing and gene structure in Caenorhabditis elegans
    Non-canonical Base Pair Database: RNA structures containing rare base pairs
    PLMItRNA: Plant mitochondrial tRNAs and tRNA genes
    Pseudobase: Information on RNA pseudoknots
    Ribosomal Database Project (RDP): rRNA sequences, alignments, and phylogenies
    RNA Modification Database: Naturally modified nucleosides in RNA
    SELEX_DB: Selected DNA/RNA functional site sequences
    Small RNA Database: Direct sequencing of small RNA sequences
    SRPDB: Signal recognition particle RNA, protein, and receptor sequences
    TmRDB: tmRNA (10Sa RNA) sequences
    tmRNA Website: tmRNA (10Sa RNA) sequences
    tRNA Sequences: tRNA and tRNA gene sequences
    UTRdb: 5' and 3' UTRs of eukaryotic mRNAs
    Viroid and Viroid-Like RNA Database: Viroid and viroid-like RNA and vHDV sequences
    Yeast snoRNA Database: Yeast small nucleolar RNAs

    Structure

    PDB: Structure data determined by X-ray crystallography and NMR
    CATH: Hierarchical classification of protein domain structures
    SCOP: Familial and structural protein relationships
    ASTRAL: Analysis of protein structures and their sequences
    BioImage: Searchable database of multi-dimensional biological images
    BioMagResBank: NMR spectroscopic data from proteins, peptides and nucleic acids
    CSD: Crystal structure information for organic and metal organic compounds
    Database of Macromolecular Movements: Descriptions of protein and macromolecular motions, including movies
    Decoys 'R' Us: Computer-generated protein conformations based on sequence data
    HIC-Up: Structures of small molecules ('hetero-compounds')
    HSSP: Structural families and alignments; structurally-conserved regions and domain architecture
    IMB Jena Image Library: Visualization and analysis of three-dimensional biopolymer structures
    ISSD: Integrated sequence and structural information
    LPFC: Library of protein family core structures
    MMDB: All three-dimensional structures, linked to NCBI Entrez system
    MODBASE: Comparative protein structure models
    NDB: Nucleic acid-containing structures
    PDB-REPRDB: Representative protein chains, based on PDB entries
    PRESAGE: Protein structures with experimental and predictive annotations
    Protein Motions Database: Motions of protein loops, domains and subunits
    ProTherm: Thermodynamic data for wild-type and mutant proteins
    RESID: Protein structure modifications

    Transgenics

    Cre Transgenic Database: Cre transgenic mouse lines
    Transgenic/Targeted Mutation Database: Information on transgenic animals and targeted mutations

    Varied Biomedical Content

    CarbBank: Complex carbohydrate/polysaccharide sequences
    Dbcat: Catalog of databases
    DrugDB: Pharmacologically-active compounds; generic and trade names
    HOX-PRO: Clustering of homeobox genes
    LocusLink/RefSeq: Curated sequence and descriptive information about genetic loci
    Molecular Probe Database: Synthetic oligonucleotides, probes and PCR primers
    MPDB: Information on synthetic oligonucleotides
    NCBI Taxonomy Browser: Names of all organisms that are represented in the genetic databases with at least one nucleotide or protein sequence
    PubMed: MEDLINE and Pre-MEDLINE citations
    Tree of Life: Information on phylogeny and biodiversity
    Vectordb: Characterization and classification of nucleic acid vectors

    INVESTMENT RESEARCH

    HEALTHCARE

    Biotechnology
    Akhtar Samad, M.D., Ph.D. (212) 514-2342
    Jason Reed, Ph.D. (212) 514-2341
    Alan J. Tuchman, M.D. (212) 514-2345
    John Tonkin (212) 514-2348

    Medical Technology
    Alan J. Tuchman, M.D. (212) 514-2345

    TECHNOLOGY

    Telecommunications Equipment
    Ayelet Oron (212) 514-2305
    Gilad Alper (212) 514-2356

    Diversified Technology
    Rami Rosen (972) 3519-9004

    SPECIAL SITUATIONS

    Alan J. Septimus (212) 514-2317
    Peter H. Vogel (212) 514-2336

    ASIA PACIFIC

    Telecommunications and Special Situations
    Sandia Shih (212) 514-2358

    This report is based upon information which Oscar Gruss & Son Incorporated believes to be reliable but no representation is made by this Firm or any of its affiliates as to its completeness or accuracy. This report is not a complete analysis of every material fact concerning any company, industry or security, and more information is available upon request. Opinions expressed herein are subject to change without notice. Oscar Gruss & Son Incorporated makes a market in this security and may have a long or short position in this security in connection with this activity. This Firm and/or our employees and affiliates may own or have positions in any securities of companies mentioned in this study, which positions may change at any time, and may, from time to time, sell or buy such securities. This Firm or one of its affiliates may from time to time perform investment banking or other services for, or solicit investment banking or other business from a company mentioned in this report.

    © 2000 Oscar Gruss & Son Incorporated. All rights reserved.