A practical introduction
to bioinformatics
Bioinformatics – A Practical Guide to the
Analysis of Genes and Proteins (2nd Edn)
Edited by Andreas D. Baxevanis and B.F. Francis Ouellette. John Wiley & Sons Inc., 2001. US$69.95 pbk (xviii + 470 pages) ISBN 0 471 38391 0
As the title
indicates, this book
is intended to
provide a
practice-oriented guide for
choosing and using
common biological
databases and
analysis tools.
It is written
primarily for
biologists, the usual
end-users of such systems, and therefore it
does not make too many assumptions
about prior knowledge of informatics.
People with a more theoretical background
in computer science or statistics will
probably be happier with a book such as
that by Mount [1], whereas software
engineers might prefer Developing
Bioinformatics Computer Skills [2].
The book contains chapters discussing
the data formats of several of the more
important databases as well as alignment,
searching, prediction, sequence assembly
and phylogenetic analysis tools. There is
clearly a focus on DNA rather than
protein-level databases and tools, even
though one could argue that the latter are
increasingly important for most
applications. Also, the book does not
include computational biology topics such
as molecular simulations. There are two
well-done chapters – one on the basics of
the Internet and the other a short tutorial
on the programming language Perl – but
they do not belong in a book on
bioinformatics.
Given its compact size of <500 pages,
the editors have tried to strike a
compromise between providing a
comprehensive overview and giving
detailed information. Each chapter begins
with some background information and
then goes on to present some databases or
tools. Generally one tool is presented for
each purpose, rather than providing a
comparison of tools with identical or
similar function and listing their
strengths and weaknesses. Also, the
selection focuses on databases and tools
that are of general interest rather than
specialized ones.
The issue of how to access or use
resources is typically handled by
providing a pointer to a web interface.
Programs with graphical user interfaces
are sometimes accompanied by
screenshots but there is little mention of
executing programs through the
command line. The focus here is on
reading and understanding the data
formats and analysis outputs rather than
discussing parameters required for
generating the data. Such information can
of course be found in the documentation
provided with the respective resources but
discussion of the aforementioned
parameters would nevertheless have
made the book a real treasure trove for
more advanced users.
Because the book covers a wide field,
chapters were contributed by more than
20 specialists. This ensures that the overall
quality of the writing is excellent.
Although books written in this manner
sometimes tend to overlap in certain
areas, there are surprisingly few
duplications among the chapters in the
book, a certain sign of good editing.
Regardless of its title, the book should
be considered an introduction to, and a
brief overview of, the field of bioinformatics
rather than a comprehensive step-by-step
cookbook or a collection of tips by insiders
for more advanced users. I am not
suggesting that this book should have
been titled Bioinformatics for Dummies,
but given the expertise of the contributing
authors, it would have been possible to
take the book to a higher level.
Nevertheless, this book is likely to find its
way into laboratories, and even specialists
will most probably read it, given the small
number of books published on
bioinformatics so far.
Eric Jain
Jain PharmaBiotech, Bläsiring 7, 4057 Basel, Switzerland. e-mail: [email protected]
References
1 Mount, D. (2001) Bioinformatics: Sequence and
Genome Analysis. Cold Spring Harbor
Laboratory Press
2 Gibas, C. and Jambeck, P. (2001) Developing
Bioinformatics Computer Skills. O’Reilly,
Sebastopol, CA, USA
Coming to grips with
microarray data analysis
Methods of Microarray Data Analysis
Edited by Simon Lin and Kimberly Johnson. Kluwer, 2001. £70 hbk (xiv + 189 pages) ISBN 0792375645
This book is a
compilation of
presentations given
at the first annual
‘Critical Assessment
of Microarray Data
Analysis’ meeting
(CAMDA 2000).
The editors of the
book organised the
conference, the
objective of which was to evaluate and
stimulate research and development (both
academic and commercial) in microarray
data analysis. At the first conference,
presenters were asked to use one of two
published datasets [1,2]. The Golub data are
from leukemia patient samples hybridized
to Affymetrix gene chips and the Spellman
data are from yeast cultures hybridized to
spotted cDNA microarrays. Ten of the oral
presentations are included as
(peer-reviewed) papers in the edited volume.
Two additional review papers are included
as well as a brief glossary and index.
Additional information and links to related
software for many of the chapters as well
as the abstracts of all 30 presentations and
some slides can be found on the conference
website (http://camda.duke.edu).
The two review chapters cover machine
learning (focusing on decision trees and
artificial neural networks) and evolutionary
computation (focusing on genetic algorithms
and genetic programming). These are
informative but an overview of general
issues related to microarray data [3] is
missing. However, several chapters do
discuss normalization of array data.
Selection of differentially expressed genes
is also a common area of discussion. In
particular, Chapter 3 gives an interesting
discussion of multiple testing issues arising
from gene selection. Several chapters
cover the computational problems
associated with consideration of the joint
effects of thousands of variables but most
resort to initial univariate selection of a
subset of a few dozen. Most of the chapters
focused on discrimination (i.e., supervised
TRENDS in Biotechnology Vol.20 No.5 May 2002
http://tibtech.trends.com
classification) of the leukemia data.
Unfortunately, this dataset is not very
challenging for classification because a few
genes can readily be identified that yield
accurate prediction. Nevertheless, many
chapters have nice discussions of important
issues in classification such as ‘boosting’
prediction accuracy by combining results
over several models and cross-validation.
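The cross-validation point can be made concrete with a small sketch. It is not taken from the book: the classifier (nearest centroid), the gene-ranking score (absolute difference in class means) and the toy expression values are all assumptions chosen for brevity. What it illustrates is that when genes are pre-selected univariately, the selection has to be repeated inside every cross-validation fold, otherwise the held-out sample has already influenced the model.

```python
from statistics import mean

# Toy expression matrix (rows = samples, columns = genes). Values are made up.
SAMPLES = [
    [0.0, 1.0, 2.0, 1.5],
    [0.1, 2.0, 1.0, 1.4],
    [0.2, 1.5, 1.5, 1.6],
    [5.0, 1.2, 1.8, 1.5],
    [5.1, 1.8, 1.2, 1.6],
    [5.2, 1.4, 1.6, 1.4],
]
LABELS = ["class_a", "class_a", "class_a", "class_b", "class_b", "class_b"]


def select_genes(samples, labels, k):
    """Return the indices of the k genes whose class means differ most.

    Assumes exactly two classes; this is a stand-in for any univariate filter.
    """
    first, second = sorted(set(labels))
    group_a = [s for s, y in zip(samples, labels) if y == first]
    group_b = [s for s, y in zip(samples, labels) if y == second]

    def score(gene):
        return abs(mean(s[gene] for s in group_a) - mean(s[gene] for s in group_b))

    return sorted(range(len(samples[0])), key=score, reverse=True)[:k]


def nearest_centroid(train, labels, genes, query):
    """Predict the class whose centroid (over the chosen genes) is closest."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(labels)):
        members = [s for s, y in zip(train, labels) if y == label]
        centroid = [mean(s[g] for s in members) for g in genes]
        dist = sum((query[g] - c) ** 2 for g, c in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(samples, labels, k):
    """Leave-one-out cross-validation with gene selection redone in each fold."""
    hits = 0
    for i in range(len(samples)):
        train = samples[:i] + samples[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        genes = select_genes(train, train_labels, k)  # held-out sample not seen
        prediction = nearest_centroid(train, train_labels, genes, samples[i])
        hits += int(prediction == labels[i])
    return hits / len(samples)


if __name__ == "__main__":
    print(loocv_accuracy(SAMPLES, LABELS, k=1))
```

In this toy matrix a single gene separates the two classes cleanly, which mirrors the reviewer's remark that a few genes suffice for accurate prediction on an easy dataset.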
Much attention is paid to machine learning
methods including Bayesian networks,
radial basis functions, neural trees and
support vector machines. There is limited
discussion of conventional statistical
approaches such as t-tests, logistic
regression and linear discriminant analysis.
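To illustrate the multiple testing issue that arises once thousands of genes are screened one at a time: at a per-gene threshold of 0.05, a screen of 5,000 genes with no true differences would still be expected to flag about 250 of them. The sketch below is an illustration under stated assumptions rather than the procedure of any chapter; it contrasts two standard corrections, Bonferroni and Benjamini-Hochberg, on ten made-up p-values.

```python
# Ten hypothetical per-gene p-values (invented for illustration only).
P_VALUES = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]


def bonferroni(pvalues, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate.

    Finds the largest rank k such that the k-th smallest p-value is at most
    (k / m) * alpha, then rejects the k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, index in enumerate(order, start=1):
        if pvalues[index] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for index in order[:cutoff]:
        rejected[index] = True
    return rejected


if __name__ == "__main__":
    print("uncorrected:", sum(p <= 0.05 for p in P_VALUES))
    print("Bonferroni:", sum(bonferroni(P_VALUES)))
    print("Benjamini-Hochberg:", sum(benjamini_hochberg(P_VALUES)))
```

Bonferroni is the more conservative of the two; Benjamini-Hochberg tolerates a controlled proportion of false discoveries and therefore usually retains more genes.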
There is also relatively little attention
paid to experimental design, unsupervised
classification (clustering) or visualization of
high dimensional data. Outside of references
to the machine learning literature, there are
few references to the wider statistical data
mining and pattern recognition literature
[4,5]. Given the rapid evolution of the field of
microarray data analysis, the references are
a bit out of date. But the Internet is a great
equalizer and a simple search on ‘microarray
data analysis’ or ‘microarray normalization’
quickly brings up many relevant discussions
and references. The book is not aimed at the
biologist wanting to learn the basic principles
of data analysis as applied to microarrays
(as is that by Knudsen [6]), but rather at
more computationally oriented scientists
(statisticians and computer scientists alike)
who want to see how new and conventional
data mining methods can be applied to
microarray data. Given that the analyses
were not coordinated, the book does not
fully achieve its secondary objective of
providing a standardized platform for
rigorously appraising prediction methods.
A better example of this is the edited
volume from the STATLOG project [7].
References
1 Golub, T. et al. (1999) Molecular classification of
cancer: class discovery and class prediction by
gene expression monitoring. Science 286, 531–537
2 Spellman, P.T. et al. (1998) Comprehensive
identification of cell cycle-regulated genes of the
yeast Saccharomyces cerevisiae by microarray
hybridization. Mol. Biol. Cell 9, 3273–3297
3 Quackenbush, J. (2001) Computational analysis
of microarray data. Nat. Rev. Genet. 2, 418–427
4 Hastie, T. et al. (2001) The Elements of Statistical
Learning: Data Mining, Inference, and Prediction.
Springer
5 Duda, R.O. et al. (2001) Pattern Classification.
(2nd Edn) Wiley-Interscience
6 Knudsen, S. (2002) A Biologist’s Guide to Analysis
of DNA Microarray Data. John Wiley & Sons
7 Michie, D. et al. (1994) Machine Learning, Neural
and Statistical Classification. Ellis Horwood
Kenneth R. Hess
Dept of Biostatistics, M.D. Anderson Cancer Center, 1515 Holcombe Blvd, Box 447, Houston, Texas 77030-4009, USA. e-mail: [email protected]
Using mass spectrometry
for drug discovery
Mass Spectrometry in Drug Discovery
Edited by David T. Rossi and Michael W. Sinz. Marcel Dekker, 2002. $165.00 (hbk) (viii + 420 pages) ISBN 0 8247 0607 2
The fields of drug
discovery and
mass spectrometry
have become
increasingly
intertwined over
the past decade.
Although the
principal steps of
the drug discovery
process –
identification of
molecular targets, drug synthesis,
pharmacokinetics, toxicology and
clinical trials – remain largely
unchanged, the manner in which some
of these tasks are accomplished has
changed greatly. Currently, any
successful drug discovery program
incorporates elements from several
disparate areas of modern science, such
as combinatorial chemistry, natural
product screening, bioinformatics and
mass spectrometry. Mass spectrometry
has evolved from its restricted use in
physical and organic chemistry and is
now used routinely in biological and
pharmaceutical research programs.
Mass Spectrometry in Drug Discovery,
edited by Rossi and Sinz, is particularly
relevant to those who wish to learn more
about the strengths of applying mass
spectrometry to the science of drug
discovery. Indeed, the strongest point of
this book is that it covers the basic
principles of mass spectrometry,
including various ionization methods
and instrumentation, in an easily
understood format. The book is well
illustrated, with simple and informative
figures to explain difficult concepts. The
principles and applications of one of the
specialized techniques – liquid
chromatography-mass spectrometry
(LC-MS) – are explained in sufficient
detail. This is particularly relevant
because of the increasing use of LC-MS
as an automated high-throughput
technique for detection and quantitation
of biomolecules. Given the variety of
situations in which liquid
chromatographic separation is coupled
to mass spectrometry to provide a
readout, the editors have rightly devoted
an entire chapter to address sample
preparation and handling for LC-MS
experiments.
A major section of the book deals with
the applications of mass spectrometry,
covering principles and practices of
combinatorial chemistry, drug
metabolism, pharmacokinetics,
bioavailability and microdialysis. The
chapter on combinatorial chemistry is
excellent as it provides details on the
synthesis and characterization of various
types of combinatorial libraries.
However, an important area that does
not get adequate treatment is the use of
LC-MS to identify major compounds on
the basis of their ability to bind a target
protein. This can be an ultra
high-throughput technique, using
mass-encoded combinatorial libraries of
thousands of compounds [1], and it is
relevant because of the large number of
disease-associated target proteins known
today. As mass spectrometry is being
increasingly applied in drug discovery to
identify novel drug targets, a more
thorough discussion of this topic would
have broadened the appeal of this book.
Nevertheless, this book is a good introduction
to the field of mass spectrometry,
particularly for those interested in drug
discovery. The broad coverage of the
application of mass spectrometry in the
pharmaceutical industry should also
make this book a valuable resource for
mass spectrometrists.
Shao-En Ong
Akhilesh Pandey*
Center for Experimental Bioinformatics, University of Southern Denmark, Campusvej 55, DK-5230 Odense M, Denmark. *e-mail: [email protected]
Reference
1 Lenz, G.R. et al. (2000) Chemical ligands,
genomics and drug discovery. Drug Discov. Today
5, 145–156
Published online: 15 March 2002