A practical introduction to bioinformatics

Bioinformatics – A Practical Guide to the Analysis of Genes and Proteins (2nd Edn)

Edited by Andreas D. Baxevanis and B.F. Francis Ouellette. John Wiley & Sons Inc., 2001. US$69.95 pbk (xviii + 470 pages) ISBN 0 471 38391 0

As the title indicates, this book is intended to provide a practice-oriented guide for choosing and using common biological databases and analysis tools. It is written primarily for biologists, the usual end-users of such systems, and therefore it does not make too many assumptions about prior knowledge of informatics. People with a more theoretical background in computer science or statistics will probably be happier with a book such as that by Mount [1], whereas software engineers might prefer Developing Bioinformatics Computer Skills [2].

The book contains chapters discussing the data formats of several of the more important databases as well as alignment, searching, prediction, sequence assembly and phylogenetic analysis tools. There is clearly a focus on DNA rather than protein-level databases and tools, even though one could argue that the latter are increasingly important for most applications. Also, the book does not include computational biology topics such as molecular simulations. There are two well-done chapters – one on the basics of the Internet and the other a short tutorial on the programming language Perl – but they do not belong in a book on bioinformatics.

Given the book's compact size of fewer than 500 pages, the editors have tried to strike a compromise between providing a comprehensive overview and giving detailed information. Each chapter begins with some background information and then goes on to present some databases or tools. Generally, one tool is presented for each purpose, rather than a comparison of tools with identical or similar functions and a list of their strengths and weaknesses. Also, the selection focuses on databases and tools that are of general interest rather than specialized ones.

The issue of how to access or use resources is typically handled by providing a pointer to a web interface. Programs with graphical user interfaces are sometimes accompanied by screenshots, but there is little mention of executing programs through the command line. The focus here is on reading and understanding the data formats and analysis outputs rather than on discussing the parameters required for generating the data. Such information can of course be found in the documentation provided with the respective resources, but a discussion of these parameters would nevertheless have made the book a real treasure trove for more advanced users.

Because the book covers a wide field, chapters were contributed by more than 20 specialists. This ensures that the overall quality of the writing is excellent. Although books written in this manner sometimes tend to overlap in certain areas, there are surprisingly few duplications among the chapters in the book, a certain sign of good editing.

Regardless of its title, the book should be considered more an introduction to, and brief overview of, the field of bioinformatics than a comprehensive step-by-step cookbook or a collection of tips by insiders for more advanced users. I am not suggesting that this book should have been titled Bioinformatics for Dummies, but given the expertise of the contributing authors, it would have been possible to take the book to a higher level. Nevertheless, this book is likely to find its way into laboratories, and even specialists will most probably read it, given the small number of books published on bioinformatics so far.

Eric Jain

Jain PharmaBiotech, Bläsiring 7, 4057 Basel, Switzerland. e-mail: [email protected]

References

1 Mount, D. (2001) Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor Laboratory Press

2 Gibas, C. and Jambeck, P. (2001) Developing Bioinformatics Computer Skills. O’Reilly, Sebastopol, CA, USA

Coming to grips with microarray data analysis

Methods of Microarray Data Analysis

Edited by Simon Lin and Kimberly Johnson. Kluwer, 2001. £70 hbk (xiv + 189 pages) ISBN 0792375645

This book is a compilation of presentations given at the first annual ‘Critical Assessment of Microarray Data Analysis’ meeting (CAMDA 2000). The editors of the book organised the conference, the objective of which was to evaluate and stimulate research and development (both academic and commercial) in microarray data analysis. At the first conference, presenters were asked to use one of two published datasets [1,2]. The Golub data are from leukemia patient samples hybridized to Affymetrix gene chips and the Spellman data are from yeast cultures hybridized to spotted cDNA microarrays. Ten of the oral presentations are included as (peer-reviewed) papers in the edited volume. Two additional review papers are included as well as a brief glossary and index. Additional information and links to related software for many of the chapters, as well as the abstracts of all 30 presentations and some slides, can be found on the conference website (http://camda.duke.edu).

The two review chapters cover machine learning (focusing on decision trees and artificial neural networks) and evolutionary computation (focusing on genetic algorithms and genetic programming). These are informative, but an overview of general issues related to microarray data [3] is missing. However, several chapters do discuss normalization of array data. Selection of differentially expressed genes is also a common area of discussion. In particular, Chapter 3 gives an interesting discussion of multiple testing issues arising from gene selection. Several chapters cover the computational problems associated with consideration of the joint effects of thousands of variables, but most resort to initial univariate selection of a subset of a few dozen. Most of the chapters focus on discrimination (i.e., supervised classification) of the leukemia data. Unfortunately, this dataset is not very challenging for classification because a few genes can readily be identified that yield accurate prediction. Nevertheless, many chapters have nice discussions of important issues in classification, such as cross-validation and ‘boosting’ prediction accuracy by combining results over several models. Much attention is paid to machine learning methods including Bayesian networks, radial basis functions, neural trees and support vector machines. There is limited discussion of conventional statistical approaches such as t-tests, logistic regression and linear discriminant analysis.

TRENDS in Biotechnology Vol.20 No.5 May 2002

http://tibtech.trends.com

226 Forum

There is also relatively little attention paid to experimental design, unsupervised classification (clustering) or visualization of high-dimensional data. Outside of references to the machine learning literature, there are few references to the wider statistical data mining and pattern recognition literature [4,5]. Given the rapid evolution of the field of microarray data analysis, the references are a bit out of date. But the Internet is a great equalizer, and a simple search on ‘microarray data analysis’ or ‘microarray normalization’ quickly brings up many relevant discussions and references. The book is not aimed at the biologist wanting to learn the basic principles of data analysis as applied to microarrays (as is that by Knudsen [6]), but rather at more computationally oriented scientists (statisticians and computer scientists alike) who want to see how new and conventional data mining methods can be applied to microarray data. Given that the analyses were not coordinated, the book does not fully achieve its secondary objective of providing a standardized platform for rigorously appraising prediction methods. A better example of this is the edited volume from the STATLOG project [7].
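For readers unfamiliar with the univariate gene selection and multiple testing issues mentioned above, the following is a minimal sketch and not a method taken from the book. It ranks genes by the absolute value of Welch's t-statistic between two classes and shows how a Bonferroni correction shrinks the per-gene significance threshold as the number of genes grows. The function names, gene names and expression values are all hypothetical and chosen only for illustration.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t-statistic for two independent samples (unequal variances)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def rank_genes(expression, labels, top_k):
    """Return the top_k gene names ranked by absolute t-statistic.

    expression maps a gene name to one value per sample;
    labels holds the class (0 or 1) of each sample, in the same order.
    """
    scores = {}
    for gene, values in expression.items():
        group0 = [v for v, y in zip(values, labels) if y == 0]
        group1 = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = abs(welch_t(group0, group1))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Hypothetical toy data: three genes measured in six samples (two classes of three).
labels = [0, 0, 0, 1, 1, 1]
expression = {
    "gene_a": [1.0, 1.2, 0.9, 3.1, 2.9, 3.3],  # clearly separates the classes
    "gene_b": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # no difference between classes
    "gene_c": [5.0, 4.0, 6.0, 5.5, 4.5, 5.2],  # noisy, negligible difference
}

selected = rank_genes(expression, labels, top_k=1)
per_gene_alpha = bonferroni_threshold(0.05, n_tests=len(expression))
```

With thousands of genes on an array, the Bonferroni threshold becomes very small, which is one reason multiple testing deserves the attention the reviewer notes. Selecting a handful of top-ranked genes in this way is the kind of initial univariate filter the review describes.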

References

1 Golub, T. et al. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531–537

2 Spellman, P.T. et al. (1998) Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell 9, 3273–3297

3 Quackenbush, J. (2001) Computational analysis of microarray data. Nat. Rev. Genet. 2, 418–427

4 Hastie, T. et al. (2001) The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer

5 Duda, R.O. et al. (2001) Pattern Classification (2nd Edn). Wiley-Interscience

6 Knudsen, S. (2002) A Biologist’s Guide to Analysis of DNA Microarray Data. John Wiley & Sons

7 Michie, D. et al. (1994) Machine Learning, Neural and Statistical Classification. Ellis Horwood

Kenneth R. Hess

Dept of Biostatistics, M.D. Anderson Cancer Center, 1515 Holcombe Blvd, Box 447, Houston, Texas 77030-4009, USA. e-mail: [email protected]

Using mass spectrometry for drug discovery

Mass Spectrometry in Drug Discovery

Edited by David T. Rossi and Michael W. Sinz. Marcel Dekker, 2002. $165.00 (hbk) (viii + 420 pages) ISBN 0 8247 0607 2

The fields of drug discovery and mass spectrometry have become increasingly intertwined over the past decade. Although the principal steps of the drug discovery process – identification of molecular targets, drug synthesis, pharmacokinetics, toxicology and clinical trials – remain largely unchanged, the manner in which some of these tasks are accomplished has changed greatly. Currently, any successful drug discovery program incorporates elements from several disparate areas of modern science, such as combinatorial chemistry, natural product screening, bioinformatics and mass spectrometry. Mass spectrometry has evolved from its restricted use in physical and organic chemistry and is now used routinely in biological and pharmaceutical research programs.

Mass Spectrometry in Drug Discovery, edited by Rossi and Sinz, is particularly relevant to those who wish to learn more about the strengths of applying mass spectrometry to the science of drug discovery. Indeed, the strongest point of this book is that it covers the basic principles of mass spectrometry, including various ionization methods and instrumentation, in an easily understood format. The book is well illustrated, with simple and informative figures to explain difficult concepts. The principles and applications of one of the specialized techniques – liquid chromatography-mass spectrometry (LC-MS) – are explained in sufficient detail. This is particularly relevant because of the increasing use of LC-MS as an automated high-throughput technique for detection and quantitation of biomolecules. Given the variety of situations in which liquid chromatographic separation is coupled to mass spectrometry to provide a readout, the editors have rightly devoted an entire chapter to sample preparation and handling for LC-MS experiments.

A major section of the book deals with the applications of mass spectrometry, covering principles and practices of combinatorial chemistry, drug metabolism, pharmacokinetics, bioavailability and microdialysis. The chapter on combinatorial chemistry is excellent, as it provides details on the synthesis and characterization of various types of combinatorial libraries. However, an important area that does not get adequate treatment is the use of LC-MS to identify major compounds on the basis of their ability to bind a target protein. This can be an ultra high-throughput technique, using mass-encoded combinatorial libraries of thousands of compounds [1], and it is relevant because of the large number of disease-associated target proteins known today. As mass spectrometry is being increasingly applied in drug discovery to identify novel drug targets, a more thorough discussion of this topic would have broadened the appeal of this book. Nonetheless, this book is a good introduction to the field of mass spectrometry, particularly for those interested in drug discovery. The broad coverage of the application of mass spectrometry in the pharmaceutical industry should also make this book a valuable resource for mass spectrometrists.

Shao-En Ong

Akhilesh Pandey*

Center for Experimental Bioinformatics, University of Southern Denmark, Campusvej 55, DK-5230 Odense M, Denmark. *e-mail: [email protected]

Reference

1 Lenz, G.R. et al. (2000) Chemical ligands, genomics and drug discovery. Drug Discov. Today 5, 145–156

Published online: 15 March 2002
