MIAME, we have a problem

Robert Shields

Trends in Genetics, Elsevier, 84 Theobald’s Road, London, UK, WC1X 8RR

Corresponding author: Shields, R. ([email protected]). Available online 27 December 2005.

Microarrays have captured the imagination of geneticists and molecular biologists like no other technology, with the possible exception of PCR. Descended from the humble northern blot, which semi-quantitatively measures the expression level of one transcript at a time, through spot blots, which measure a few transcripts, microarrays claim to measure several thousand transcripts simultaneously – spot blots writ large scale! But are they any good at doing what they claim? Two articles in this issue of Trends in Genetics [1,2] suggest that confidence in the results of the current generation of microarray experiments is misplaced.

In the first article, Miron and Nadon [1] argue for the use of ‘inferential literacy’ in microarray analysis. In other words, it is important to understand the characteristics of high-throughput data, and to consider them in the design and execution of an experiment, not as an optional bolt-on at the data-analysis stage. However, no amount of statistical or algorithmic knowledge can compensate for deficiencies in the technology itself. As Lord Rutherford is supposed to have said, ‘If your experiment needs statistics, you ought to have done a better experiment.’ But it seems that these deficiencies are whispered about and not discussed openly in polite society or in many scientific journals.
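To make the point concrete, consider a minimal simulation of why inferential literacy matters when thousands of genes are tested at once. This sketch is mine, not taken from [1]; the sample sizes, the null data and the Benjamini–Hochberg correction are assumptions chosen purely for illustration.

```python
# Why per-gene p-value thresholds mislead in high-throughput experiments:
# testing 10,000 genes at p < 0.05 guarantees hundreds of false positives
# even when NO gene is truly differentially expressed.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_genes, n_reps = 10_000, 4

# Two groups drawn from the SAME distribution: every 'hit' is spurious.
group_a = rng.normal(size=(n_genes, n_reps))
group_b = rng.normal(size=(n_genes, n_reps))

p = ttest_ind(group_a, group_b, axis=1).pvalue
print(f"'significant' at p < 0.05: {(p < 0.05).sum()}")  # roughly 500

# Benjamini-Hochberg false-discovery-rate control, implemented directly:
# find the largest rank k with p_(k) <= (k/m) * alpha.
alpha = 0.05
p_sorted = np.sort(p)
ranks = np.arange(1, n_genes + 1)
passed = p_sorted <= alpha * ranks / n_genes
n_discoveries = passed.nonzero()[0].max() + 1 if passed.any() else 0
print(f"significant after BH correction: {n_discoveries}")  # ~0, as it should be
```

The design choice here is the whole argument: the correction belongs in the experimental plan (it dictates replicate numbers and power), not bolted on after the arrays have been scanned.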

The second article, by Draghici and colleagues [2], is longer than the normal TIG review, but I feel the space is justified by the importance of the message. Although microarrays are adequate for detecting the direction of change in expression of genes expressed above a certain threshold, a large proportion of the transcriptome is beyond the reach of current technology [2]. This problem is glossed over by the various intensity-filtering steps employed by researchers doing comparative microarray studies to remove low-intensity signals that are ‘unreliable’. Draghici et al. [2] point out that there are inconsistencies between the various microarray platforms (in situ synthesised short oligos, longer oligos, spotted oligos and spotted cDNAs), making it almost impossible, for the moment, to compare results from different platforms. A minimum requirement is that the same transcripts are being detected by different platforms (amazingly, a significant number of probes intended to interrogate the same transcript do not), but it also seems important for cross-platform consistency that the probes on different platforms correspond to the same part of the transcript. This is usually attributed to the need to assay the same splice variant, but could it be that consistency is improved because the same cross-hybridizing sequences are then detected by all platforms [3]?
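For readers unfamiliar with the filtering practice criticized above, here is a hypothetical sketch of what such a step does. The simulated intensities and the cutoff are invented for illustration and are not a procedure from [2].

```python
# Hypothetical intensity filter: probes below an arbitrary 'reliability'
# cutoff are dropped, silently removing a large slice of the transcriptome
# from any downstream comparison.
import numpy as np

rng = np.random.default_rng(1)
# Simulated log2 intensities for 10,000 probes across 6 arrays; many
# transcripts sit near the detection floor (assumed values).
intensities = rng.normal(loc=6.0, scale=2.0, size=(10_000, 6))

cutoff = 6.0  # an ad hoc threshold, as often chosen in practice
keep = intensities.mean(axis=1) >= cutoff

print(f"probes retained: {keep.sum()}")
print(f"fraction of transcriptome silently discarded: {1 - keep.mean():.1%}")
```

The filtered genes do not merely lose precision; they vanish from the analysis entirely, which is exactly the glossed-over limitation the article describes.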

As if the problems associated with different platforms were not enough, a recent trio of articles [4–6] showed not only inconsistencies across platforms but also inconsistencies among laboratories that were using the same platform, and even the same RNA samples. Matters were improved by the use of common protocols for RNA work-up and also – and the importance of this is not widely appreciated – by common methods of data handling and analysis. If scientists are to create gene-expression databases that incorporate results from multiple laboratories, it is simply not good enough to adhere to the Minimum Information About a Microarray Experiment (MIAME) guidelines, which focus only on the documentation of experimental details while failing to address the real problems with the technology and how it is used.

Equally depressing is the rush to apply microarrays to obtain ‘gene signatures’ to aid disease diagnosis and prognosis. Again, results from different groups studying ostensibly the same disease are frequently non-concordant [7,8]. The use of different microarray platforms is partly to blame for this, but perhaps most of the problem comes from lack of ‘inferential literacy’ meeting lack of epidemiological savvy. The Toxicogenomics Research Consortium suggested that more-consistent results would be achieved not with signatures from individual genes but by examining the gene ontology (GO) categories of the differentially expressed genes [6]. Perhaps, but it is a sobering comment that when two RNA samples were compared in different laboratories, on different platforms and analysed in the same way, gene-by-gene list comparisons varied. All that could be agreed on were the changes in different GO categories – representative of the tissue of origin of the samples [6]. If scientists in different laboratories cannot agree on an ordered list of gene-expression differences when presented with the same two RNA samples, we really do have a problem.
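As a toy illustration of that finding – the gene names and category assignments below are invented, not data from [6] – two laboratories can disagree almost completely at the gene level while agreeing at the category level:

```python
# Invented example: two labs rank the same samples' genes by differential
# expression. Their top-gene lists barely overlap, yet the broader
# functional (GO-like) categories they implicate coincide.
lab_a_top = ["GENE007", "GENE042", "GENE113", "GENE256", "GENE301"]
lab_b_top = ["GENE042", "GENE199", "GENE307", "GENE410", "GENE113"]

# Invented gene-to-category map standing in for real GO annotation.
go = {
    "GENE007": "immune response", "GENE042": "immune response",
    "GENE113": "cell cycle",      "GENE256": "cell cycle",
    "GENE301": "metabolism",      "GENE199": "immune response",
    "GENE307": "cell cycle",      "GENE410": "metabolism",
}

gene_overlap = set(lab_a_top) & set(lab_b_top)
category_overlap = {go[g] for g in lab_a_top} & {go[g] for g in lab_b_top}

print(f"shared top genes: {len(gene_overlap)}/5 -> {sorted(gene_overlap)}")
print(f"shared categories: {sorted(category_overlap)}")
```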

So what is the solution? Obviously, putting the right probes on the array would be a start – interrogating the same transcript or splice form is important. Consistent standards between laboratories would help improve the consistency of results – but consistency is not enough: the results within a laboratory were all consistent, yet results can be consistently wrong. What we need is a proper evaluation of microarrays (including sample extraction and work-up, data handling and analysis) and an understanding of what is required to achieve consistent, accurate and reproducible results across laboratories. But perhaps most important is that scientists understand the nature of the technology they are using – including experimental design, execution and analysis. We need to go beyond MIAME.

References

1 Miron, M. and Nadon, R. (2006) Inferential literacy for experimental high-throughput biology. Trends Genet. 22 (this issue, February 2006) doi:10.1016/j.tig.2005.12.001

2 Draghici, S. et al. (2006) Reliability and reproducibility issues in DNA microarray measurements. Trends Genet. 22 (this issue, February 2006) doi:10.1016/j.tig.2005.12.005


3 Zhang, J. et al. (2005) Detecting false expression signals in high-density oligonucleotide arrays by an in silico approach. Genomics 85, 297–308

4 Larkin, J.E. et al. (2005) Independence and reproducibility across microarray platforms. Nature Methods 2, 337–343

5 Irizarry, R.A. et al. (2005) Multiple-laboratory comparison of microarray platforms. Nature Methods 2, 345–350

6 Members of the Toxicogenomics Research Consortium (2005) Standardizing global gene expression analysis between laboratories and across platforms. Nature Methods 2, 351–356

7 Ioannidis, J.P.A. (2005) Microarrays and molecular research: noise discovery? Lancet 365, 454–455

8 Michiels, S. et al. (2005) Prediction of cancer outcome with microarrays: a multiple random validation strategy. Lancet 365, 488–492
