pnas-2013-doolittle-5294-300



PERSPECTIVE

Is junk DNA bunk? A critique of ENCODE

W. Ford Doolittle¹

Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, NS, Canada B3H 4R2

Edited by Michael B. Eisen, Howard Hughes Medical Institute, University of California, Berkeley, CA, and accepted by the Editorial Board February 4, 2013 (received for review December 11, 2012)

Do data from the Encyclopedia Of DNA Elements (ENCODE) project render the notion of junk DNA obsolete? Here, I review older arguments for junk grounded in the C-value paradox and propose a thought experiment to challenge ENCODE’s ontology. Specifically, what would we expect for the number of functional elements (as ENCODE defines them) in genomes much larger than our own genome? If the number were to stay more or less constant, it would seem sensible to consider the rest of the DNA of larger genomes to be junk or, at least, assign it a different sort of role (structural rather than informational). If, however, the number of functional elements were to rise significantly with C-value then, (i) organisms with genomes larger than our genome are more complex phenotypically than we are, (ii) ENCODE’s definition of functional element identifies many sites that would not be considered functional or phenotype-determining by standard uses in biology, or (iii) the same phenotypic functions are often determined in a more diffuse fashion in larger-genomed organisms. Good cases can be made for propositions ii and iii. A larger theoretical framework, embracing informational and structural roles for DNA, neutral as well as adaptive causes of complexity, and selection as a multilevel phenomenon, is needed.

evolution | human genome

There is much excitement in the blogosphere, among mainstream science journalists, and within the community of practicing genome biologists about a flurry of articles and letters in the September 6th, 2012 issue of Nature. These papers and many others published at about the same time and since under the umbrella of the ENCODE project collectively claim function for the majority of the 3.2 Gb human genome, not just the few percent already recognized as genes (traditionally defined) or obvious gene-controlling elements. Kolata writes in The New York Times that “[t]he human genome is packed with at least four million gene switches that reside in bits of DNA that were once dismissed as ‘junk’ but that turn out to play critical roles in controlling how cells, organs and other tissues behave” (1). In a Nature News and Views commentary, Ecker et al. (2) assert that “[o]ne of the more remarkable findings described in the consortium’s entrée paper is that 80% of the genome contains elements linked to biochemical functions, dispatching the widely held view that the human genome is mostly ‘junk DNA.’” The editors of The Lancet (3) enthuse: “Far from being ‘junk,’ the DNA between protein encoding genes consists of myriad elements that determine gene expression, whether by switching transcription on or off, or by regulating the degree of transcription and consequently the concentrations and function of all proteins.” Succinctly, in Science, Pennisi (4) declares that the ENCODE publications write the “eulogy for junk DNA.”

The new data, coming from high-throughput analyses of transcriptional and chromatin landscapes, transcription factor footprints, and long-range chromosomal interactions, support many current population genetic studies linking human diseases to supposedly nongenic regions, and they are truly impressive in scope and depth (5). They resonate with the current enthusiasm for assigning multiple subtle but vital regulatory roles to the still enigmatic long noncoding RNAs (lncRNAs) now known to be transcribed from much of the length of our genome (6, 7). Additionally, congruence at many sites between the many methods used (RNA sequencing, binding by one or more of 100+ DNA binding proteins, DNase I hypersensitivity, histone modification, DNA methylation, and chromosome conformation capture) leaves no doubt that many of these 4 million gene switches do represent chromosomal loci that are special in some way, in at least one cell type. However, do ENCODE’s data truly require us to abandon the widespread notion that junk DNA (here specifically understood as DNA that does not encode information promoting the survival and reproduction of the organisms that bear it) is the major constituent of many eukaryotic genomes, our own genome included?

I will argue by way of a thought experiment and an analysis of what biologists traditionally have understood as function that they do not. At the very least, “junk” as it has been conceived is an apt descriptor of the bulk of many genomes larger than our own. Moreover, it almost certainly still is for much of our genome, unless we hold Homo sapiens to be unique among the animals in the efficiency of its chromosomal organization and not just its cultural attainments. Such genomic anthropocentrism, unacknowledged conflation of possible meanings of “function,” questionable null hypotheses, and unrecognized panadaptationism are behind this most recent attempt to junk “junk.”

Several of these same points have been made in brief by Eddy (8) and Niu and Jiang (9). My aim here is to remind readers of the structure of some earlier arguments in defense of the junk concept (10) that remain compelling, despite the obvious success of ENCODE in mapping the subtle and complex human genomic landscape. Also, I will suggest that we need as biologists to defend traditional understandings of function: the publicity surrounding ENCODE reveals the extent to which these understandings have been eroded. However, theoretical expansion in other directions, reconceptualizing junk, might be advisable.

Author contributions: W.F.D. wrote the paper.

The author declares no conflict of interest.

This article is a PNAS Direct Submission. M.B.E. is a guest editor invited by the Editorial Board.

¹E-mail: [email protected].

5294–5300 | PNAS | April 2, 2013 | vol. 110 | no. 14 www.pnas.org/cgi/doi/10.1073/pnas.1221376110

Perennial Problem of C-Value

Information and Structure. The junk idea long predates genomics and since its early decades has been grounded in the “C-value paradox,” the observation that DNA amounts (C-value denotes haploid nuclear DNA content) and complexities correlate very poorly with organismal complexity or evolutionary “advancement” (10–14). Humans do have a thousand times as much DNA as simple bacteria, but lungfish have at least 30 times more than humans, as do many flowering plants and some unicellular protists (14). Moreover, as is often noted, the disconnection between C-value and organismal complexity is also found within more restricted groups comprising organisms of seemingly similar lifestyle and comparable organismal or behavioral complexity. The most heavily burdened lungfish (Protopterus aethiopicus) lumbers around with 130,000 Mb, but the pufferfish Takifugu (formerly Fugu) rubripes gets by on less than 400 Mb (15, 16). A less familiar but better (because monophyletic) animal example might be amphibians, showing a 120-fold range from frogs to salamanders (17). Among angiosperms, there is a thousandfold variation (14). Additionally, even within a single genus, there can be substantial differences. Salamander species belonging to Plethodon boast a fourfold range, to cite a comparative study popular from the 1970s (18). Sometimes, such within-genus genome size differences reflect large-scale or whole-genome duplications and sometimes rampant selfish DNA or transposable element (TE) multiplication. Schnable et al. (19) figure that the maize genome has more than doubled in size in the last 3 million years, overwhelmingly through the replication and accumulation of TEs, for example. If we do not think of this additional or “excess” DNA, so manifest through comparisons between and within biological groups, as junk (irrelevant if not frankly detrimental to the survival and reproduction of the organism bearing it), how then are we to think of it?

Of course, DNA inevitably does have a basic structural role to play, unlinked to specific biochemical activities or the encoding of information relevant to genes and their expression. Centromeres and telomeres exemplify noncoding chromosomal components with specific functions. More generally, DNA as a macromolecule bulks up and gives shape to chromosomes and thus, as many studies show, determines important nuclear and cellular parameters such as division time and size, themselves coupled to organismal development (11–13, 17). The “selfish DNA” scenarios of 1980 (20–22), in which C-value represents only the outcome of conflicts between upward pressure from reproductively competing TEs and downward-directed energetic restraints, have thus, in subsequent decades, yielded to more nuanced understandings.

Cavalier-Smith (13, 20) called DNA’s structural and cell biological roles “nucleoskeletal,” considering C-value to be optimized by organism-level natural selection (13, 20). Gregory, now the principal C-value theorist, embraces a more “pluralistic, hierarchical approach” to what he calls “nucleotypic” function (11, 12, 17). A balance between organism-level selection on nuclear structure and cell size, cell division times and developmental rate, selfish genome-level selection favoring replicative expansion, and (as discussed below) supraorganismal (clade-level) selective processes, as well as drift, must all be taken into account.

These forces will play out differently in different taxa. González and Petrov (23) point out, for instance, that Drosophila and humans are at opposite extremes in terms of the balance of processes, with the minimalist genomes of the former containing few (but mostly young and quite active) TEs, whereas at least one-half of our own much larger genome comprises the moribund remains of older TEs, principally SINEs and LINEs (short and long interspersed nuclear elements). Such difference may in part reflect population size. As Lynch notes, small population size (characteristic of our species) will have limited the effectiveness of natural selection in preventing a deleterious accumulation of TEs (24, 25).

Zuckerkandl (26) once mused that all genomic DNA must be to some degree “polite,” in that it must not lethally interfere with gene expression. Indeed, some might suggest, as I will below, that true junk might better be defined as DNA not currently held to account by selection for any sort of role operating at any level of the biological hierarchy (27). However, junk advocates have to date generally considered that even DNA fulfilling bulk structural roles remains, in terms of encoded information, just junk. Cell biology may require a certain C-value, but most of the stretches of noncoding DNA that go to satisfying that requirement are junk (or worse, selfish).

In any case, structural roles or multilevel selection theorizing are not what ENCODE commentators are endorsing when they proclaim the end of junk, touting the existence of 4 million gene switches or myriad elements that determine gene expression and assigning biochemical functions for 80% of the genome. Indeed, there would be no excitement in either the press or the scientific literature if all the ENCODE team had done was acknowledge an established theory concerning DNA’s structural importance. Rather, the excitement comes from interpreting ENCODE’s data to mean that a much larger fraction of our DNA than until very recently thought contributes to our survival and reproduction as organisms, because it encodes information transcribed or expressed phenotypically in one tissue or another, or specifically regulates such expression.

A Thought Experiment. ENCODE (5) defines a functional element (FE) as “a discrete genome segment that encodes a defined product (for example, protein or non-coding RNA) or displays a reproducible biochemical signature (for example, protein binding, or a specific chromatin structure).” A simple thought experiment involving FEs so-defined is at the heart of my argument.

Suppose that there had been (and probably, some day, there will be) ENCODE projects aimed at enumerating, by transcriptional and chromatin mapping, factor footprinting, and so forth, all of the FEs in the genomes of Takifugu and a lungfish, some small and large genomed amphibians (including several species of Plethodon), plants, and various protists. There are, I think, two possible general outcomes of this thought experiment, neither of which would give us clear license to abandon junk.

The first outcome would be that FEs (estimated to be in the millions in our genome) turn out to be more or less constant in number, regardless of C-value, at least among similarly complex organisms. If larger C-value by itself does not imply more FEs, then there will, of course, be great differences in what we might call functional density (FEs per kilobase) (26) among species. FEs spaced by kilobases in Arabidopsis would be megabases apart in maize on average. Averages obscure details: the extra DNA in the larger genomes might be sequestered in a few giant silent regions rather than uniformly stretching out the space between FEs or lengthening intragenic introns. However, in either case, this DNA could be seen as a sort of polite functionless filler or diluent. At best, such DNA might have functions only of the structural or nucleoskeletal/nucleotypic sort. Indeed, even this sort of functional attribution is not necessary. There is room within an expanded, pluralistic and hierarchical theory of C-value (see below) (12, 27) for much DNA that makes no contribution whatever to survival and reproduction at the organismal level and thus is junk at that level, although it may be under selection at the sub- or supraorganismal levels (TEs and clade selection).
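The notion of functional density under this first outcome can be sketched numerically. The snippet below is illustrative arithmetic only: it assumes, hypothetically, the same FE count in every genome (4 million, borrowed from the "gene switches" figure quoted earlier as a placeholder, not as a measurement for any of these species) and uses the approximate genome sizes cited in the text. The function name and constants are assumptions introduced for this example.

```python
# Illustrative arithmetic only: assumes the SAME hypothetical FE count in every genome.
ASSUMED_FE_COUNT = 4_000_000  # placeholder taken from the "4 million gene switches" figure

# Approximate haploid genome sizes in megabases, as quoted in the text.
GENOME_SIZE_MB = {
    "Takifugu rubripes": 400,
    "Homo sapiens": 3_200,
    "Protopterus aethiopicus": 130_000,
}


def functional_density(fe_count: int, genome_mb: float) -> tuple[float, float]:
    """Return (FEs per kilobase, mean spacing between FEs in kilobases)."""
    genome_kb = genome_mb * 1_000
    per_kb = fe_count / genome_kb
    return per_kb, 1 / per_kb


for species, size_mb in GENOME_SIZE_MB.items():
    per_kb, spacing_kb = functional_density(ASSUMED_FE_COUNT, size_mb)
    print(f"{species}: {per_kb:.3f} FEs/kb, one FE every {spacing_kb:.1f} kb on average")
```

With a constant FE count, density falls in inverse proportion to C-value, which is the sense in which the extra DNA acts as a diluent. As the text cautions, such averages say nothing about how the extra DNA is actually distributed.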

If the human genome is junk-free, then it must be very luckily poised at some sort of minimal size for organisms of human complexity. We may no longer think that mankind is at the center of the universe, but we still consider our species’ genome to be unique, first among many in having made such full and efficient use of all of its millions of SINEs and LINEs (retrotransposable elements) and introns to encode the multitudes of lncRNAs and house the millions of enhancers necessary to make us the uniquely complex creatures that we believe ourselves to be. However, were this extraordinary coincidence the case, a corollary would be that junk would not be defunct for many other larger genomes: the term would not need to be expunged from the genomicist’s lexicon more generally. As well, if, as is commonly believed, much of the functional complexity of the human genome is to be explained by evolution of our extraordinary cognitive capacities, then many other mammals of lesser acumen but similar C-value must truly have junk in their DNA.

The second likely general outcome of my thought experiment would be that FEs as defined by ENCODE increase in number with C-value, regardless of apparent organismal complexity. If they increase roughly proportionately, FE numbers will vary over a many-hundredfold range among organisms normally thought to be similarly complex. Defining or measuring complexity is, of course, problematic if not impossible. Still, it would be hard to convince ourselves that lungfish are 300 times more complex than Takifugu or 40 times more complex than us, whatever complexity might be. More likely, if indeed FE numbers turn out to increase with C-value, we will decide that we need to think again about what function is, how it becomes embedded in macromolecular structures, and what FEs as defined by ENCODE have to tell us about it.
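The fold differences behind this second outcome follow directly from the genome sizes quoted earlier. The sketch below assumes strict proportionality between FE number and C-value, which is the hypothetical under discussion rather than an observation, and it again uses 4 million human FEs as a placeholder reference value. The function names are introduced here for illustration.

```python
# Genome sizes in megabases as quoted in the text (Takifugu is "less than 400 Mb").
C_VALUE_MB = {"Takifugu": 400, "human": 3_200, "lungfish": 130_000}


def fold_difference(larger: str, smaller: str) -> float:
    """Ratio of two C-values; under strict proportionality this is also the FE ratio."""
    return C_VALUE_MB[larger] / C_VALUE_MB[smaller]


def scaled_fe_count(species: str, reference: str = "human",
                    reference_fes: int = 4_000_000) -> float:
    """FE count for `species` if FEs scaled linearly with C-value (hypothetical)."""
    return reference_fes * fold_difference(species, reference)


print(fold_difference("lungfish", "Takifugu"))  # 325.0
print(fold_difference("lungfish", "human"))     # 40.625
print(scaled_fe_count("lungfish"))              # 162500000.0
```

These ratios match the rounded "300 times" and "40 times" figures in the paragraph above. Under proportional scaling, a lungfish would carry on the order of 160 million FEs, which is the result that would force a rethink of what an FE count measures.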

Problematics of Function

Definition and Inference. What do we mean by function, informational or otherwise? Most philosophers of biology, and likely, most practicing biologists when pressed, would endorse some form of the selected effect (SE) definition of function (28–30). Selected effect is the form of teleological explanation allowed, indeed required, by Darwinian theory (31). Accordingly, the functions of a trait or feature are all and only those effects of its presence for which it was under positive natural selection in the (recent) past and for which it is under (at least) purifying selection now. They are why the trait or feature is there today and possibly why it was originally formed. Thus, we might reasonably say that the function of the lac operon in Escherichia coli is (and presumably, long has been) to allow facultative growth of bacteria on β-galactosides, because we believe that, long before E. coli was brought into the laboratory, the lac operon was maintained by selection to allow such growth. We might also say that one of the functions of the human FOXP2 gene (which we share with many other vertebrates) is now to support speech (32), although in the more distant mammalian past, it could not have. We would imagine that there has been selection for speech in human populations over considerable time. Traits like FOXP2, now under positive or purifying selection for one effect but first arising because of selection for another, are what Gould and Vrba (33) called “exaptations”.

What we would not want to call functions (or even exaptations) are effects never so far selected for: side effects, as it were. Gould and Lewontin (34) famously called these “spandrels.” They comprise both undesirable but apparently unavoidable consequences, like vulnerability to phages in bacteria with pili or lower back pain in primates walking upright, and seemingly neutral ones, like the thumping noise made by the heart, to use an example beloved of philosophers. Indeed, even fortuitously advantageous traits, such as the FOXP2-enabled capacity to leave voice messages on answering machines, are not SE functions. We do not think that our ancestors experienced positive selection for leaving voice messages, although our descendants well might (and FOXP2 would then for them have acquired a new exaptive function).

In any case, past selection, recent or ancient, can only be inferred, and we must use indirect ways to make the inference. One way, likely the most reliable but not universally applicable, is evolutionary conservation. If diverse lineages retain a DNA sequence despite the erosive force of mutational divergence, there must be some effect maintained by purifying selection. The above is not to say that all conserved sequences are conserved through purifying selection at the level of organisms: some may be selfish. Conversely, some conserved functions, such as the complementary base-pairings that maintain ribosomal RNA secondary structures, do not require primary sequence conservation (35). Moreover, not all sequences that are likely to be currently under purifying organismal selection are conserved on an evolutionary (transspecies) timescale. In a recent comparative genomic survey, Ward and Kellis (36) find both mammal-conserved human sequences showing increasing diversity within our species (and thus, likely becoming nonfunctional in humans) and mammal-nonconserved sequences showing reduced within-human diversity (and thus, likely acquiring new function among us). Ponting and Hardison (37), using methods that they believe to take into account such turnover, “estimate that the steady-state value of αsel [the proportion of all nucleotides in the human genome that are subject to purifying selection because of their biological function] lies between 10% and 15%” (37).
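The scale of the difference between this selection-based estimate and the 80% biochemical-activity figure quoted earlier can be put in absolute terms. The following is a back-of-the-envelope sketch that simply multiplies each fraction by the 3.2 Gb genome size cited in the text. It assumes the percentages apply to the whole genome uniformly, and the helper name is hypothetical.

```python
HUMAN_GENOME_MB = 3_200  # 3.2 Gb, as quoted in the text


def megabases(fraction: float, genome_mb: float = HUMAN_GENOME_MB) -> float:
    """Convert a genome fraction into megabases of DNA."""
    return fraction * genome_mb


# Fractions quoted in the text: alpha_sel of 10-15% (37) and ENCODE's 80% (2).
alpha_sel_low = megabases(0.10)
alpha_sel_high = megabases(0.15)
encode_biochemical = megabases(0.80)

print(f"Under purifying selection: {alpha_sel_low:.0f} to {alpha_sel_high:.0f} Mb")
print(f"Linked to biochemical function: {encode_biochemical:.0f} Mb")
print(f"Difference: {encode_biochemical - alpha_sel_high:.0f} to "
      f"{encode_biochemical - alpha_sel_low:.0f} Mb")
```

On these figures, roughly two gigabases of the genome carry a reproducible biochemical signature without falling within the estimated selected fraction. That is the portion whose functional status, in the SE sense, is in dispute.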

Another way to attribute function is through experimental ablation: whatever organism-level effect E does not occur after deleting or blocking the expression of a region R of DNA is taken to be the latter’s function. This attribution is close to the everyday understanding of function, as in the function of the carburetor is to oxygenate gasoline. The approach embodies what philosophers would call a causal role (CR) definition of function and supposedly eschews evolutionary or historical justifications. Much biological research into function is done this way, but I think that most biologists consider that experimental ablation indirectly points to SE. They believe that effect E could, under suitable conditions, be shown to have contributed to the past fitness of organisms and most importantly, that R exists as it does because of E. Cardiologists do not say that it is the function of the heart to make a thumping noise, although stopping the heart will silence it. Similarly, geneticists studying Huntington disease would not say that the trinucleotide repeat in the cognate gene, reiteration of which gives rise to the disorder, has disease causation as a function, although replacing the repeat with a nonidentical set of unique triplets encoding the same amino acid sequence would eliminate the deleterious effect (38).

A third, and the least reliable, method to infer function is mere existence. The presence of a structure or the occurrence of a process or detectable interaction, especially if complex, is taken as adequate evidence for its being under selection, even when ablation is infeasible and the possibly selectable effect of presence remains unknown. Because our genomes have introns, Alu elements, and endogenous retroviruses, these things must be doing us some good. Because a region is transcribed, its transcript must have some fitness benefit, however remote. Because residue N of protein P is leucine in species A and isoleucine in species B, there must be some selection-based explanation. This approach enshrines “panadaptationism,” which was forcefully and effectively debunked by Gould and Lewontin (34) in 1979 but still informs much of molecular and evolutionary genetics, including genomics. As Lynch (39) argues in his essay “The Frailty of Adaptive Hypotheses for the Origins of Organismal Complexity,”

This narrow view of evolution has become untenable in light of recent observations from genomic sequencing and population genetic theory. Numerous aspects of genomic architecture, gene structure, and developmental pathways are difficult to explain without invoking the nonadaptive forces of genetic drift and mutation. In addition, emergent biological features such as complexity, modularity, and evolvability, all of which are current targets of considerable speculation, may be nothing more than indirect by-products of processes operating at lower levels of organization.

Functional attribution under ENCODE is of this third sort (mere existence) in the main. Although FEs as defined by ENCODE might be cross-identified by several methods and even evolutionarily conserved, they could most often be the molecular equivalent of spandrels: structured elements that are the indirect consequence of selection operating on other features but are themselves selectively neutral, a form of structured noise. Demonstrations that some biochemical signatures are not neutral and may even meet SE criteria say nothing about the rest and are, of course, expected as long as opportunism and co-optation are understood to be key elements in evolution.

In taking such a liberal definitional course, ENCODE follows the lead of the Gene Ontology (GO) project, which defines molecular function in decontextualized nonhistorical terms (40):

Molecular function describes activities, such as catalytic or binding activities, that occur at the molecular level. GO molecular function terms represent activities rather than the entities (molecules or complexes) that perform the actions, and do not specify where or when, or in what context, the action takes place.

Proponents of ENCODE actually are concerned with what they call “functional validation” but principally seem worried about the danger of mistaking functional specificity (recognition or activity) rather than the risk of attributing function (especially regulatory function) where none exists in the SE sense. Stamatoyannopoulos (41) writes:

These examples illustrate a natural temptation to equate activity with patterning of epigenomic features. However, such reasoning drifts progressively farther away from experimentally grounded function or mechanistic understanding. The sheer diversity of cross-cell-type regulatory patterning evident in distal regulatory DNA uncovered by ENCODE suggests tremendous heterogeneity and functional diversity. ENCODE is thus in a unique position to promote clearer terminology that separates the identification of functional elements per se from the ascription of specific functional activities using historical experimentally defined categories, and also to dissuade the ascription of very specific functions based on a biochemical signature in place of a deeper mechanistic understanding.

Functions of Classes and Parts. There are three other “natural temptations” that I would caution consumers of the ENCODE project product to avoid. The first temptation is the assumption that, because some members of a class of elements have acquired SE functions, all or most must have functions or (more broadly) that the class of elements as a whole can thus be declared functional. Stamatoyannopoulos (41), for instance, writes:

In marked contrast to the prevailing wisdom, ENCODE chromatin and transcription studies now suggest that a large number of transposable elements encode highly cell type-selective regulatory DNA that controls not only their own cell-selective transcription, but also those of neighboring genes. Far from an evolutionary dustbin, transposable elements appear to be active and lively members of the genomic regulatory community, deserving of the same level of scrutiny applied to other genic or regulatory features.

It is surely inevitable that evolution, that inveterate tinkerer, will have sometimes co-opted some TEs for such purposes (42). However, it is an overenthusiastic extrapolation to describe TEs as a class as “active and lively members of the genomic regulatory community.”

Moreover, the word “regulation” has itself been degraded through use by genomicists, from designating evolved effects shown or likely to enhance fitness, presumably by efficient control of the use of resources, to more broadly denoting any measurable impact of one element or process on other elements or processes, regardless of fitness consequences. I think this broadening of definition misleads biologists such as Barroso (43) in a passage cited later in this essay. Pacemakers regulate heartbeats, and that is their function; tasers and caffeine also affect cardiac rhythm, but we would not (at least in the former case) see this as regulatory function.

Regulation, defined in this loose way, is, for instance, the assumed function of many or most lncRNAs, at least for some authors (6, 7, 44, 45). However, the transcriptional machinery will inevitably make errors: accuracy is expensive, and the selective cost of discriminating against all false promoters will be too great to bear. There will be lncRNAs with promoters that have arisen through drift and exist only as noise (46). Similarly, binding to proteins and other RNAs is something that RNAs do. It is inevitable that some such interactions, initially fortuitous, will come to be genuinely regulatory, either through positive selection or the neutral process described below as constructive neutral evolution (CNE). However, there is no evolutionary force requiring that all or even most do. At another (sociology of science) level, it is inevitable that molecular biologists will search for and discover some of those possibly quite few instances in which function has evolved and argue that the function of lncRNAs as a class of elements has, at last, been discovered. The positivist, verificationist bias of contemporary science and the politics of its funding ensure this outcome.

However, what is the correct conceptual framework here? Why should either function or nonfunction for a class of elements be taken as the null hypothesis, and why should evidence for or against function, however defined, be taken as support of one or the other? Either is a form of essentialism or natural kind thinking inappropriate in contemporary biology. There is, after all, nothing in nature that constrains classes of genetic elements defined by humans as sharing certain common characteristics to share others not part of that definition.

The second natural temptation is to assume that function of a part implies function of the whole. Co-optation of the promoters of TEs or the insertion of bona fide regulatory sequences into introns is taken to impart function to the TE or intron as a whole. However, the cell does not necessarily see the rest of the TE or the intronic surround of an embedded enhancer as relevant to its activity, and only the promoter or enhancer sequence may be under selection. Even when an entire genetic element seems relevant or necessary (whole introns must be removed even if only certain sites are active in their removal), there is the possibility of excess baggage or junk-like character. Do enhancer-harboring introns really need to be so long?

Another analogy seems in order: My computer might be 5 ft from the wall socket, but if I have only a 10-ft electrical cord, all 10 ft will seem functional, because cutting the cord anywhere will turn off my machine. In this connection, note that much more than one-half of the 80.4% of the human genome that ENCODE deems functional is so considered because it is transcribed (4), most often into an intron or lncRNA, only a tiny fraction of the length of which is likely to be involved in potentially regulatory interactions.

Doolittle PNAS | April 2, 2013 | vol. 110 | no. 14 | 5297


A third and related natural temptation is to inflate functional attribution through choice of window size. The ENCODE Project Consortium notes (5) that:

The vast majority (80.4%) of the human genome participates in at least one biochemical RNA- and/or chromatin-associated event in at least one cell type. Much of the genome lies close to a regulatory event: 95% of the genome lies within 8 kilobases (kb) of a DNA–protein interaction (as assayed by bound ChIP-seq motifs or DNase I footprints), and 99% is within 1.7 kb of at least one of the biochemical events measured by ENCODE.

Even this last and largest percentage, assuming the average biochemical event to directly involve tens to hundreds of bases (41), assigns functions to only a minority (perhaps 10%) of the genome’s base pairs. ENCODE’s data are indeed unexpected and impressive, but without some principled and agreed-on metric for functional density and FE boundaries, any answer to the question “how much of the genome is functional?” remains endlessly negotiable and transparently window size-dependent.
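The window-size dependence can be made concrete with a toy calculation. The Python sketch below is illustrative only: the function name `covered_fraction`, the 1,000-bp genome, the three event positions, and the 10-bp footprint are all hypothetical choices, not ENCODE data or methods. It merges the intervals that lie within a chosen window of each event and reports the fraction of bases counted as "near" an event.

```python
def covered_fraction(genome_len, event_starts, footprint, window):
    """Fraction of bases within `window` bp of any event footprint.

    All arguments are hypothetical toy values: each event occupies
    `footprint` bases starting at a position in `event_starts`.
    """
    if not event_starts:
        return 0.0
    # Extend every footprint by the window on both sides, clipped to the genome.
    intervals = sorted(
        (max(0, s - window), min(genome_len, s + footprint + window))
        for s in event_starts
    )
    covered = 0
    cur_start, cur_end = intervals[0]
    for start, end in intervals[1:]:
        if start <= cur_end:
            # Overlapping or touching intervals are merged so no base is counted twice.
            cur_end = max(cur_end, end)
        else:
            covered += cur_end - cur_start
            cur_start, cur_end = start, end
    covered += cur_end - cur_start
    return covered / genome_len


# Toy genome: three 10-bp events in 1,000 bp (made-up numbers).
events = [100, 500, 900]
for w in (0, 50, 200):
    print(w, covered_fraction(1000, events, footprint=10, window=w))
```

With these made-up numbers, the same three 10-bp events account for 3% of the toy genome at a window of 0 bp, 33% at 50 bp, and 100% at 200 bp, although nothing about the events themselves has changed.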

Future Function. In her News and Views editorial in the September 6, 2012 issue of Nature, Barroso (43) speculates as follows concerning the vast majority of human DNA, until now thought useless:

. . .there is a good reason to keep this DNA. Results from the ENCODE project show that most of these stretches of DNA harbor regions that bind proteins and RNA molecules, bringing these into positions from which they cooperate with each other to regulate the function and level of expression of protein-coding genes. In addition, it seems that widespread transcription from non-coding DNA potentially acts as a reservoir for the creation of new functional molecules, such as regulatory RNAs.

In addition to equating regulation with having an effect, Barroso (43) revives here the notion that excess DNA is not junk because it may some day be of use, and thus it is maintained as a reservoir. Brenner (47) long ago derided this sort of reasoning as follows:

There is a strong and widely held belief that all organisms are perfect and that everything within them is there for a function. Believers ascribe to the Darwinian natural selection process a fastidious prescience that it cannot possibly have and some go so far as to think that patently useless features of existing organisms are there as an investment for the future...

One sees similar Panglossian futuristic speculation in some of the recent literature on robustness and evolvability (critique in ref. 48). However, it cannot in general be the case that selection operating at the level of fitness of individuals within a species can favor the origin or maintenance of traits that incur selective cost at that level, while offering only the remotest hope of future benefit to the individual and its descendants.

Other evolutionary mechanisms can. The publications by Lynch et al. (24) and Lynch (39, 49) have effectively argued that drift operating in very small populations will (by chance) encourage accumulation of DNA that can add to C-value and might, in the future, come in handy. Selection at the suborganismal (selfish DNA) level may also seem to be future-directed: TEs do sometimes later become useful through co-optation or general effects on the generation of novelty (42, 50). Indeed, Fedoroff (51) has recently proposed that TEs should not be described pejoratively as “selfish” and that the prevailing view (that the epigenetic silencing mechanisms that eukaryotes use to limit TE replication arose to do just that) puts the evolutionary cart before the horse. Rather, Fedoroff (51) suggests that these mechanisms, by also limiting recombination between repeated TEs (which makes genomes smaller), allow TEs to accumulate, growing genomes, and that this was a “. . .critical step in the evolution of multicellular organisms, underpinning the ability to diversify duplicates for expression in specific cells and tissues” (51).

Fedoroff (51) concludes that

On balance, then, the likelihood that contemporary eukaryotic genomes evolved in the context of epigenetic mechanisms seems vastly greater than the likelihood that they were invented as an afterthought to combat a plague of parasitic transposons.

However, evolutionary explanations of genome structure need not be either/or in this sense, once it is recognized that selection affecting the genome operates at all (including supraorganismal) levels of the biological hierarchy (12, 27). TEs can be selected through selfish replication at the level of DNA, while selection at the level of organisms has established silencing mechanisms to rein in TE replication, and selection at the level of clades has looked favorably on those clades with complex genomes that engender evolutionary novelty and render whole-clade extinction less likely. (That is, clades that have TE-rich dynamic genomes may indeed, because of that, have produced more, and more interestingly diverse, evolutionarily robust, and evolvable descendant species.) Evolutionary forces operate simultaneously at all levels in the same and different directions with differing strengths and results measurable in different units (such as the frequency of TEs in an individual genome, TE-enhanced individuals in a species, TE-bearing species in a genus, or classes comprising such species in a phylum). Indeed, one could reasonably argue for an eventual expansion of the SE definition of “function” to include all levels as long as we distinguish them (DNA-level function, organism-level function, clade-level function, and so forth). This definitional expansion might well lead to a reduction in the amount of DNA that we could reasonably call junk. However, I think such an expanded idea of function does not currently inform ENCODE or most genomic thinking, and in any case, the problems of inference, evidence, and appropriate null hypotheses remain.

Function as a Diffusible Quality. Quite often, we can intuit the results of a thought experiment before articulating the reasons for our intuition. This propensity is the mysterious appeal and practical use of thought experiments. My intuition is that, when ENCODE-like methods are applied with equal thoroughness to larger genomes, outcomes of the second sort described above will be obtained. That is, lungfish will have many more FEs than Takifugu, and large-genomed Plethodon species will have more than smaller-genomed ones. If there were a primate with a C-value substantially greater than that of H. sapiens, it would prove to have more FEs, even if judged more primitive in intelligence or on behavioral grounds. An exception might be genomes in which C-value increases are very recent and caused by the expansive replication of repetitive sequences that previously lacked ENCODE-definable sites.

Assuming these predictions are borne out, what might we make of it? Lynch (39) suggests that much of the genomic- and systems-level complexity of eukaryotes vis à vis prokaryotes is maladaptive, reflecting the inability of selection to block fixation of incrementally but mildly deleterious mutations in the smaller populations of the former. Thus, for instance, the fact that eukaryotic molecular machines comprise more interacting subunits than their prokaryotic counterparts reflects the inability of selection operating on smaller populations to enforce functional efficiency. Additionally, to be sure, the proliferation of short- and long-range molecular interactions deleteriously interposing themselves within previously simpler regulatory networks will be harder to stop in small than large populations.

Several recent publications (52–55) have revived the argument that CNE is just such an interpositional force, possibly a very powerful one. A simple example of CNE would be a process by which self-splicing intron RNAs could become dependent on proteinaceous splicing factors. Initially, the RNA secondary structure is sufficient to catalyze self-removal, but fortuitously bound proteins that stabilize the RNA can compensate for (presuppress) mutations that might destabilize elements of the structure necessary for independent splicing. Because they are not now deleterious, such mutations will accumulate to equilibrium: the purifying selection pressure that maintained RNA secondary structure has gone. No selection is involved at this stage. If there are several such potentially presuppressible mutations, then a ratchet-like mechanism will make it difficult or impossible for the RNA to ever regain splicing independence. Elimination of the protein will become a lethal event. Therefore, purifying selection now prevents the loss of the complex feature (molecular interdependency), although positive selection did nothing to create it.
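The ratchet can be illustrated with a deliberately crude calculation. The Python sketch below assumes, hypothetically, that each of n presuppressible sites flips independently between an intact and a mutated state, with per-generation probabilities `mu` (intact to mutated) and `nu` (back-mutation). These parameters and the function name `p_all_sites_intact` are assumptions made for illustration and are not taken from the CNE literature cited here. Under that two-state model, each site is intact at equilibrium with probability nu / (mu + nu), so the chance that all n sites are intact at once, which is what regaining splicing independence would require, falls geometrically with n.

```python
def p_all_sites_intact(n_sites, mu, nu):
    """Equilibrium probability that every presuppressible site is intact.

    Toy two-state model (an assumption, not the cited authors' method):
    mu = per-generation probability that an intact site mutates,
    nu = per-generation probability that a mutated site reverts.
    """
    if n_sites < 0:
        raise ValueError("n_sites must be non-negative")
    if mu <= 0 or nu <= 0:
        raise ValueError("mu and nu must be positive")
    # Stationary probability that a single site is in the intact state.
    p_intact = nu / (mu + nu)
    # Sites are assumed independent, so the joint probability is a product.
    return p_intact ** n_sites


# Equal forward and back rates (hypothetical values) for 1, 5, 10, and 20 sites.
for n in (1, 5, 10, 20):
    print(n, p_all_sites_intact(n, mu=1e-6, nu=1e-6))
```

Even with back-mutation as likely as forward mutation, the probability of finding all sites intact halves with every additional site in this toy model, which is the sense in which the dependence becomes progressively harder to escape.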

The entire multiprotein, multi-RNA, eukaryotic spliceosome might have evolved through reiterations of this process along with very many of the intricacies of the cellular machinery (52–55). CNE would work in concert with any population-size effect. Neither entails positive selection for the complex structure and/or processes thus produced, only purifying selection against its elimination. Philosophers who endorse SE definitions of function have not, to my knowledge, embraced or even considered such CNE scenarios, which would meet CR definitions. Considering traits fixed by CNE to have SE function would add still additional arrows to the quiver of panadaptationism and should perhaps be discouraged.

A common consequence of CNE is that even structures or processes that have arisen by positive selection because they increase organismal fitness will later become more complex in terms of the number of intermolecular interactions required for their successful completion. Function diffuses. Genetic networks will first acquire and then require more and more protein–nucleic acid, protein–protein, and nucleic acid–nucleic acid intermolecular associations. Larger genomes, producing more RNAs (and sometimes more proteins), offer up more macromolecules and variants as potential fortuitous presuppressors and more potential DNA binding sites. Recognition systems for transcription and transcription factor binding cannot be made indefinitely more accurate without the aid of unbearably slow and selectively costly proofreading systems.

Tradeoffs between speed, economy, and accuracy will unavoidably entail that larger genomes will produce disproportionately more noise in terms of fortuitous transcripts capable of becoming presuppressors and thus, more complex, seemingly regulatory, networks of interaction. Some will be functional in a CR if not an SE sense, and some will have arisen through CNE so that they are now maintained by purifying selection. However, many, possibly the vast majority, are just there.

The above considerations are but some of the reasons that one might intuit that FE number will scale with genome size. A very recent comparative analysis of transcription factor binding sites in model organisms (46) supports this conjecture. Ruths and Nakhleh (46) claim that

. . .neutral evolutionary forces alone can explain binding site accumulation, and that selection on the regulatory network does not alter this finding. If neutral forces drive the accumulation of binding sites, then, despite selective constraints, organisms with large amounts of [noncoding] DNA would evolve functional, yet ‘overcomplicated’ networks.

So Is Junk Bunk?

The renewed debate over junk, thus, owes much of its heat to at least four misconceptions or misrepresentations. First is the pretense that there is any definable boundary between informational and structural (genic and nongenic) functions for DNA. Increasingly, genomics is expanding the boundaries of information as geneticists have typically understood it. Minimally, gene means more than it used to mean. Djebali et al. (56) write

. . .the determination of genic regions is currently defined by the cumulative lengths of the isoforms and their genetic association to phenotypic characteristics, the likely continued reduction in the lengths of intergenic regions will steadily lead to the overlap of most genes previously assumed to be distinct genetic loci. This supports and is consistent with earlier observations of a highly interleaved transcribed genome, but more importantly, prompts the reconsideration of the definition of a gene. As this is a consistent characteristic of annotated genomes, we would propose that the transcript be considered as the basic atomic unit of inheritance. Concomitantly, the term gene would then denote a higher-order concept intended to capture all those transcripts (eventually divorced from their genomic locations) that contribute to a given phenotypic trait.

However, regulatory loci are also informational even if not transcribed, and ENCODE has documented many long-range interactions between chromosomal regions that may be brought together physically in the nucleus, a very complex and structure-rich molecular machine, at some time during the cell cycle. Therefore, in this sense, the gross structure of the chromosome set also carries information that may be relevant to the function of genes, broadly defined. Additionally, all DNA has the job of serving as a template for its own replication and, to that extent, encoding information.

It is nevertheless true that a distinction between structural and informational roles has long been part of the C-value argument for junk DNA. This line of reasoning has held that high C-value might be necessary for cellular function, but the nongenic DNA that fills the requirement is informationally junk. ENCODE’s claim is that much more of the DNA is, in fact, informational (especially regulatory) than we had thought, and indeed ENCODE’s focus is on sites likely to be involved directly or indirectly in transcription, that is, on the “myriad elements that determine gene expression” to quote The Lancet (3). Therefore, the structure–information distinction informs the interpretation of the project’s results, and without it there would be nothing novel or newsworthy in the assertion that all of the human genome has some sort of role in human biology. We have known that since the mid-1980s.

Second is the conflation of SE and CR definitions of function. Those of us who speak of excess DNA as informationally junk mean that its presence is not to be explained by past and/or current selection at the level of organisms, that is, that it has no informational function construable historically as an SE. Those who say that almost the whole of the human genome is functional informationally do so on the basis of an operational diagnosis embracing a nonhistorical CR definition of function. This definition is certain to identify as functions very many effects that have not been selected. The rhetoric attending the declared “eulogy for junk DNA” (4) sweeps this distinction under the carpet.

Third is a false natural kind ontology, essentialist in nature, that encourages (i) the attribution to a whole class of operationally defined genetic elements of those functions known only for a few and/or (ii) the attribution to the whole length of such a genetic element of a function that resides in only part of it. In the case of lncRNAs and intron transcripts, whose lengths together make up more than three-quarters of the 80% of the genome said to be functional, this second sort of functional attribution seems especially misleading.

Fourth may be a seldom-articulated or -questioned notion that cellular complexity is adaptive, the product of positive selection at the organismal level. Our disappointment that humans do not have many more genes than fruit flies or nematodes has been assuaged by evidence that regulatory mechanisms that mediate those genes’ phenotypic expressions are more various, subtle, and sophisticated (57), evidence of the sort that ENCODE seems to vastly augment. Yet there are nonselective mechanisms, such as CNE, that could result in the scaling of FEs as ENCODE defines them to C-value nonadaptively or might be seen as selective at some level higher or lower than the level of individual organisms. Splits within the discipline between panadaptationists/neutralists and those researchers accepting or doubting the importance of multilevel selection fuel this controversy and others in biology.

I submit that, up until now, junk has been used to denote DNA whose presence cannot reasonably be explained by natural selection at the level of the organism for encoded informational roles. There remain good reasons to believe that much of the DNA of many species fits this definition. Nevertheless, while still insisting on SE functionality, we might want to come up with new definitions of function and junk by (i) abandoning the distinction between informational and nucleoskeletal or nucleotypic roles for DNA, (ii) admitting that there may be strong selection for C-value as a determinant of many cell biological features, (iii) fully embracing hierarchical selection theory and acknowledging that different genomic features may have legitimate functions defined and in play at different levels, and (iv) expanding the SE definition of function to include traits that arise neutrally but are preserved by purifying selection (12). Much that we now call junk could then become functional. However, such a philosophically informed theoretical expansion is not what ENCODE, or at least those authors stressing the demise of junk, so far seem to have in mind (1–5).

In the end, of course, there is no experimentally ascertainable truth of these definitional matters other than the truth that many of the most heated arguments in biology are not about facts at all but rather about the words that we use to describe what we think the facts might be. However, that the debate is in the end about the meaning of words does not mean that there are not crucial differences in our understanding of the evolutionary process hidden beneath the rhetoric.

Note Added in Proof. I note that a very forcefully worded critique by Graur et al. (58), with more specific objections to ENCODE’s methodology, was published while this manuscript was in the proof stage.

ACKNOWLEDGMENTS. I thank Evelyn Fox Keller, Eric Bapteste, James McInerney, Andrew Roger, and the Centre for Comparative Genomics and Evolutionary Bioinformatics (CGEB) group for comments on this manuscript and the Canadian Institutes for Health Research for support.

1 Kolata G (September 5, 2012) Bits of mystery DNA, far from ‘junk’, play crucial role. The New York Times, Section A, p. 1.
2 Ecker JR, et al. (2012) Genomics: ENCODE explained. Nature 489(7414):52–55.
3 Anonymous (2012) Cracking ENCODE. Lancet 380(9846):950.
4 Pennisi E (2012) Genomics. ENCODE project writes eulogy for junk DNA. Science 337(6099):1159–1161.
5 Dunham I, et al. (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489(7414):57–74.
6 Rinn JL, Chang HY (2012) Genome regulation by long noncoding RNAs. Annu Rev Biochem 81:145–166.
7 Barry G, Mattick JS (2012) The role of regulatory RNA in cognitive evolution. Trends Cogn Sci 16(10):497–503.
8 Eddy SR (2012) The C-value paradox, junk DNA and ENCODE. Curr Biol 22(21):R898–R899.
9 Niu DK, Jiang L (2013) Can ENCODE tell us how much junk we carry in our genome? Biochem Biophys Res Commun 430(4):1340–1343.
10 Ohno S (1972) So much “junk” DNA in our genome. Brookhaven Symp Biol 23:366–370.
11 Gregory TR (2001) Coincidence, coevolution, or causation? DNA content, cell size, and the C-value enigma. Biol Rev Camb Philos Soc 76(1):65–101.
12 Gregory TR (2004) Macroevolution, hierarchy theory and the C-value enigma. Paleobiology 30:179–202.
13 Cavalier-Smith T (2005) Economy, speed and size matter: Evolutionary forces driving nuclear genome miniaturization and expansion. Ann Bot (Lond) 95(1):147–175.
14 Gregory TR (2005) Synergy between sequence and size in large-scale genomics. Nat Rev Genet 6(9):699–708.
15 Metcalfe CJ, Filée J, Germon I, Joss J, Casane D (2012) Evolution of the Australian lungfish (Neoceratodus forsteri) genome: A major role for CR1 and L2 LINE elements. Mol Biol Evol 29(11):3529–3539.
16 Brenner S, et al. (1993) Characterization of the pufferfish (Fugu) genome as a compact model vertebrate genome. Nature 366(6452):265–268.
17 Gregory TR (2003) Variation across amphibian species in the size of the nuclear genome supports a pluralistic, hierarchical approach to the C-value enigma. Biol J Linn Soc Lond 79:329–339.
18 Mizuno S, Macgregor HC (1974) Chromosomes, DNA sequences, and evolution in salamanders of the genus Plethodon. Chromosoma 48:239–296.
19 Schnable PS, et al. (2009) The B73 maize genome: Complexity, diversity, and dynamics. Science 326(5956):1112–1115.
20 Cavalier-Smith T (1978) Nuclear volume control by nucleoskeletal DNA, selection for cell volume and cell growth rate, and the solution of the DNA C-value paradox. J Cell Sci 34:247–278.
21 Doolittle WF, Sapienza C (1980) Selfish genes, the phenotype paradigm and genome evolution. Nature 284(5757):601–603.
22 Orgel LE, Crick FHC (1980) Selfish DNA: The ultimate parasite. Nature 284(5757):604–607.
23 González J, Petrov DA (2012) Evolution of genome content: Population dynamics of transposable elements in flies and humans. Methods Mol Biol 855:361–383.
24 Lynch M, Bobay LM, Catania F, Gout JF, Rho M (2011) The repatterning of eukaryotic genomes by random genetic drift. Annu Rev Genomics Hum Genet 12:347–366.
25 Fernández A, Lynch M (2011) Non-adaptive origins of interactome complexity. Nature 474(7352):502–505.
26 Zuckerkandl E (1986) Polite DNA: Functional density and functional compatibility in genomes. J Mol Evol 24(1–2):12–27.
27 Doolittle WF (1987) What introns have to tell us: Hierarchy in genome evolution. Cold Spring Harb Symp Quant Biol 52:907–913.
28 Garson J (2011) Selected effects and causal role functions in the brain: The case for an etiological approach to neuroscience. Biol Philos 26:547–565.

29 Godfrey-Smith P (1994) A modern history theory of functions. Nous 28:344–362.
30 Schwartz PH (1999) Proper function and recent selection. Philos Sci 66:S210–S222.
31 Ayala FJ (1970) Teleological explanations in evolutionary biology. Philos Sci 37:1–15.
32 Scharff C, Petri J (2011) Evo-devo, deep homology and FoxP2: Implications for the evolution of speech and language. Philos Trans R Soc Lond B Biol Sci 366(1574):2124–2140.
33 Gould SJ, Vrba E (1982) Exaptation - a missing term in the science of form. Paleobiology 8:4–15.
34 Gould SJ, Lewontin RC (1979) The spandrels of San Marco and the Panglossian paradigm: A critique of the adaptationist programme. Proc R Soc Lond B Biol Sci 205(1161):581–598.
35 Noller HF, Woese CR (1981) Secondary structure of 16S ribosomal RNA. Science 212(4493):403–411.
36 Ward LD, Kellis M (2012) Evidence of abundant purifying selection in humans for recently acquired regulatory functions. Science 337(6102):1675–1678.
37 Ponting CP, Hardison RC (2011) What fraction of the human genome is functional? Genome Res 21(11):1769–1776.
38 La Spada AR, Taylor JP (2010) Repeat expansion disease: Progress and puzzles in disease pathogenesis. Nat Rev Genet 11(4):247–258.
39 Lynch M (2007) The frailty of adaptive hypotheses for the origins of organismal complexity. Proc Natl Acad Sci USA 104(Suppl 1):8597–8604.
40 An introduction to the gene ontology. Available at http://www.geneontology.org/GO.doc.shtml#molecular_function. Accessed December 11, 2012.
41 Stamatoyannopoulos JA (2012) What does our genome encode? Genome Res 22(9):1602–1611.
42 Kines KJ, Belancio VP (2012) Expressing genes do not forget their LINEs: Transposable elements and gene expression. Front Biosci 17:1329–1344.
43 Barroso I (2012) Non-coding but functional. Nature 489:54.
44 Carninci P (2010) RNA dust: Where are the genes? DNA Res 17(2):51–59.
45 van Leeuwen S, Mikkers H (2010) Long non-coding RNAs: Guardians of development. Differentiation 80(4–5):175–183.
46 Ruths T, Nakhleh L (2012) ncDNA and drift drive binding site accumulation. BMC Evol Biol 12:159.
47 Brenner S (1998) Refuge of spandrels. Curr Biol 8(19):R669.
48 Pigliucci M (2008) Is evolvability evolvable? Nat Rev Genet 9(1):75–82.
49 Lynch M (2007) The Origins of Genome Architecture (Sinauer, Sunderland, MA).
50 Faulkner GJ, Carninci P (2009) Altruistic functions for selfish DNA. Cell Cycle 8(18):2895–2900.
51 Fedoroff NV (2012) Presidential address. Transposable elements, epigenetics, and genome evolution. Science 338(6108):758–767.
52 Gray MW, Lukeš J, Archibald JM, Keeling PJ, Doolittle WF (2010) Cell biology. Irremediable complexity? Science 330(6006):920–921.
53 Lukeš J, Archibald JM, Keeling PJ, Doolittle WF, Gray MW (2011) How a neutral evolutionary ratchet can build cellular complexity. IUBMB Life 63(7):528–537.
54 Stoltzfus A (2012) Constructive neutral evolution: Exploring evolutionary theory’s curious disconnect. Biol Direct 7:35.
55 Doolittle WF (2012) Evolutionary biology: A ratchet for protein complexity. Nature 481(7381):270–271.
56 Djebali S, et al. (2012) Landscape of transcription in human cells. Nature 489(7414):101–108.
57 Frazer KA (2012) Decoding the human genome. Genome Res 22(9):1599–1601.
58 Graur D, et al. (2013) On the immortality of television sets: “Function” in the human genome according to the evolution-free gospel of ENCODE. Genome Biol Evol, 10.1093/gbe/evt028.

5300 | www.pnas.org/cgi/doi/10.1073/pnas.1221376110 Doolittle