wesfiles.wesleyan.edu€¦  · web viewwu et al. make a case that there are two primary clusters...


Evolutionary and Ecological Bioinformatics
Biology/Computer Science 327, Fall 2013
Professors Fred Cohan and Danny Krizanc

DATE | LECTURER | LECTURE TITLE | TEXTBOOK READINGS

Sept. 3 | Cohan | 1. Bioinformatic approaches to ecology and evolution | Ch. 1
Sept. 5 | Krizanc | 2. Algorithms in everyday life and in research | Ch. 2
Sept. 10 | Cohan | 3. Approaches to phylogeny through overall similarity of organisms (phenetics vs. cladistics)
Sept. 12 | Krizanc | 4. Alignment of DNA and protein sequences | Ch. 3, 4, 12
Sept. 17 | Krizanc | 5. Distance-based algorithms for estimating relationships (UPGMA and NJ) | Ch. 6
Sept. 19 | Krizanc | 6. Maximum parsimony approach to phylogeny; search algorithms for finding the best phylogeny | Ch. 5, 8
Sept. 24 | Krizanc | 7. Models of molecular evolution (including Jukes-Cantor, neutral theory, transition-transversion); incorporating molecular models in maximum likelihood algorithms for phylogeny estimation | Ch. 9
Sept. 26 | Krizanc | 8. Testing the robustness of a tree | pp. 82-89, 134-136
Oct. 1 | Krizanc | 9. Bayesian approaches to phylogeny and your own life | Ch. 10
Oct. 3 | Krizanc | 10. Gene trees vs. species trees; splits trees and phylogenetic networks | Ch. 15
Oct. 8 | Krizanc | 11. Genome-based trees (based on gene content and on gene order); supertrees
Oct. 10 | Cohan | 12. The importance of using phylogeny for testing hypotheses about natural selection; phylogenetic algorithms for testing natural selection
Oct. 15 | Cohan | 13. Genome-wide analysis of adaptation through gene acquisition vs. losses of genes
Oct. 17 | Cohan | 14. Analyses of adaptation through changes in genome-wide gene expression
Oct. 22 | Fall break
Oct. 24 | Cohan and Krizanc | 15. Research projects
Oct. 29 | Cohan and Krizanc | 16. Genome-wide approaches for finding shared genes under recent positive selection (Theory)
Oct. 31 | Cohan and Krizanc | 17. Genome-wide approaches for finding shared genes under recent positive selection (Applications) | Ch. 14
Nov. 5 | Krizanc | 18. Assembly algorithms for genome sequencing: from isolates, metagenomes, and uncultivated single cells
Nov. 7 | Cohan | 19. Metagenomics in ecosystems biology: how to find out the physiological processes occurring in an ecosystem even when we don’t know who the organisms are


Nov. 12 | Cohan | 20. Metagenomic approaches for characterizing community-wide organismal diversity
Nov. 14 | Cohan | 21. Metagenomic approaches to finding out what unidentified genes do (ecological annotation)
Nov. 19 | Cohan | 22. The human microbiome: types of communities across humans, functional screening for novel genes, antibiotic holocausts and health consequences
Nov. 21 | Cohan | 23. Baseball, biology, and big data
Nov. 26 | Cohan and Krizanc | 24. (cancelled)
Nov. 28 | Thanksgiving
Dec. 3 | Cohan and Krizanc | 25. Molecular approaches for identifying microbial diversity in natural communities: AdaptML and Ecotype Simulation
Dec. 5 | Guest lecturer: Sarah Kopac | 26. Microbial diversification through adaptations to physical conditions versus organic resources
Thursday, Dec. 12 | 2:00-5:00, Zelnick Pavilion | POSTER SESSION

HOMEWORK ASSIGNMENTS
Due Sept. 26 | 1. A pencil-and-paper phylogenetic problem
Due Oct. 15 | 2. Make a tree (with help from computer algorithms)
Due Oct. 31 | 3. Another pencil-and-paper phylogenetic problem
Due Nov. 26 | 4. Comparing genomes to characterize past natural selection
Due Nov. 26 | 5. Project abstract

TERM PROJECT
Due on Thursday, Dec. 12, 2:00 PM | Poster on research project
Due on Thursday, Dec. 12, 2:00 PM | Paper on (the same) research project

GRADING
Homework assignments | 50%
Term project poster | 20%
Term project paper | 30%


READINGS
Textbook: Phylogenetic Trees Made Easy: A How-To Manual, Fourth Edition, Barry G. Hall, 2011, Sinauer Associates.
Supplementary readings will be listed on the class WesFiles web site.

CONTACT INFORMATION (Email is the best way to set up an appointment.)
Fred Cohan, 207 [email protected]
Office hours: Fridays 1:15-2:15, and by appointment

Danny Krizanc, 631 Exley Science [email protected]

December 2, 2013


Evolutionary and Ecological Bioinformatics
Biology/Computer Science 327, Fall 2013
Supplementary Reading

Sep. 3

1. Bioinformatic approaches to ecology and evolution

Ginsberg et al. give a really nice example of the Big Data approach, in this case predicting influenza levels from Google search queries before the CDC can (Ginsberg et al., 2009). Larson et al. provide phylogenetic evidence that wild pigs were domesticated in six different places around Eurasia (Larson et al., 2005); similarly, Thalmann et al. show that dogs were domesticated in Europe (Thalmann et al., 2013). Keeling and Palmer chart phylogenetically the most significant horizontal transfer events in eukaryotic history (Keeling and Palmer, 2008). Mikkelsen et al. have identified the genes in the genome that have been under selection for new adaptations in humans (Mikkelsen et al., 2005). Merhej et al. compared bacterial genomes to test whether different lineages that independently evolved toward pathogenicity (or mutualism) tend to lose the same genes convergently (Merhej et al., 2009). (They do!) Christina Richards et al. explored the circumstances under which gene expression changes over the course of an organism’s life, in the case of the plant Arabidopsis (Richards et al., 2012). Fierer et al. explored how the bacterial community on hands varies between the left and right hands and between people, and how washing affects hands’ bacterial communities (Fierer et al., 2008). Knight et al. showed, in a meta-analysis across various high-impact studies from the Earth Microbiome Project, how similarity of environment drives similarity of bacterial communities (Knight et al., 2012).

Sep. 5

2. Algorithms in everyday life and in research

Harel’s Chapter 4 is a “gentle” introduction to the notion of NP-completeness, that is, why some problems are hard for computers to solve (Harel, 2000).

Sep. 10

3. Approaches to phylogeny through overall similarity of organisms

Nosenko et al. give a recent phylogeny of animals based on various genes; they explain how to choose the best set of genes when genes differ in the phylogenies they yield (Nosenko et al., 2013). Funch and Kristensen present their discovery of a new animal phylum (Funch and Kristensen, 1995). Schloss and Handelsman present a phylogeny of the bacterial phyla, showing that most of the phyla do not have even a single cultivated species (Schloss and Handelsman, 2004). My recent encyclopedia chapter on species gives an overview of the concepts of species, including the dynamic qualities species have long been expected to have (Cohan, 2013). Mallet gives a species concept based on Darwin’s idea that two species should have no or very little overlap in a set of distinguishing characteristics; his concept does not deal with the dynamic qualities of cohesion, irreversible separateness, and so on (Mallet, 1995). Genoways and Choate, from the heyday of numerical taxonomy, illustrate two ways of presenting data on clustering of organisms by their overall phenotypic similarity (Genoways and Choate, 1972). Kämpfer et al. make a case that species of Streptomyces form distinct, justifiable units when we demarcate species at the 80% similarity level for phenotypic traits (Kämpfer et al., 1991). Futuyma, in his textbook, explains the limitations of the phenetic approach to phylogeny (in which all characters are used), and why we should restrict our analyses to characters that are derived (Futuyma, 1998).


Sep. 12

4. Sequence Alignment

Sean Eddy gives a biologist’s view of dynamic programming, the central idea behind a number of bioinformatics algorithms, including pairwise sequence alignment (Eddy, 2004). I’ve also included the original papers introducing ClustalW (the most commonly used multiple alignment tool), MUSCLE (a newer tool recommended by Hall) (Edgar, 2004), and GUIDANCE (a tool for evaluating the quality of alignments, described in Chapter 12 of Hall) (Penn et al., 2010). Morrison tries to answer the question “Why would phylogeneticists ignore computerized sequence alignment?” and makes some interesting points along the way (Morrison, 2009). His conclusion is that the current tools aren’t good enough.
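To make the dynamic-programming idea in the Eddy reading concrete, here is a minimal sketch of global pairwise alignment in the Needleman-Wunsch style. The scoring scheme (+1 for a match, -1 for a mismatch, -1 per gap position) is an illustrative assumption. Tools such as ClustalW and MUSCLE use substitution matrices, affine gap penalties, and multiple-alignment heuristics that are not shown here.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b by dynamic programming with a linear gap penalty.

    Returns (best score, aligned a, aligned b).
    """
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning the prefixes a[:i] and b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

In this example the score of 2 reflects three matches and one gap. When several alignments tie for the best score, this traceback prefers a diagonal step first, then a gap in the second sequence.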

Sep. 17

5. Distance-based Methods for Phylogeny Construction

I’ve included the original papers describing UPGMA (by Michener and Sokal) (Michener and Sokal, 1957) and Neighbor-Joining (by Saitou and Nei) (Saitou and Nei, 1987). Both are pretty heavy going but interesting. For gentler descriptions of these algorithms, I suggest Wikipedia. For a computer science perspective on this and the next three lectures, I have also included Mona Singh’s notes (from a course she teaches at Princeton) on phylogeny reconstruction.
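As a companion to the UPGMA paper, the sketch below implements the UPGMA clustering rule on a small distance matrix. The rule is to repeatedly join the closest pair of clusters, place the new node at half their distance, and set the distance from the merged cluster to every other cluster to the size-weighted average of its parts. The input format (a dict keyed by pairs of leaf names) and the four-taxon distances are hypothetical. Because every leaf ends up the same distance from the root, UPGMA assumes a constant rate of evolution (a molecular clock). Neighbor-Joining drops that assumption and is not shown here.

```python
from itertools import combinations


def upgma(dist):
    """UPGMA clustering (illustrative sketch).

    dist maps a pair of leaf names (a tuple) to their distance; every pair
    must be present. Returns an ultrametric tree as a Newick string.
    """
    d = {frozenset(pair): value for pair, value in dist.items()}
    # Active clusters, keyed by their Newick subtree: (number of leaves, node height).
    clusters = {name: (1, 0.0) for pair in dist for name in pair}
    while len(clusters) > 1:
        # Join the closest pair; ties go to the first pair in sorted label order.
        x, y = min(combinations(sorted(clusters), 2), key=lambda p: d[frozenset(p)])
        (size_x, height_x), (size_y, height_y) = clusters.pop(x), clusters.pop(y)
        height = d[frozenset((x, y))] / 2  # the new node sits halfway between x and y
        new = f"({x}:{height - height_x:g},{y}:{height - height_y:g})"
        # Distance from the new cluster is the size-weighted average of its parts.
        for z in clusters:
            d[frozenset((new, z))] = (
                size_x * d[frozenset((x, z))] + size_y * d[frozenset((y, z))]
            ) / (size_x + size_y)
        clusters[new] = (size_x + size_y, height)
    (root,) = clusters
    return root + ";"


example = {
    ("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
    ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10,
}
print(upgma(example))  # (((A:1,B:1):2,C:3):2,D:5);
```

Using the Newick string itself as the cluster key keeps the sketch short. It does, however, assume that leaf names contain no Newick punctuation.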

Sep. 19

6. Maximum parsimony approach

Sep. 24

7. Models of molecular evolution and maximum likelihood approach

The paper by Bos and Posada is a nice review of different models of DNA evolution and how they are used in building trees (Bos and Posada, 2005). The article by Guindon et al. discusses some recent developments in maximum likelihood algorithms that have had a real impact on how fast they run and how large a tree they can construct (Guindon et al., 2010). Sumner et al. discuss why it might not be such a good idea to use the most general model available when estimating trees (Sumner et al., 2012).
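The simplest of the models reviewed by Bos and Posada is Jukes-Cantor (JC69), which assumes equal base frequencies and equal rates for all substitutions. A minimal sketch of the JC69 distance follows. It converts the observed proportion of differing sites, p, into an estimate of substitutions per site, d = -(3/4) ln(1 - 4p/3), which corrects for multiple hits at the same site. The two short sequences are made up. Skipping gapped columns is one common convention and is assumed here.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of differing sites, ignoring columns with a gap ('-') in either sequence."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """JC69 distance: expected substitutions per site, correcting p for multiple hits."""
    if not 0 <= p < 0.75:
        raise ValueError("the Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGTAC", "ACGTACGTAA")  # one difference in ten sites
print(p, round(jukes_cantor(p), 4))  # 0.1 0.1073
```

The corrected distance is always at least p. The gap between the two widens as p approaches 0.75, which is the saturation point for four equally frequent bases.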

Sep. 26

8. Testing the robustness of trees

The paper by Anisimova and Gascuel introduces an approximate likelihood ratio test that can be used in conjunction with maximum likelihood methods to estimate one’s confidence in the clades of a given tree (Anisimova and Gascuel, 2006). This turns out to be much faster than using non-parametric approaches such as bootstrapping.
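For contrast with the approximate likelihood ratio test, here is a minimal sketch of the nonparametric bootstrap it is compared against. The bootstrap resamples alignment columns with replacement and counts how often a grouping reappears. As a deliberately crude stand-in for rebuilding a full tree on each replicate, this sketch only asks whether two named taxa are each other’s closest pair by p-distance. A real analysis would rerun the tree-building method (for example, maximum likelihood) on every pseudo-alignment. The four-taxon alignment and the replicate count are hypothetical.

```python
import random
from itertools import combinations


def p_dist(seq1, seq2):
    """Proportion of aligned sites that differ (no gap handling in this sketch)."""
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def closest_pair(aln):
    """The two taxa with the smallest p-distance; ties go to the alphabetically first pair."""
    best = min(combinations(sorted(aln), 2), key=lambda p: p_dist(aln[p[0]], aln[p[1]]))
    return frozenset(best)


def bootstrap_support(aln, pair, replicates=1000, seed=1):
    """Share of bootstrap pseudo-alignments in which `pair` is the closest pair."""
    rng = random.Random(seed)
    ncol = len(next(iter(aln.values())))
    hits = 0
    for _ in range(replicates):
        # Resample alignment columns with replacement, keeping the original length.
        cols = [rng.randrange(ncol) for _ in range(ncol)]
        pseudo = {name: "".join(seq[c] for c in cols) for name, seq in aln.items()}
        hits += closest_pair(pseudo) == frozenset(pair)
    return hits / replicates


# Hypothetical alignment: A and B are identical, C and D differ from them.
alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAC",
    "C": "TTGTACCAAC",
    "D": "GCATTCGTGA",
}
print(bootstrap_support(alignment, ("A", "B")))  # 1.0
```

Each replicate requires a full reanalysis. That cost, multiplied by hundreds of replicates, is why a single-pass test such as the aLRT is attractive for large maximum likelihood trees.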

Oct. 1

9. Bayesian methods

McGrayne discusses the implicit, embedded use of Bayesian methods in baseball batting averages and other issues of daily import (McGrayne, 2011) (p. 130). Silver introduces Bayesian analysis using the mysterious panties (or nighty) story (Silver, 2012) (p. 245). Huelsenbeck et al. review the use of Bayesian methods in phylogeny reconstruction (Huelsenbeck et al., 2001). Ronquist and Huelsenbeck introduce the third version of the program MrBayes (Ronquist and Huelsenbeck, 2003).
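Both popular accounts rest on Bayes’ theorem, P(H | E) = P(E | H) P(H) / P(E). The sketch below computes a posterior for a single hypothesis against its complement. It then reuses that posterior as the prior for a second piece of evidence, which assumes the two pieces of evidence are independent given the hypothesis. The numbers are hypothetical and are not taken from Silver’s or McGrayne’s examples. In phylogenetics the hypotheses are trees and model parameters, and programs such as MrBayes approximate the posterior with Markov chain Monte Carlo rather than by direct calculation.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' theorem for a hypothesis H versus not-H: returns P(H | evidence)."""
    joint_true = prior * p_evidence_if_true
    joint_false = (1 - prior) * p_evidence_if_false
    return joint_true / (joint_true + joint_false)


# Hypothetical numbers: a 1% prior, and evidence 9 times likelier if H is true.
p1 = posterior(0.01, 0.9, 0.1)
# The first posterior becomes the prior for a second, independent piece of evidence.
p2 = posterior(p1, 0.9, 0.1)
print(round(p1, 3), round(p2, 3))  # 0.083 0.45
```

Even strong evidence moves a small prior only part of the way. This is the central lesson of the Bayesian examples in both readings.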

Oct. 3

10. Gene trees vs species trees

The paper by Degnan and Rosenberg shows how lineage sorting can cause serious problems when trying to infer the correct species tree from gene trees (Degnan and Rosenberg, 2006). White et al. study the discordance between gene trees for three subspecies of mouse (White et al., 2009). The Iwabe et al. paper uses gene duplication/loss parsimony to root the tree of life (Iwabe et al., 1989). Zmasek and Eddy describe a straightforward algorithm for inferring duplication/loss events given a gene tree and its corresponding species tree (Zmasek and Eddy, 2001).

Oct. 8

11. Genome-based trees (based on gene content and on gene order); supertrees

(no reading assigned)

Oct. 10

12. The importance of using phylogeny for testing hypotheses about natural selection; phylogenetic algorithms for testing natural selection

Donoghue presents the classic case for why every evolutionary biologist needs to pay attention to phylogeny (Donoghue, 1989). In their book on comparative biology, Harvey and Pagel explain how phylogeny can be used to make tests of natural selection (Harvey and Pagel, 1991). Probert et al. analyze the relationship between seed longevity and various phenotypic and environmental factors. In one analysis, they perform the tests using a pre-Donoghue, non-phylogenetic approach, and in another, they make a test based on phylogenetically independent contrasts (Probert et al., 2009).

Oct. 15

13. Genome-wide analysis of adaptation through gene acquisition vs. losses of genes (part 1)

Zhong et al. give a protocol for discovering young duplicated genes in a genomic comparison of 12 Drosophila species; they show hot spots for young duplication (Zhong et al., 2013). Brenner et al. analyze the first complete genome sequence of a bacterium and report the high frequencies of genes occurring in families (Brenner et al., 1995). Merhej et al. report the convergence of gene loss in multiple lineages that have evolved pathogenicity (Merhej et al., 2009). Luo et al. (across several papers) show that various clades of E. coli are adapted to freshwater living, and that they have genes, not shared with gut E. coli, that adapt them to freshwater (Luo et al., 2011); Sarah Kopac and I have also written a commentary on this article (Cohan and Kopac, 2011). Hao and Golding present evidence that genes entering a lineage by horizontal genetic transfer are likely to evolve quickly in the context of their new homes (Hao and Golding, 2006). Bhaya et al. present genomic data that led to a fuller understanding of environmental differences between upstream and downstream habitats in a hot spring (mineral nutrient content was greater upstream) (Bhaya et al., 2007). I have discussed the various constraints on transfer of adaptations, including architectural incompatibilities across distant relatives (Cohan, 2010). Popa et al. analyzed a polarized network of HGT events and found the role of sequence similarity in determining frequencies of HGT; they also found how HGT is related to function (Popa et al., 2011). Choi et al. analyzed very close relatives of Streptococcus to estimate the rates of both replacing and additive horizontal transfer events, and discovered asymmetries in the direction of recombination (Choi et al., 2012).

Oct. 17

14. Genome-wide analysis of adaptation through gene acquisition vs. losses of genes (part 2); Analyses of adaptation through changes in genome-wide gene expression

Touchon et al. identify HGT events among members of the species taxon E. coli, and show that among the closest relatives, nearly all HGT events involve genes without a function for the bacteria (Touchon et al., 2009). In an experimental evolution system, Blount et al. discovered two ecotypes that were able to coexist based on resource partitioning, one ecotype specialized on glucose and one on citrate (Blount et al., 2008). Various papers from our lab discuss the ecotype models (Cohan, 2011a; Cohan, 2013; Cohan and Perry, 2007; Connor et al., 2010; Koeppel et al., 2013). Herring et al. use a genome “re-sequencing” approach to infer that a single change in one gene might have manifold effects on gene expression across the genome (Herring et al., 2006). Ferea et al. present a classic piece of work showing the hundreds of gene expression changes that yeast undergoes as it spontaneously evolves to be aerobic (in the absence of competitors) (Ferea et al., 1999). Sumby et al. use genome-wide gene expression and genome resequencing to show that passaging a non-pathogenic strain of Streptococcus through a mouse brings about the evolution of virulence: a single change in a signal-transducing gene causes massive changes in gene expression, including in dozens of virulence genes (Sumby et al., 2006). Hahne et al. explore genome-wide, in one strain of Bacillus subtilis, the various gene expression changes that respond to a salinity challenge (Hahne et al., 2010). Arendt and Reznick discuss the diversity of evolutionary responses among closely related populations to a single selection pressure (Arendt and Reznick, 2008); Dettman et al. further discuss this issue in the context of bacteria through the magic of genome-wide gene expression analyses (Dettman et al., 2012). My collaborators and I have investigated the tendency for different populations of one species to find unique responses to the same selection challenge (Cohan, 1984; Cohan and Hoffmann, 1989). Fong et al. have shown through genome-wide gene expression how different the genetic responses to the same environmental challenge can be (Fong et al., 2005).

Oct. 24

15. Suggestions for term research projects

See our discussion of suggested research topics.

Oct. 29

16. Genome-wide approaches for finding shared genes under recent positive selection (Theory)

Nei and Gojobori describe a simple parsimony-based method for estimating dN/dS that is implemented in MEGA (Nei and Gojobori, 1986). PAML implements a maximum likelihood approach (Yang, 2007). Hughes has been perhaps the most vocal critic of using dN/dS to infer adaptive evolution (Hughes, 2007). A response to Hughes is given by Zhai et al. (Zhai et al., 2012). The MUMmer paper describes the basic idea behind one of the most widely used algorithms for aligning whole genomes (Delcher et al., 1999).
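The final step of the Nei-Gojobori approach can be sketched in a few lines. Given the numbers of synonymous and nonsynonymous sites (S, N) and the observed differences at each (Sd, Nd), the method computes pS = Sd/S and pN = Nd/N, corrects each with the Jukes-Cantor formula, and takes the ratio. The counts below are hypothetical. The site-counting and pathway-averaging steps that would produce them are not shown. As a rough guide, dN/dS below 1 suggests purifying selection, near 1 neutrality, and above 1 positive selection, subject to the caveats Hughes raises.

```python
import math


def jukes_cantor(p):
    """JC69 correction for multiple hits, applied separately to pS and pN."""
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(syn_sites, nonsyn_sites, syn_diffs, nonsyn_diffs):
    """Return (dN, dS, dN/dS) from site and difference counts that were
    computed beforehand (the Nei-Gojobori counting itself is not shown)."""
    p_s = syn_diffs / syn_sites        # proportion of synonymous sites that differ
    p_n = nonsyn_diffs / nonsyn_sites  # proportion of nonsynonymous sites that differ
    d_s, d_n = jukes_cantor(p_s), jukes_cantor(p_n)
    return d_n, d_s, d_n / d_s


# Hypothetical counts for one pair of aligned genes (not from any real data set).
d_n, d_s, omega = dn_ds(syn_sites=300, nonsyn_sites=900, syn_diffs=30, nonsyn_diffs=18)
print(f"dN={d_n:.4f} dS={d_s:.4f} dN/dS={omega:.2f}")  # dN=0.0203 dS=0.1073 dN/dS=0.19
```

A ratio well below 1, as in this made-up example, would be read as evidence of purifying selection on the gene as a whole. It says nothing about individual codons, which is one reason PAML offers site-specific models.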

Oct. 31

17. Genome-wide approaches for finding shared genes under recent positive selection (Applications)

Williamson et al. present a genome-wide analysis of selective sweeps in the human genome, across the entire species and within ethnic groups (Williamson et al., 2007). Last month, Pavlidis et al. presented a new algorithm (SweeD) for detecting selective sweeps from an input of thousands of whole-genome sequences (Pavlidis et al., 2013); they applied it to detect several genes that underwent a selective sweep on human chromosome 1. My students and I have explained why selective sweeps in bacteria are not limited to a particular region of the genome (Cohan, 2005; Kopac and Cohan, 2012). Clark et al. performed a genome-wide analysis of positive selection in the human lineage, compared to chimps, with mouse as the outgroup (Clark et al., 2003). Note how they identified the individual genes under selection in the human lineage, and how they identified functional classes of genes with a particularly high frequency of accelerated evolution in humans. Vos developed a species concept for bacteria based on each ecotype having its own unique history of positive selection (Vos, 2011); you might think about whether this idea would yield the same or different demarcations of ecotypes. Vos et al. present their new computer package ODoSE to find bacterial ecotypes as units that differ in their histories of positive selection (Vos et al., 2013).

Nov. 5

18. Assembly algorithms for genome sequencing

Flicek and Birney provide a fairly recent review of the most commonly used methods of assembly (Flicek and Birney, 2009). The three papers by Waterston et al., Myers et al., and She et al. give some insight into the battle between hierarchical and whole-genome shotgun sequencing as it played out in the early part of this century (Waterston et al., 2002; Myers et al., 2002; She et al., 2004). Finally, the paper by Chin et al. discusses some new algorithmic ideas that have come about due to the most recent advances in sequencing technology (Chin et al., 2013).
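To illustrate the overlap idea underlying many of the assemblers in Flicek and Birney’s review, here is a toy greedy assembler. It repeatedly merges the two reads with the longest suffix-prefix overlap until no overlap of at least `min_len` remains. The reads and the `min_len` threshold are made-up examples. The sketch ignores sequencing errors, reverse complements, and reads contained within other reads. Greedy merging is known to collapse or misorder repeats, which is one reason modern assemblers build overlap or de Bruijn graphs instead.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that matches a prefix of `b` (at least min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # next place where a could start overlapping b
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len=3):
    """Toy greedy assembly: keep merging the pair of reads with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no remaining overlaps; return the separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)] + [merged]
    return contigs


# Hypothetical error-free reads tiled across the sequence ATGGCGTACGTTAGC.
reads = ["ATGGCGT", "GCGTACG", "TACGTTA", "GTTAGC"]
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGC']
```

Comparing every pair of reads in every round is quadratic per merge, which is fine for a toy. Real whole-genome data require indexing schemes to find overlaps.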

Nov. 7

19. Metagenomics in ecosystems biology: how to find out the physiological processes occurring in an ecosystem even when we don’t know who the organisms are

Bell et al. present evidence that increasing bacterial diversity increases the productivity of an ecosystem (Bell et al., 2005). Lay et al. investigate the functional diversity in an extremely cold and salty spring at the top of the world; they find that certain functions are found redundantly in a great diversity of organisms, while others are not (Lay et al., 2013). Simon et al. use a metagenomic approach to study the microbial organismic diversity on a glacier; they also discover the genes responsible for protection against the cold in this community (Simon et al., 2009). McHardy et al. present a package called PhyloPythia for identifying the organism that a single metagenomic sequence came from, based on nucleotide composition (McHardy et al., 2007). Cecchini et al. use a metagenomic approach to figure out which organisms provide certain functions in the environment, in this case the ability to utilize prebiotic compounds (Cecchini et al., 2013). McMahon et al. present a functional screen for novel genes that provide a certain function, and they show that the host in which metagenomic segments are cloned makes a big difference in their expression (and in their ability to be screened) (McMahon et al., 2012). Sommer et al. perform a functional screen for antibiotic resistance genes in human guts; surprisingly, many of the resistance genes they find are only distantly related to those isolated from cultured bacteria (Sommer et al., 2009). Robertson et al. perform a functional screen for novel nitrilases, and are able to chart the history of evolutionary transitions from activity on one enantiomer to activity on the other (Robertson et al., 2004). Rinke et al. show how single-cell genomics (i.e., sequencing the entire genome of one cell we cannot culture) can add to our understanding of the functional repertoire of an ecosystem (Rinke et al., 2013). (More from Rinke in the next lecture on the diversity of organisms in bacterial communities.)
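PhyloPythia classifies a sequence by its nucleotide composition. The sketch below shows only the composition idea. It turns each sequence into a vector of tetranucleotide (k = 4) frequencies and assigns a read to the reference with the nearest vector. This nearest-profile rule is a simple stand-in, not PhyloPythia’s own classifier (which is based on support vector machines). The two reference sequences are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq, k=4):
    """Relative frequencies of all 4**k DNA k-mers; k-mers containing other characters are ignored."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers) or 1  # avoid dividing by zero
    return [counts[km] / total for km in kmers]


def nearest_reference(read, references, k=4):
    """Name of the reference whose k-mer profile is closest to the read's (Euclidean distance)."""
    profile = kmer_profile(read, k)
    return min(references, key=lambda name: math.dist(profile, kmer_profile(references[name], k)))


# Hypothetical reference genomes with very different composition.
references = {
    "GC_rich": "GCGCGGCCGCGCGGCC" * 10,
    "AT_rich": "ATATAATTATATAATT" * 10,
}
read = references["GC_rich"][:60]
print(nearest_reference(read, references))  # GC_rich
```

Composition signals become noisy for short fragments. That is one reason composition-based classifiers generally perform better on longer metagenomic sequences.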

Nov. 12

20. Metagenomic approaches for characterizing community-wide organismal diversity

DeSantis et al. present their algorithm and web site, GreenGenes, for classifying a 16S rRNA sequence to a taxon (DeSantis et al., 2006). Konstantinidis and Tiedje present evidence for criteria (or a range of criteria) of 16S rRNA divergence for demarcating taxa of different ranks (Konstantinidis and Tiedje, 2005). Kim et al. is my foray into the discovery of new genera and species by 16S rRNA analysis of environmental DNA (Kim et al., 2012). Sogin et al. present the first high-throughput sequencing of environmental DNA from a marine habitat, providing evidence that there is an extraordinary diversity of extremely rare organisms (Sogin et al., 2006). We briefly revisit Simon et al., who gave an example of characterizing the organismic diversity of a community by assigning protein-coding genes from the metagenome to taxa (Simon et al., 2009); we also revisit PhyloPythia (McHardy et al., 2007). Hess et al. perform the amazing feat of obtaining nearly complete genome sequences of various organisms from the metagenome fragments of a cow’s rumen (Hess, 2011); Mackelprang et al. obtained a similar result from permafrost soil, assembling the genome sequence of a novel methanogen (Mackelprang et al., 2011). Rinke et al. provide results from single-cell genome sequencing of various phyla that had never previously been sequenced; this provided evidence for four previously unknown superphyla (Rinke et al., 2013). Just to show that we care about the gene-based discovery of phylogenetic supergroups in non-bacteria, we provide the discovery of superorders of mammals (Bininda-Emonds et al., 2007).

Nov. 14

21. Metagenomic approaches to finding out what unidentified genes do (ecological annotation)

Here are the references for the metagenome projects discussed in class (Wu et al., 2009; Turnbaugh et al., 2007; Gilbert et al., 2010; 10K, 2009; Davies et al., 2012; Tyson et al., 2004). Knight et al. make a plea for a new standard of coverage of environmental data in metagenomics studies (Knight et al., 2012). Plewniak et al. give a nice old-style example of how we can identify the genes responsible for adaptation to a given geochemical stressor, if we already know the genes (Plewniak et al., 2013). Inskeep et al. give a nice example of extremely different sets of geochemical stressors across habitats in a metagenome study (Inskeep et al., 2010). Biddle et al. give an example of less extreme variation among environments, where the same phyla are found everywhere, possibly a good source of ecological annotation (Biddle et al., 2011). Mackay et al. describe the Drosophila melanogaster Genetic Reference Panel, which consists of the genome sequences of 168 inbred lines derived from a single natural population; it is being used to determine the genes responsible for each of many physiological, behavioral, and ecological traits (Mackay et al., 2012). (This is something I learned about on my visit to SUNY Binghamton after I gave this lecture, and so it wasn’t included in the lecture.)

Nov. 19

22. The human microbiome: types of communities across humans, functional screening for novel genes, antibiotic holocausts and health consequences

Our story today begins with the emergence of the germ theory of disease, and an attitude, both within households and in the public health establishment, that the only good germ is a dead germ; I recommend The Gospel of Germs by Nancy Tomes as a great narrative of this period, mostly from the 1870s until the antibiotic revolution of the 1940s (Tomes, 1998). Pollan and Velasquez-Manoff have recently written popular accounts of the importance of our gut microbes in human health (Pollan, 2013; Velasquez-Manoff, 2013) http://www.nytimes.com/2013/05/19/magazine/say-hello-to-the-100-trillion-bacteria-that-make-up-your-microbiome.html?ref=magazine . The most direct repercussion of the germ-as-enemy approach, which led to the overuse of antibiotics, has been the emergence of antibiotic resistance. Forslund et al. present data on the prevalence of antibiotic resistance in different countries, and on the relationship between the use of antibiotics in animal agriculture and resistance in the human gut microbiome (Forslund et al., 2013). More recently, we have come to appreciate the beneficial qualities of our gut bacteria, and Khosravi and Mazmanian describe the disease-fighting importance of our resident bacteria (Khosravi and Mazmanian, 2013). Pérez-Cobas et al. describe the lasting effect of an antibiotic regimen on the composition of an individual’s gut microbiome (Perez-Cobas et al., 2013). Liping Zhao presents a proposal for a research field in which we use various bioinformatic approaches to determine the organismal changes correlated with obesity and leanness, and then perform experiments to test the effects of the implicated bacteria (Zhao, 2013). Wu et al. make a case that there are two primary clusters of bacterial gut communities across humanity, one dominated by Prevotella and associated with a carbohydrate diet, and another dominated by Bacteroides and associated with a diet high in fat and protein; they also show that the microbiome can be changed in the short term but that it probably takes a long time to fully change a human’s gut microbiome (Wu et al., 2011). Lozupone and Knight have developed a very useful algorithm called UniFrac for clustering bacterial communities by their phylogenetic differences; it is described in a couple of articles (Lozupone et al., 2006; Lozupone and Knight, 2005). Muegge et al. find a functional pattern in the differences between the microbiomes of mammalian herbivores and carnivores; for example, carnivores tend to have microbiomes with many amino acid degradation enzymes, while herbivores tend to have many amino acid biosynthesis enzymes, which makes sense when you think about it. They also make the case that the microbiomes of human vegans tend to look more like those of mammalian herbivores, while the microbiomes of human meat-eaters tend to look like those of mammalian carnivores (Muegge et al., 2011). A mouse study by Zhang et al. (including Liping Zhao) shows that the microbiome of a mouse changes rapidly with the onset of a high-fat diet, and changes back quickly when a low-fat diet is resumed; they also identify key phylotypes that change rapidly with the change in diet (Zhang et al., 2012). The next step in Liping Zhao’s paradigm is to test each of these taxa for its effect on weight by introducing it into gnotobiotic mice; I supply an example with a previously suspected effect of the bacterium on reducing inflammation (Sokol et al., 2008).
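Unweighted UniFrac, as described by Lozupone and Knight, is the fraction of branch length on a phylogeny that leads to members of only one of the two communities being compared. The sketch below computes it for a hypothetical four-leaf tree, stored as child-to-parent links with a branch length above each node. It ignores abundances, which the weighted variant takes into account. Branches leading to neither community are left out of the total.

```python
def unweighted_unifrac(parent, length, comm_a, comm_b):
    """Fraction of observed branch length unique to one community (unweighted UniFrac).

    parent: node -> parent node (the root has no entry).
    length: node -> length of the branch above that node.
    comm_a, comm_b: sets of leaf names observed in each community.
    """
    below = {node: set() for node in length}  # which communities sit below each branch
    for label, community in (("a", comm_a), ("b", comm_b)):
        for leaf in community:
            node = leaf
            while node in parent:  # walk from the leaf up to the root
                below[node].add(label)
                node = parent[node]
    unique = sum(length[n] for n, seen in below.items() if len(seen) == 1)
    observed = sum(length[n] for n, seen in below.items() if seen)
    return unique / observed


# Hypothetical tree ((A:1,B:1)X:1,(C:1,D:1)Y:1)root, stored as child -> parent links.
parent = {"A": "X", "B": "X", "C": "Y", "D": "Y", "X": "root", "Y": "root"}
length = {node: 1.0 for node in parent}
print(unweighted_unifrac(parent, length, {"A", "B"}, {"C", "D"}))  # 1.0
print(round(unweighted_unifrac(parent, length, {"A", "C"}, {"B", "D"}), 3))  # 0.667
```

In the first comparison the two communities occupy separate clades and share no branches, giving the maximum value of 1. In the second comparison they interleave, so the internal branches X and Y are shared and the distance drops.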

Nov. 21

23. Baseball, biology, and big data

I have written a couple of pieces on Sandy Koufax’s perfect game, and what it taught me about using our imaginations better to obtain a fuller and more useful data set (Cohan, 2011b; Cohan, 2011c). I also wrote an article on how Big Data approaches can be used better in biology, in homage to Moneyball (Cohan, 2012). Lozupone and Knight wrote their breakout piece on UniFrac, showing that changes in adaptation to salinity were the most difficult transitions in bacterial history; they also lamented that the resolution of the environmental data was such that they could not investigate the difficulty of more subtle changes in salinity adaptation (Lozupone and Knight, 2007). The disappointment of the missing data led to various conferences on what environmental parameters (and sequencing and assembly tools!) we should be recording when we spend millions of dollars on genome and metagenome sequencing (Field et al., 2008). David Toomey writes about how we see only what we expect to see. This was exemplified by microbiologists’ lack of interest in exploring what life may exist in Yellowstone’s hot springs, owing to their “knowledge” that life couldn’t possibly exist at such high temperatures (Toomey, 2013) (pp. 12-13). See also Thomas Brock’s account of his discovery (Brock, 1995) and Thomas Kuhn’s account of our limitations toward discovery (Kuhn, 1996) (pp. 63-64). Hurwitz and Sullivan have organized the unknown diversity among marine viral proteins by clustering them, and then trying to find out what ocean properties each cluster is associated with (Hurwitz and Sullivan, 2013).

Nov. 26

24. (lecture cancelled)

Dec. 3

25. Molecular approaches for identifying microbial diversity in natural communities—AdaptML and Ecotype Simulation

Mallet presents evidence that speciation is always easy, even for the highly sexual animals and plants (Mallet, 2008). My “Are species cohesive?” article presents all the ways that speciation may be even easier in bacteria, and the ways that we can use molecular and bioinformatic techniques to find the rate of speciation in bacteria (Cohan, 2011a).

Genome 10K Community of Scientists (2009). Genome 10K: A proposal to obtain whole-genome sequence for 10,000 vertebrate species. Journal of Heredity 100: 659-674.

Anisimova, M. & Gascuel, O. (2006). Approximate likelihood-ratio test for branches: A fast, accurate, and powerful alternative. Syst Biol 55(4): 539-552.

Arendt, J. & Reznick, D. (2008). Convergence and parallelism reconsidered: what have we learned about the genetics of adaptation? Trends Ecol Evol 23(1): 26-32.

Bell, T., Newman, J. A., Silverman, B. W., Turner, S. L. & Lilley, A. K. (2005). The contribution of species richness and composition to bacterial services. Nature 436(7054): 1157-1160.

Bhaya, D., Grossman, A. R., Steunou, A. S., Khuri, N., Cohan, F. M., Hamamura, N., Melendrez, M. C., Bateson, M. M., Ward, D. M. & Heidelberg, J. F. (2007). Population level functional diversity in a microbial community revealed by comparative genomic and metagenomic analyses. ISME J 1(8): 703-713.

Biddle, J. F., White, J. R., Teske, A. P. & House, C. H. (2011). Metagenomics of the subsurface Brazos-Trinity Basin (IODP site 1320): comparison with other sediment and pyrosequenced metagenomes. ISME J 5(6): 1038-1047.

Bininda-Emonds, O. R., Cardillo, M., Jones, K. E., MacPhee, R. D., Beck, R. M., Grenyer, R., Price, S. A., Vos, R. A., Gittleman, J. L. & Purvis, A. (2007). The delayed rise of present-day mammals. Nature 446(7135): 507-512.

Blount, Z. D., Borland, C. Z. & Lenski, R. E. (2008). Historical contingency and the evolution of a key innovation in an experimental population of Escherichia coli. Proc Natl Acad Sci U S A 105(23): 7899-7906.

Bos, D. H. & Posada, D. (2005). Using models of nucleotide evolution to build phylogenetic trees. Dev Comp Immunol 29(3): 211-227.

Brenner, S. E., Hubbard, T., Murzin, A. & Chothia, C. (1995). Gene duplications in H. influenzae. Nature 378(6553): 140.

Brock, T. D. (1995). The road to Yellowstone--and beyond. Annu Rev Microbiol 49: 1-28.

Cecchini, D. A., Laville, E., Laguerre, S., Robe, P., Leclerc, M., Dore, J., Henrissat, B., Remaud-Simeon, M., Monsan, P. & Potocki-Veronese, G. (2013). Functional metagenomics reveals novel pathways of prebiotic breakdown by human gut bacteria. PLoS One 8(9): e72766.

Chin, C. S., Alexander, D. H., Marks, P., Klammer, A. A., Drake, J., Heiner, C., Clum, A., Copeland, A., Huddleston, J., Eichler, E. E., Turner, S. W. & Korlach, J. (2013). Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat Methods 10(6): 563-569.

Choi, S. C., Rasmussen, M. D., Hubisz, M. J., Gronau, I., Stanhope, M. J. & Siepel, A. (2012). Replacing and additive horizontal gene transfer in Streptococcus. Mol Biol Evol 29(11): 3309-3320.

Clark, A. G., Glanowski, S., Nielsen, R., Thomas, P. D., Kejariwal, A., Todd, M. A., Tanenbaum, D. M., Civello, D., Lu, F., Murphy, B., Ferriera, S., Wang, G., Zheng, X., White, T. J., Sninsky, J. J., Adams, M. D. & Cargill, M. (2003). Inferring nonneutral evolution from human-chimp-mouse orthologous gene trios. Science 302(5652): 1960-1963.

Cohan, F. M. (1984). Genetic divergence under uniform selection. I. Similarity among populations of Drosophila melanogaster in their responses to artificial selection for modifiers of ciD. Evolution 38(1): 55-71.

Cohan, F. M. (2005). Periodic selection and ecological diversity in bacteria. In Selective Sweep, 78-93 (Ed D. Nurminsky). Georgetown, Texas: Landes Bioscience.

Cohan, F. M. (2010). Synthetic biology: now that we're creators, what should we create? Curr Biol 20(16): R675-677.

Cohan, F. M. (2011a). Are species cohesive?--A view from bacteriology. In Bacterial Population Genetics: A Tribute to Thomas S. Whittam, 43-65 (Eds S. Walk and P. Feng). Washington, DC: American Society for Microbiology Press.

Cohan, F. M. (2011b). A more perfect numbers game. In Los Angeles Times.

Cohan, F. M. (2011c). Q&A: Frederick Cohan. Current Biology 21(11): R412-R414.

Cohan, F. M. (2012). Science needs more Moneyball. American Scientist 100(3): 182-185.

Cohan, F. M. (2013). Species. In Brenner's Encyclopedia of Genetics, Second Edition, 506-511 (Eds S. Maloy and K. Hughes). Amsterdam: Elsevier.

Cohan, F. M. & Hoffmann, A. A. (1989). Uniform selection as a diversifying force in evolution: evidence from Drosophila. Am Nat 134: 613-637.

Cohan, F. M. & Kopac, S. M. (2011). Microbial genomics: E. coli relatives out of doors and out of body. Curr Biol 21(15): R587-589.

Cohan, F. M. & Perry, E. B. (2007). A systematics for discovering the fundamental units of bacterial diversity. Current Biology 17: R373-R386.

Connor, N., Sikorski, J., Rooney, A. P., Kopac, S., Koeppel, A. F., Burger, A., Cole, S. G., Perry, E. B., Krizanc, D., Field, N. C., Slaton, M. & Cohan, F. M. (2010). The ecology of speciation in Bacillus. Applied and Environmental Microbiology 76: 1349-1358.

Davies, N., Field, D. & Genomic Observatories Network (2012). Sequencing data: A genomic network to monitor Earth. Nature 481(7380): 145.

Degnan, J. H. & Rosenberg, N. A. (2006). Discordance of species trees with their most likely gene trees. PLoS Genet 2(5): e68.

Delcher, A. L., Kasif, S., Fleischmann, R. D., Peterson, J., White, O. & Salzberg, S. L. (1999). Alignment of whole genomes. Nucleic Acids Res 27(11): 2369-2376.

DeSantis, T. Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E. L., Keller, K., Huber, T., Dalevi, D., Hu, P. & Andersen, G. L. (2006). Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol 72(7): 5069-5072.

Dettman, J. R., Rodrigue, N., Melnyk, A. H., Wong, A., Bailey, S. F. & Kassen, R. (2012). Evolutionary insight from whole-genome sequencing of experimentally evolved microbes. Mol Ecol 21(9): 2058-2077.

Donoghue, M. J. (1989). Phylogenies and the analysis of evolutionary sequences, with examples from seed plants. Evolution 43: 1137-1156.

Eddy, S. R. (2004). What is dynamic programming? Nat Biotechnol 22(7): 909-910.

Edgar, R. C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32: 1792-1797.

Ferea, T. L., Botstein, D., Brown, P. O. & Rosenzweig, R. F. (1999). Systematic changes in gene expression patterns following adaptive evolution in yeast. Proc Natl Acad Sci U S A 96(17): 9721-9726.

Field, D., Garrity, G., Gray, T., Morrison, N., Selengut, J., Sterk, P., Tatusova, T., Thomson, N., Allen, M. J., Angiuoli, S. V., Ashburner, M., Axelrod, N., Baldauf, S., Ballard, S., Boore, J., Cochrane, G., Cole, J., Dawyndt, P., De Vos, P., DePamphilis, C., Edwards, R., Faruque, N., Feldman, R., Gilbert, J., Gilna, P., Glockner, F. O., Goldstein, P., Guralnick, R., Haft, D., Hancock, D., Hermjakob, H., Hertz-Fowler, C., Hugenholtz, P., Joint, I., Kagan, L., Kane, M., Kennedy, J., Kowalchuk, G., Kottmann, R., Kolker, E., Kravitz, S., Kyrpides, N., Leebens-Mack, J., Lewis, S. E., Li, K., Lister, A. L., Lord, P., Maltsev, N., Markowitz, V., Martiny, J., Methe, B., Mizrachi, I., Moxon, R., Nelson, K., Parkhill, J., Proctor, L., White, O., Sansone, S. A., Spiers, A., Stevens, R., Swift, P., Taylor, C., Tateno, Y., Tett, A., Turner, S., Ussery, D., Vaughan, B., Ward, N., Whetzel, T., San Gil, I., Wilson, G. & Wipat, A. (2008). The minimum information about a genome sequence (MIGS) specification. Nat Biotechnol 26(5): 541-547.

Fierer, N., Hamady, M., Lauber, C. L. & Knight, R. (2008). The influence of sex, handedness, and washing on the diversity of hand surface bacteria. Proc Natl Acad Sci U S A 105(46): 17994-17999.

Flicek, P. & Birney, E. (2009). Sense from sequence reads: methods for alignment and assembly. Nat Methods 6(11 Suppl): S6-S12.

Fong, S. S., Joyce, A. R. & Palsson, B. O. (2005). Parallel adaptive evolution cultures of Escherichia coli lead to convergent growth phenotypes with different gene expression states. Genome Res 15(10): 1365-1372.

Forslund, K., Sunagawa, S., Kultima, J. R., Mende, D. R., Arumugam, M., Typas, A. & Bork, P. (2013). Country-specific antibiotic use practices impact the human gut resistome. Genome Res 23(7): 1163-1169.

Funch, P. & Kristensen, R. (1995). Cycliophora is a new phylum with affinities to Entoprocta and Ectoprocta. Nature 378: 711-714.

Futuyma, D. J. (1998). Evolutionary Biology.

Genoways, H. H. & Choate, J. R. (1972). A multivariate analysis of systematic relationships among populations of the short-tailed shrew (genus Blarina) in Nebraska. Systematic Zoology 21: 106-116.

Gilbert, J. A., Meyer, F., Jansson, J., Gordon, J., Pace, N., Tiedje, J., Ley, R., Fierer, N., Field, D., Kyrpides, N., Glockner, F. O., Klenk, H. P., Wommack, K. E., Glass, E., Docherty, K., Gallery, R., Stevens, R. & Knight, R. (2010). The Earth Microbiome Project: Meeting report of the "1st EMP meeting on sample selection and acquisition" at Argonne National Laboratory October 6th 2010. Stand Genomic Sci 3(3): 249-253.

Ginsberg, J., Mohebbi, M. H., Patel, R. S., Brammer, L., Smolinski, M. S. & Brilliant, L. (2009). Detecting influenza epidemics using search engine query data. Nature 457(7232): 1012-1014.

Guindon, S., Dufayard, J. F., Lefort, V., Anisimova, M., Hordijk, W. & Gascuel, O. (2010). New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol 59(3): 307-321.

Hahne, H., Mader, U., Otto, A., Bonn, F., Steil, L., Bremer, E., Hecker, M. & Becher, D. (2010). A comprehensive proteomics and transcriptomics analysis of Bacillus subtilis salt stress adaptation. J Bacteriol 192(3): 870-882.

Hao, W. & Golding, G. B. (2006). The fate of laterally transferred genes: life in the fast lane to adaptation or death. Genome Res 16(5): 636-643.

Harel, D. (2000). Sometimes we just don't know. In Computers Ltd.: What They Really Can't Do, 91-117. Oxford: Oxford University Press.

Harvey, P. H. & Pagel, M. D. (1991). The Comparative Method in Evolutionary Biology. Oxford: Oxford University Press.

Herring, C. D., Raghunathan, A., Honisch, C., Patel, T., Applebee, M. K., Joyce, A. R., Albert, T. J., Blattner, F. R., van den Boom, D., Cantor, C. R. & Palsson, B. O. (2006). Comparative genome sequencing of Escherichia coli allows observation of bacterial evolution on a laboratory timescale. Nat Genet 38(12): 1406-1412.

Hess, M. (2011). Metagenomic discovery of biomass-degrading genes and genomes from cow rumen. Science 331: 463-467.

Huelsenbeck, J. P., Ronquist, F., Nielsen, R. & Bollback, J. P. (2001). Bayesian inference of phylogeny and its impact on evolutionary biology. Science 294(5550): 2310-2314.

Hughes, A. L. (2007). Looking for Darwin in all the wrong places: the misguided quest for positive selection at the nucleotide sequence level. Heredity (Edinb) 99(4): 364-373.

Hurwitz, B. L. & Sullivan, M. B. (2013). The Pacific Ocean virome (POV): a marine viral metagenomic dataset and associated protein clusters for quantitative viral ecology. PLoS One 8(2): e57355.

Inskeep, W. P., Rusch, D. B., Jay, Z. J., Herrgard, M. J., Kozubal, M. A., Richardson, T. H., Macur, R. E., Hamamura, N., Jennings, R., Fouke, B. W., Reysenbach, A. L., Roberto, F., Young, M., Schwartz, A., Boyd, E. S., Badger, J. H., Mathur, E. J., Ortmann, A. C., Bateson, M., Geesey, G. & Frazier, M. (2010). Metagenomes from high-temperature chemotrophic systems reveal geochemical controls on microbial community structure and function. PLoS One 5(3): e9773.

Iwabe, N., Kuma, K., Hasegawa, M., Osawa, S. & Miyata, T. (1989). Evolutionary relationship of archaebacteria, eubacteria, and eukaryotes inferred from phylogenetic trees of duplicated genes. Proc Natl Acad Sci U S A 86(23): 9355-9359.

Kämpfer, P., Kroppenstedt, R. M. & Dott, W. (1991). A numerical classification of the genera Streptomyces and Streptoverticillium using miniaturized physiological tests. Journal of General Microbiology 137: 1831-1891.

Keeling, P. J. & Palmer, J. D. (2008). Horizontal gene transfer in eukaryotic evolution. Nat Rev Genet 9(8): 605-618.

Khosravi, A. & Mazmanian, S. K. (2013). Disruption of the gut microbiome as a risk factor for microbial infections. Curr Opin Microbiol 16(2): 221-227.

Kim, J. S., Makama, M., Petito, J., Park, N. H., Cohan, F. M. & Dungan, R. S. (2012). Diversity of Bacteria and Archaea in hypersaline sediment from Death Valley National Park, California. MicrobiologyOpen 1(2): 135-148.

Knight, R., Jansson, J., Field, D., Fierer, N., Desai, N., Fuhrman, J. A., Hugenholtz, P., van der Lelie, D., Meyer, F., Stevens, R., Bailey, M. J., Gordon, J. I., Kowalchuk, G. A. & Gilbert, J. A. (2012). Unlocking the potential of metagenomics through replicated experimental design. Nat Biotechnol 30(6): 513-520.

Koeppel, A. F., Wertheim, J. O., Barone, L., Gentile, N., Krizanc, D. & Cohan, F. M. (2013). Speedy speciation in a bacterial microcosm: New species can arise as frequently as adaptations within a species. ISME J 7: 1080-1091.

Konstantinidis, K. T. & Tiedje, J. M. (2005). Towards a genome-based taxonomy for prokaryotes. J Bacteriol 187(18): 6258-6264.

Kopac, S. M. & Cohan, F. M. (2012). Comment on "Population genomics of early events in the ecological differentiation of bacteria". In Science, Vol. 336.

Kuhn, T. (1996). The Structure of Scientific Revolutions. Chicago: University of Chicago.

Larson, G., Dobney, K., Albarella, U., Fang, M., Matisoo-Smith, E., Robins, J., Lowden, S., Finlayson, H., Brand, T., Willerslev, E., Rowley-Conwy, P., Andersson, L. & Cooper, A. (2005). Worldwide phylogeography of wild boar reveals multiple centers of pig domestication. Science 307(5715): 1618-1621.

Lay, C. Y., Mykytczuk, N. C., Yergeau, E., Lamarche-Gagnon, G., Greer, C. W. & Whyte, L. G. (2013). Defining the functional potential and active community members of a sediment microbial community in a high-arctic hypersaline subzero spring. Appl Environ Microbiol 79(12): 3637-3648.

Lozupone, C., Hamady, M. & Knight, R. (2006). UniFrac--an online tool for comparing microbial community diversity in a phylogenetic context. BMC Bioinformatics 7: 371.

Lozupone, C. & Knight, R. (2005). UniFrac: a new phylogenetic method for comparing microbial communities. Appl Environ Microbiol 71(12): 8228-8235.

Lozupone, C. A. & Knight, R. (2007). Global patterns in bacterial diversity. Proc Natl Acad Sci U S A 104(27): 11436-11440.

Luo, C., Walk, S. T., Gordon, D. M., Feldgarden, M., Tiedje, J. M. & Konstantinidis, K. T. (2011). Genome sequencing of environmental Escherichia coli expands understanding of the ecology and speciation of the model bacterial species. Proc Natl Acad Sci U S A 108(17): 7200-7205.

Mackay, T. F., Richards, S., Stone, E. A., Barbadilla, A., Ayroles, J. F., Zhu, D., Casillas, S., Han, Y., Magwire, M. M., Cridland, J. M., Richardson, M. F., Anholt, R. R., Barron, M., Bess, C., Blankenburg, K. P., Carbone, M. A., Castellano, D., Chaboub, L., Duncan, L., Harris, Z., Javaid, M., Jayaseelan, J. C., Jhangiani, S. N., Jordan, K. W., Lara, F., Lawrence, F., Lee, S. L., Librado, P., Linheiro, R. S., Lyman, R. F., Mackey, A. J., Munidasa, M., Muzny, D. M., Nazareth, L., Newsham, I., Perales, L., Pu, L. L., Qu, C., Ramia, M., Reid, J. G., Rollmann, S. M., Rozas, J., Saada, N., Turlapati, L., Worley, K. C., Wu, Y. Q., Yamamoto, A., Zhu, Y., Bergman, C. M., Thornton, K. R., Mittelman, D. & Gibbs, R. A. (2012). The Drosophila melanogaster Genetic Reference Panel. Nature 482(7384): 173-178.

Mackelprang, R., Waldrop, M. P., DeAngelis, K. M., David, M. M., Chavarria, K. L., Blazewicz, S. J., Rubin, E. M. & Jansson, J. K. (2011). Metagenomic analysis of a permafrost microbial community reveals a rapid response to thaw. Nature 480(7377): 368-371.

Mallet, J. (1995). A species definition for the modern synthesis. Trends Ecol Evol 10: 294-299.

Mallet, J. (2008). Hybridization, ecological races and the nature of species: empirical evidence for the ease of speciation. Philos Trans R Soc Lond B Biol Sci 363(1506): 2971-2986.

McGrayne, S. B. (2011). The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, & Emerged Triumphant from Two Centuries of Controversy. New Haven: Yale University Press.

McHardy, A. C., Martin, H. G., Tsirigos, A., Hugenholtz, P. & Rigoutsos, I. (2007). Accurate phylogenetic classification of variable-length DNA fragments. Nat Methods 4(1): 63-72.

McMahon, M. D., Guan, C., Handelsman, J. & Thomas, M. G. (2012). Metagenomic analysis of Streptomyces lividans reveals host-dependent functional expression. Appl Environ Microbiol 78(10): 3622-3629.

Merhej, V., Royer-Carenzi, M., Pontarotti, P. & Raoult, D. (2009). Massive comparative genomic analysis reveals convergent evolution of specialized bacteria. Biol Direct 4: 13.

Michener, C. D. & Sokal, R. R. (1957). A quantitative approach to a problem in classification. Evolution 11: 130-162.

Mikkelsen, T. S., Hillier, L. W., et al. (2005). Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437(7055): 69-87.

Morrison, D. A. (2009). Why would phylogeneticists ignore computerized sequence alignment? 58: 150-158.

Muegge, B. D., Kuczynski, J., Knights, D., Clemente, J. C., Gonzalez, A., Fontana, L., Henrissat, B., Knight, R. & Gordon, J. I. (2011). Diet drives convergence in gut microbiome functions across mammalian phylogeny and within humans. Science 332(6032): 970-974.

Myers, E. W., Sutton, G. G., Smith, H. O., Adams, M. D. & Venter, J. C. (2002). On the sequencing and assembly of the human genome. Proc Natl Acad Sci U S A 99(7): 4145-4146.

Nei, M. & Gojobori, T. (1986). Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol 3(5): 418-426.

Nosenko, T., Schreiber, F., Adamska, M., Adamski, M., Eitel, M., Hammel, J., Maldonado, M., Muller, W. E., Nickel, M., Schierwater, B., Vacelet, J., Wiens, M. & Worheide, G. (2013). Deep metazoan phylogeny: when different genes tell different stories. Mol Phylogenet Evol 67(1): 223-233.

Pavlidis, P., Zivkovic, D., Stamatakis, A. & Alachiotis, N. (2013). SweeD: likelihood-based detection of selective sweeps in thousands of genomes. Mol Biol Evol 30(9): 2224-2234.

Penn, O., Privman, E., Ashkenazy, H., Landan, G., Graur, D. & Pupko, T. (2010). GUIDANCE: a web server for assessing alignment confidence scores. Nucleic Acids Res 38(Web Server issue): W23-28.

Perez-Cobas, A. E., Gosalbes, M. J., Friedrichs, A., Knecht, H., Artacho, A., Eismann, K., Otto, W., Rojo, D., Bargiela, R., von Bergen, M., Neulinger, S. C., Daumer, C., Heinsen, F. A., Latorre, A., Barbas, C., Seifert, J., dos Santos, V. M., Ott, S. J., Ferrer, M. & Moya, A. (2013). Gut microbiota disturbance during antibiotic therapy: a multi-omic approach. Gut 62(11): 1591-1601.

Plewniak, F., Koechler, S., Navet, B., Dugat-Bony, E., Bouchez, O., Peyret, P., Seby, F., Battaglia-Brunet, F. & Bertin, P. N. (2013). Metagenomic insights into microbial metabolism affecting arsenic dispersion in Mediterranean marine sediments. Mol Ecol 22(19): 4870-4883.

Pollan, M. (2013). Some of my best friends are germs. In New York Times. New York.

Popa, O., Hazkani-Covo, E., Landan, G., Martin, W. & Dagan, T. (2011). Directed networks reveal genomic barriers and DNA repair bypasses to lateral gene transfer among prokaryotes. Genome Res 21(4): 599-609.

Probert, R. J., Daws, M. I. & Hay, F. R. (2009). Ecological correlates of ex situ seed longevity: a comparative study on 195 species. Annals of Botany 104(1): 57-69.

Richards, C. L., Rosas, U., Banta, J., Bhambhra, N. & Purugganan, M. D. (2012). Genome-wide patterns of Arabidopsis gene expression in nature. PLoS Genet 8(4): e1002662.

Rinke, C., Schwientek, P., Sczyrba, A., Ivanova, N. N., Anderson, I. J., Cheng, J. F., Darling, A., Malfatti, S., Swan, B. K., Gies, E. A., Dodsworth, J. A., Hedlund, B. P., Tsiamis, G., Sievert, S. M., Liu, W. T., Eisen, J. A., Hallam, S. J., Kyrpides, N. C., Stepanauskas, R., Rubin, E. M., Hugenholtz, P. & Woyke, T. (2013). Insights into the phylogeny and coding potential of microbial dark matter. Nature 499(7459): 431-437.

Robertson, D. E., Chaplin, J. A., DeSantis, G., Podar, M., Madden, M., Chi, E., Richardson, T., Milan, A., Miller, M., Weiner, D. P., Wong, K., McQuaid, J., Farwell, B., Preston, L. A., Tan, X., Snead, M. A., Keller, M., Mathur, E., Kretz, P. L., Burk, M. J. & Short, J. M. (2004). Exploring nitrilase sequence space for enantioselective catalysis. Appl Environ Microbiol 70(4): 2429-2436.

Ronquist, F. & Huelsenbeck, J. P. (2003). MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19(12): 1572-1574.

Saitou, N. & Nei, M. (1987). The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4(4): 406-425.

Schloss, P. D. & Handelsman, J. (2004). Status of the microbial census. Microbiol Mol Biol Rev 68(4): 686-691.

She, X., Jiang, Z., Clark, R. A., Liu, G., Cheng, Z., Tuzun, E., Church, D. M., Sutton, G., Halpern, A. L. & Eichler, E. E. (2004). Shotgun sequence assembly and recent segmental duplications within the human genome. Nature 431(7011): 927-930.

Silver, N. (2012). The Signal and the Noise: Why So Many Predictions Fail--but Some Don't. New York: Penguin.

Simon, C., Wiezer, A., Strittmatter, A. W. & Daniel, R. (2009). Phylogenetic diversity and metabolic potential revealed in a glacier ice metagenome. Appl Environ Microbiol 75(23): 7519-7526.

Sogin, M. L., Morrison, H. G., Huber, J. A., Mark Welch, D., Huse, S. M., Neal, P. R., Arrieta, J. M. & Herndl, G. J. (2006). Microbial diversity in the deep sea and the underexplored "rare biosphere". Proc Natl Acad Sci U S A 103(32): 12115-12120.

Sokol, H., Pigneur, B., Watterlot, L., Lakhdari, O., Bermudez-Humaran, L. G., Gratadoux, J. J., Blugeon, S., Bridonneau, C., Furet, J. P., Corthier, G., Grangette, C., Vasquez, N., Pochart, P., Trugnan, G., Thomas, G., Blottiere, H. M., Dore, J., Marteau, P., Seksik, P. & Langella, P. (2008). Faecalibacterium prausnitzii is an anti-inflammatory commensal bacterium identified by gut microbiota analysis of Crohn disease patients. Proc Natl Acad Sci U S A 105(43): 16731-16736.

Sommer, M. O., Dantas, G. & Church, G. M. (2009). Functional characterization of the antibiotic resistance reservoir in the human microflora. Science 325(5944): 1128-1131.

Sumby, P., Whitney, A. R., Graviss, E. A., DeLeo, F. R. & Musser, J. M. (2006). Genome-wide analysis of group a streptococci reveals a mutation that modulates global phenotype and disease specificity. PLoS Pathog 2(1): e5.

Sumner, J. G., Jarvis, P. D., Fernandez-Sanchez, J., Kaine, B. T., Woodhams, M. D. & Holland, B. R. (2012). Is the general time-reversible model bad for molecular phylogenetics? Syst Biol 61(6): 1069-1074.

Thalmann, O., Shapiro, B., Cui, P., Schuenemann, V. J., Sawyer, S. K., Greenfield, D. L., Germonpre, M. B., Sablin, M. V., Lopez-Giraldez, F., Domingo-Roura, X., Napierala, H., Uerpmann, H. P., Loponte, D. M., Acosta, A. A., Giemsch, L., Schmitz, R. W., Worthington, B., Buikstra, J. E., Druzhkova, A., Graphodatsky, A. S., Ovodov, N. D., Wahlberg, N., Freedman, A. H., Schweizer, R. M., Koepfli, K. P., Leonard, J. A., Meyer, M., Krause, J., Paabo, S., Green, R. E. & Wayne, R. K. (2013). Complete mitochondrial genomes of ancient canids suggest a European origin of domestic dogs. Science 342(6160): 871-874.

Tomes, N. (1998). The Gospel of Germs: Men, Women, and the Microbe in American Life. Cambridge, Mass.: Harvard University Press.

Toomey, D. (2013). Weird Life: The Search for Life that Is Very, Very Different from our Own. New York: Norton.

Touchon, M., Hoede, C., Tenaillon, O., Barbe, V., Baeriswyl, S., Bidet, P., Bingen, E., Bonacorsi, S., Bouchier, C., Bouvet, O., Calteau, A., Chiapello, H., Clermont, O., Cruveiller, S., Danchin, A., Diard, M., Dossat, C., Karoui, M. E., Frapy, E., Garry, L., Ghigo, J. M., Gilles, A. M., Johnson, J., Le Bouguenec, C., Lescat, M., Mangenot, S., Martinez-Jéhanne, V., Matic, I., Nassif, X., Oztas, S., Petit, M. A., Pichon, C., Rouy, Z., Ruf, C. S., Schneider, D., Tourret, J., Vacherie, B., Vallenet, D., Médigue, C., Rocha, E. P. & Denamur, E. (2009). Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths. PLoS Genet 5(1): e1000344.

Turnbaugh, P. J., Ley, R. E., Hamady, M., Fraser-Liggett, C. M., Knight, R. & Gordon, J. I. (2007). The human microbiome project. Nature 449(7164): 804-810.

Tyson, G. W., Chapman, J., Hugenholtz, P., Allen, E. E., Ram, R. J., Richardson, P. M., Solovyev, V. V., Rubin, E. M., Rokhsar, D. S. & Banfield, J. F. (2004). Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428(6978): 37-43.

Velasquez-Manoff, M. (2013). A cure for the allergy epidemic? In New York Times. New York.

Vos, M. (2011). A species concept for bacteria based on adaptive divergence. Trends Microbiol 19(1): 1-7.

Vos, M., te Beek, T. A., van Driel, M. A., Huynen, M. A., Eyre-Walker, A. & van Passel, M. W. (2013). ODoSE: a webserver for genome-wide calculation of adaptive divergence in prokaryotes. PLoS One 8(5): e62447.

Waterston, R. H., Lander, E. S. & Sulston, J. E. (2002). On the sequencing of the human genome. Proc Natl Acad Sci U S A 99(6): 3712-3716.

White, M. A., Ane, C., Dewey, C. N., Larget, B. R. & Payseur, B. A. (2009). Fine-scale phylogenetic discordance across the house mouse genome. PLoS Genet 5(11): e1000729.

Williamson, S. H., Hubisz, M. J., Clark, A. G., Payseur, B. A., Bustamante, C. D. & Nielsen, R. (2007). Localizing recent adaptive evolution in the human genome. PLoS Genet 3(6): e90.

Wu, D., Hugenholtz, P., Mavromatis, K., Pukall, R., Dalin, E., Ivanova, N. N., Kunin, V., Goodwin, L., Wu, M., Tindall, B. J., Hooper, S. D., Pati, A., Lykidis, A., Spring, S., Anderson, I. J., D'Haeseleer, P., Zemla, A., Singer, M., Lapidus, A., Nolan, M., Copeland, A., Han, C., Chen, F., Cheng, J. F., Lucas, S., Kerfeld, C., Lang, E., Gronow, S., Chain, P., Bruce, D., Rubin, E. M., Kyrpides, N. C., Klenk, H. P. & Eisen, J. A. (2009). A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea. Nature 462(7276): 1056-1060.

Wu, G. D., Chen, J., Hoffmann, C., Bittinger, K., Chen, Y. Y., Keilbaugh, S. A., Bewtra, M., Knights, D., Walters, W. A., Knight, R., Sinha, R., Gilroy, E., Gupta, K., Baldassano, R., Nessel, L., Li, H., Bushman, F. D. & Lewis, J. D. (2011). Linking long-term dietary patterns with gut microbial enterotypes. Science 334(6052): 105-108.

Yang, Z. (2007). PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 24(8): 1586-1591.

Zhai, W., Nielsen, R., Goldman, N. & Yang, Z. (2012). Looking for Darwin in genomic sequences--validity and success of statistical methods. Mol Biol Evol 29(10): 2889-2893.

Zhang, C., Zhang, M., Pang, X., Zhao, Y., Wang, L. & Zhao, L. (2012). Structural resilience of the gut microbiota in adult mice under high-fat dietary perturbations. ISME J 6(10): 1848-1857.

Zhao, L. (2013). The gut microbiota and obesity: from correlation to causality. Nat Rev Microbiol 11(9): 639-647.

Zhong, Y., Jia, Y., Gao, Y., Tian, D., Yang, S. & Zhang, X. (2013). Functional requirements driving the gene duplication in 12 Drosophila species. BMC Genomics 14: 555.

Zmasek, C. M. & Eddy, S. R. (2001). A simple algorithm to infer gene duplication and speciation events on a gene tree. Bioinformatics 17(9): 821-828.