3 - 2 - Web interview: Philip Hugenholtz

Post on 04-Oct-2015

TRANSCRIPT

What I'm very excited by is the advent of population genomics: improvement in binning techniques coupled with deeper sequencing, which allows you to pull out high-quality, near-complete genomes for uncultured organisms. The binning method that is starting to get more traction is called differential coverage binning. This is based on the idea that if you look at a set of related metagenomes, for instance a time series or a spatial series, or even different DNA extraction methods on the same sample, you have the same populations [COUGH] but they're present in different relative abundance. And you can use that pattern of relative abundance as a signature. So you do your assembly and you get back anonymous fragments of genomes. And if you look at this coverage pattern for each of those anonymous fragments, you can bin them together by virtue of their coverage. And that method actually works really well.

And so what's really exciting me at the moment is on two fronts using that technology. On the evolutionary front, we can now make genome trees using those genomes, and we can get a very high-resolution map of the microbial tree. These trees are more robust than 16S trees. So my goal at the moment is to replace the 16S-based phylogeny, and the taxonomy derived from that, with genome-based phylogeny. At the moment we've got a genome tree database that's got about 12,000 genomes in it, of which about two and a half thousand are these population genomes. So my prediction is that two or three years from now, when you go to the public databases, you'll find that the dominant form of draft genomes will be these population genomes, because every study of every habitat produces, you know, usually on the order of dozens of these genomes. And we have been, not just us but other researchers as well, developing tools for checking the quality of those
genomes. So we have ways of checking to see how complete or contaminated they are, and ways of then quickly piping them into genome trees, so we've spent some time on that. And so I'm very excited by that, because, you know, I have an obsessive-compulsive disorder when it comes to classifying life forms. And so this very much meets that requirement of my personality to do that, you know, in a robust way, so I don't feel like I'm going around in circles.

But the other application for being able to pull out high-quality population genomes from environmental samples is that now you can do your ecological analysis much more robustly. So, you know, when we first started metagenomics, it became apparent that for complex communities we were kind of stuck in not being able to pull out the component populations. So we'd do things like gene-centric analysis, where we look at relative abundance of gene families rather than do it from an organismal context, which has been fine. But the problem is that you don't know who's doing what function. So you're getting a sort of global overview of community function. With the population genomes, in many cases you can pull out the major players from a given ecosystem, and now you can see which organisms are performing which functions, and you can work out the trophic interaction networks. So that's very exciting for ecology, because that provides a really solid foundation for understanding an ecosystem.

All right, so Greengenes was started in the early to mid 2000s. The main developer is Todd DeSantis; he was the original developer. And he knew that I was curating 16S sequences in order to get taxonomy based on phylogeny, which is obviously the way we should all be doing it, because phylogeny is a natural grouping of organisms, and so we want to base classification, which is a human construction, on natural groupings. So that's the goal. And he developed the Greengenes database as a
vehicle for being able to pull in the public sequences and then annotate them with all the metadata. And I have been the main curator of the database since its inception. My job is to go through, and this is a crazy job, and only a crazy person would do it; there's a couple of crazy people on the planet that do this kind of thing. You go through and you look at the structure of the tree, ideally with some idea of how robust the tree is, and you reconcile that phylogeny with what's the currently accepted nomenclature for taxonomy. There are good resources for nomenclature: there's a committee which decides on the names of organisms and then the higher ranks. And what you find when you do that is that there are numerous instances where the taxonomy doesn't match the phylogeny. So then it's a process of trying to reconcile that. And then another major issue is that because so much of the diversity is not represented by cultured organisms, there are big swaths of the phylogenetic tree that have no classification at all. So another part of Greengenes is to give some form of classification to these uncharacterized parts of the tree. The main programmer is Daniel McDonald, through the very generous hosting of Rob Knight. He's been so supportive of Greengenes. And others are still involved as well.

My take on the situation is that with whole-genome-based phylogeny, we're about where 16S was in about 2001. So we're not even that far off the pace. I think another ten years from now, the genome-tree-based phylogeny and taxonomy will be at about the level of number of sequences that we have with 16S. So I predict we're going to go from somewhere on the order of ten to twenty thousand sequences now to about half a million genomes in ten years. And then we should have a very nice, comprehensive coverage of the tree of life, in a taxonomy that's not compromised by chimeric artifacts. Hallelujah! And then I
don't know, you know, part of the fun is the journey. I hope that this journey will never be over, of course, because there's always going to be more diversity to discover. But that will be a far more solid basis for the taxonomy.

So, during curation of the Greengenes database I noticed, and not only me, other people noticed, that there was quite a large cluster of environmental sequences that were grouping with the cyanobacteria. And these sequences were coming from habitats that weren't exposed to sunlight. So the nagging question in the back of your mind would be: are these non-photosynthetic cyanobacteria? The dogma in microbiology for decades has been that all cyanobacteria are photosynthetic. So it was an attractive target. We started to make primers and probes that would target these basal cyanobacteria, which we nicknamed Darcy, short for dark cyanobacteria. And another group, Ruth Ley's group, was also interested, and they were looking for them in parallel. We ended up using these population genomic methods to recover the genomes, and as it turned out, they fell right out. So we got good-quality Darcy genomes from a range of habitats, including the koala gut, a bioreactor, and a full-scale industrial granular sludge. And we looked in those genomes and, sure enough, they had no photosynthetic apparatus. And if you do the genome tree, it was a very robust clustering of that group with the photosynthetic cyanobacteria.

So, with taxonomy, we wanted to call them cyanobacteria, and they met a huge amount of resistance. Any time that you challenge a dogma, you're going to meet resistance. And so what they ended up doing was classifying that group as a sister phylum [COUGH] of the cyanobacteria called the Melainabacteria, named for a Greek nymph, a dark nymph. And then that was a lot less controversial, because now you're still managing to maintain the dogma that all cyanobacteria are photosynthetic.

Now, in some ways
it's a semantic argument, right? Because taxonomy is human-made. But the point is that this group is reproducibly monophyletic with the cyanobacteria, so they share the last common ancestor from before the introduction of the photosynthetic apparatus, which must have occurred after the division. So we should be able to learn something about the ancestry of photosynthesis by studying this sister group.

Now, the power of a name, though. This is the interesting thing. The original paper calling them a sister phylum, Melainabacteria, sort of went by without much fanfare. It was a cool study, but it was another candidate phylum for which we now have genomes, and that's becoming more regular now with the new techniques. Because it didn't make any controversial claims, that they were cyanobacteria, it didn't get much attention. We got a fair bit of attention for our paper for calling them cyanobacteria. So you can see the power of the name, and that's something to be remembered. Because people say, well, taxonomy is such a dry discipline, but really there's power in names.
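
The differential coverage binning idea described near the start of the interview (contigs from the same population rise and fall together in abundance across related metagenomes) can be illustrated with a small sketch. This is not the method of any particular tool or of the speaker's lab. The function names, the greedy seed-based grouping, the 0.95 correlation cutoff, and the coverage numbers are all assumptions made up for illustration. Correlation alone also ignores absolute abundance, which practical binning approaches would likely need to take into account.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length coverage profiles."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat profile carries no abundance signature
    return cov / math.sqrt(var_a * var_b)


def bin_by_coverage(coverage, min_corr=0.95):
    """Greedily group contigs whose log-coverage profiles correlate.

    coverage: dict mapping contig name -> list of mean read depths,
              one value per related metagenome (e.g. time points).
    min_corr: hypothetical cutoff, chosen only for this example.
    Returns a list of bins, each a list of contig names.
    """
    # Log-transform so that large depths do not dominate the profile.
    profiles = {
        name: [math.log(depth + 1.0) for depth in depths]
        for name, depths in coverage.items()
    }
    bins = []  # the first contig in each bin acts as its seed
    for name in sorted(profiles):
        for members in bins:
            if pearson(profiles[name], profiles[members[0]]) >= min_corr:
                members.append(name)
                break
        else:
            bins.append([name])  # no matching bin: start a new one
    return bins


# Made-up mean coverage of four assembled contigs in three samples.
coverage = {
    "contig_1": [10, 40, 80],  # rises across the series
    "contig_2": [12, 38, 85],  # same pattern as contig_1
    "contig_3": [90, 30, 5],   # falls across the series
    "contig_4": [85, 33, 6],   # same pattern as contig_3
}
bins = bin_by_coverage(coverage)
print(bins)
```

With this made-up input, the two rising contigs end up in one bin and the two falling contigs in another, which is the "signature" the interview refers to: fragments are grouped purely by how their coverage changes across samples, without knowing which organism they came from.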