Scott Emrich
Assistant Professor, Computer Science and Engineering
Scientific Manager, VectorBase
University of Notre Dame
A flexible, scalable genomics framework for integrating
heterogeneous vector sequence data
Assembly required…
VectorBase is here to help (esp. –OMICs data)
Please see me and/or Dan Lawson (EBI) anytime during this meeting
Anopheles gambiae M & S
Lawniczak, Emrich et al. (2010, Science)
Some genomic regions display footprint of strong, recent selection
Lawniczak, Emrich et al. 2010 Science
Reference: A C G T C G T T A C T G C
Sample_1: A C G T C G A T A C T G C
Sample_2: A C G T C G T T A T T G C
A C G T C G A T A T T G C
A C G T C G A T A T T G C
A C G T C G A T A C T G C
A C G T C G A T A C T G C
A C G T C G T T A T T G C
A C G T C G T T A T T G C
A C G T C G T T A T T G C
A C G T C G T T A T T G C
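The comparison on this slide can be sketched in a few lines of Python. This is only an illustration of position-by-position comparison against a reference, using the Reference, Sample_1 and Sample_2 sequences shown above; the function name `find_variants` is hypothetical, and the sketch assumes equal-length, already aligned sequences with no indels.

```python
# Sequences copied from the slide (spaces removed).
REFERENCE = "ACGTCGTTACTGC"
SAMPLES = {
    "Sample_1": "ACGTCGATACTGC",
    "Sample_2": "ACGTCGTTATTGC",
}


def find_variants(reference, sample):
    """Return (1-based position, reference base, sample base) for each mismatch.

    Hypothetical helper: assumes the two sequences are already aligned
    and have the same length (substitutions only, no indels).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length in this sketch")
    return [
        (i + 1, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


variants = {name: find_variants(REFERENCE, seq) for name, seq in SAMPLES.items()}
for name, calls in variants.items():
    print(name, calls)
```

Each sample differs from the reference at a single site: Sample_1 at position 7 (T to A) and Sample_2 at position 10 (C to T).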
FlexReseq tool for integrating diverse sequence data
FlexReseq implementation
Genome Analysis Toolkit (GATK): Map-Reduce framework that allows efficient access to large resequencing data sets
FlexReseq: A module for GATK:
• Configurable interface allows easy data exploration
• Modular implementation of rules allows for easy extension of software
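To make the map-reduce idea with pluggable rules concrete, here is a minimal Python sketch. It is not the GATK API and not FlexReseq's implementation: the per-locus records, the rule factories `min_depth` and `min_quality`, and the thresholds are all hypothetical, chosen only to show how independent rules can be combined in a map step and summarized in a reduce step.

```python
from functools import reduce

# Hypothetical per-locus records: (chromosome, position, read depth, base quality).
LOCI = [
    ("chr1", 100, 12, 35),
    ("chr1", 250, 3, 38),
    ("chr2", 40, 20, 15),
    ("chr2", 90, 18, 40),
]


def min_depth(threshold):
    """Rule factory (hypothetical): pass loci with at least `threshold` reads."""
    return lambda locus: locus[2] >= threshold


def min_quality(threshold):
    """Rule factory (hypothetical): pass loci with quality of at least `threshold`."""
    return lambda locus: locus[3] >= threshold


def map_locus(locus, rules):
    """Map step: emit (chromosome, 1) if the locus passes every rule, else (chromosome, 0)."""
    return (locus[0], int(all(rule(locus) for rule in rules)))


def reduce_counts(totals, item):
    """Reduce step: accumulate the number of passing loci per chromosome."""
    chrom, passed = item
    totals[chrom] = totals.get(chrom, 0) + passed
    return totals


def count_passing(loci, rules):
    return reduce(reduce_counts, (map_locus(locus, rules) for locus in loci), {})


passing = count_passing(LOCI, [min_depth(10), min_quality(30)])
print(passing)
```

Adding a new filter in this sketch means writing one more rule function and appending it to the list, which is the kind of extension a modular rule design is meant to make easy.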
Saves you from lots of scripting (Perl) code!
McKenna et al., Genome Research, 2010
A malaria use-case for FlexReseq
Samarakoon, Regier, et al., BMC Genomics, 2011
Why are some parasites drug-resistant?
Goal: we want to connect genotype (genome)
to phenotype (drug response)
How did drug-resistance evolve?
1. Whole genome shotgun sequencing
Genetic cross: Wellems et al. 1990 [24]
Parents: HB3, Dd2
Progeny (recombinants): SC05, 7C126
Shotgun libraries: GS-FLX technology (454/Roche)
Parental genomes [shotgun libraries] and progeny genomes [shotgun libraries]: NCBI Trace Archive [28]
2. Reference genome mapping
Reference genome (3D7): PlasmoDB (v5.4) [27]
Mapped: SSAHA2 (http://www.sanger.ac.uk)
A more detailed map of P. falciparum
[Figure: panels (A) 7C126 and (B) SC05; x-axis: chromosome position; y-axis: chromosome (1 to 14); legend: Dd2, HB3]
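A map like this can be built by labeling each informative marker in a progeny genome with the parent it matches and then merging runs of markers from the same parent. The Python sketch below illustrates that idea under stated assumptions: the marker positions and alleles are invented for illustration (not data from the study), the helper names are hypothetical, and every marker is assumed to differ between Dd2 and HB3.

```python
# Hypothetical markers on one chromosome:
# (position, Dd2 allele, HB3 allele, progeny allele). Not real data.
MARKERS = [
    (1000, "A", "G", "A"),
    (5000, "C", "T", "C"),
    (9000, "G", "A", "A"),
    (12000, "T", "C", "C"),
    (20000, "A", "T", "A"),
]


def assign_parent(dd2_allele, hb3_allele, progeny_allele):
    """Return the parent whose allele the progeny carries, or None if neither matches."""
    if progeny_allele == dd2_allele:
        return "Dd2"
    if progeny_allele == hb3_allele:
        return "HB3"
    return None


def inheritance_blocks(markers):
    """Merge consecutive markers from the same parent into (parent, first_pos, last_pos)."""
    blocks = []
    for pos, dd2, hb3, progeny in sorted(markers):
        parent = assign_parent(dd2, hb3, progeny)
        if parent is None:
            continue  # skip markers matching neither parent (possible sequencing error)
        if blocks and blocks[-1][0] == parent:
            blocks[-1] = (parent, blocks[-1][1], pos)
        else:
            blocks.append((parent, pos, pos))
    return blocks


blocks = inheritance_blocks(MARKERS)
crossovers = len(blocks) - 1  # each change of parent implies at least one crossover
print(blocks, crossovers)
```

In this toy example the progeny inherits a Dd2 block, then an HB3 block, then Dd2 again, which implies two crossovers on the chromosome.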
Association of 2La with clines of aridity in Nigeria…
Modified from Coluzzi et al (1979)
24,000 mosquitoes
194 sampling localities
High-throughput sequencing
• Data from Besansky lab
• Illumina Genome Analyzer
• 4 population pools (S-form)
• SHRiMP alignment
• BWA works also
C. Cheng et al, unpublished
Differential mapping biases do exist
Population haplotyping
In situ error isolation
Has been shown to be important in ancient DNA-based ecology
Thanks to…
VectorBase (NIH/NIAID)
• Dr. Nora Besansky (ND)
• Dr. Frank Collins (ND)
• Rory Carmichael, Andrew Shehan, Nate Konopinski, Dave Campbell (ND), others…
Notre Dame Bioinformatics Lab, Summer 2010
Anopheles genome cluster group
i5K
Arthropod Genomics Consortium steering committee