Scott Emrich Assistant Professor, Computer Science and Engineering Scientific Manager, VectorBase University of Notre Dame A flexible, scalable genomics framework for integrating heterogeneous vector sequence data

Post on 28-Dec-2015


Page 1: Scott Emrich Assistant Professor, Computer Science and Engineering Scientific Manager, VectorBase University of Notre Dame A flexible, scalable genomics

Scott Emrich

Assistant Professor, Computer Science and Engineering

Scientific Manager, VectorBase

University of Notre Dame

A flexible, scalable genomics framework for integrating heterogeneous vector sequence data

Page 2:

Assembly required…

Page 3:

VectorBase is here to help (esp. with -omics data)

Please see me and/or Dan Lawson (EBI) anytime during this meeting

Page 4:

Anopheles gambiae M & S

Lawniczak, Emrich et al. (2010, Science)

Page 5:
Page 6:

Some genomic regions display footprint of strong, recent selection

Lawniczak, Emrich et al. 2010 Science

Page 7:

Reference: A C G T C G T T A C T G C
Sample_1:  A C G T C G A T A C T G C
Sample_2:  A C G T C G T T A T T G C

A C G T C G A T A T T G C
A C G T C G A T A T T G C
A C G T C G A T A C T G C
A C G T C G A T A C T G C

A C G T C G T T A T T G C
A C G T C G T T A T T G C
A C G T C G T T A T T G C
A C G T C G T T A T T G C

FlexReseq tool for integrating diverse sequence data
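
The comparison on this slide can be sketched in a few lines of Python. This is only an illustration built from the three sequences shown above; the function name `variant_positions` is hypothetical and is not part of FlexReseq or GATK.

```python
# Sequences copied from the slide (spaces removed).
REFERENCE = "ACGTCGTTACTGC"
SAMPLES = {
    "Sample_1": "ACGTCGATACTGC",
    "Sample_2": "ACGTCGTTATTGC",
}

def variant_positions(reference, sample):
    """Return (1-based position, reference base, sample base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, ref_base, alt_base)
        for i, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]

for name, seq in SAMPLES.items():
    print(name, variant_positions(REFERENCE, seq))
```

With these inputs, Sample_1 differs from the reference at position 7 (T to A) and Sample_2 at position 10 (C to T).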

Page 8:

FlexReseq implementation

Genome Analysis Toolkit (GATK):
• Map-Reduce framework that allows efficient access to large resequencing data sets

FlexReseq: A module for GATK:
• Configurable interface allows easy data exploration
• Modular implementation of rules allows for easy extension of software

Saves you from lots of scripting (Perl) code!

McKenna et al., Genome Research, 2010
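
To make the Map-Reduce idea and the notion of modular rules concrete, here is a minimal Python sketch. It does not use the GATK API (GATK itself is a Java toolkit); the pileup data, the rule names, and the functions `min_alt_fraction`, `map_locus`, and `reduce_counts` are all assumptions made up for illustration.

```python
from collections import Counter
from functools import reduce

# Hypothetical pileup: locus -> (reference base, bases observed in reads).
PILEUP = {
    7: ("T", "AAAA"),
    8: ("T", "TTTT"),
    10: ("C", "TTCC"),
}

def min_alt_fraction(threshold):
    """Build a rule that flags a locus when enough reads disagree with the reference."""
    def rule(ref_base, read_bases):
        alt = sum(1 for b in read_bases if b != ref_base)
        return alt / len(read_bases) >= threshold
    return rule

def map_locus(ref_base, read_bases, rules):
    """Map step: evaluate every rule at one locus, independently of all other loci."""
    return Counter({name: 1 for name, rule in rules.items() if rule(ref_base, read_bases)})

def reduce_counts(total, partial):
    """Reduce step: merge one per-locus result into the running summary."""
    return total + partial

# Rules are plain entries in a dictionary, so adding one does not touch the traversal.
RULES = {
    "any_variant": min_alt_fraction(0.25),
    "fixed_variant": min_alt_fraction(1.0),
}

mapped = (map_locus(ref, bases, RULES) for _, (ref, bases) in sorted(PILEUP.items()))
summary = reduce(reduce_counts, mapped, Counter())
print(dict(summary))
```

Because each locus is mapped on its own and the results are only combined in the reduce step, the traversal can be split across large data sets, and a new rule is just another dictionary entry rather than another script.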

Page 9:
Page 10:
Page 11:

A malaria use-case for FlexReseq

Samarakoon, Regier, et al., BMC Genomics, 2011

Why are some parasites drug-resistant?

Goal: we want to connect genotype (genome) to phenotype (drug response)

How did drug-resistance evolve?

Page 12:

[Figure: sequencing and mapping workflow]

Genetic cross: Wellems et al. 1990 [24]

Parents: HB3, Dd2

Progeny (recombinants): SC05, 7C126

Shotgun libraries: GS-FLX technology (454/Roche)

1. Whole genome shotgun sequencing

Parental genomes [shotgun libraries]

Progeny genomes [shotgun libraries]

NCBI Trace Archive [28]

2. Reference genome mapping

Reference genome (3D7)

PlasmoDB (v5.4) [27]

Mapped: SSAHA2 (http://www.sanger.ac.uk)

Page 13:
Page 14:

A more detailed map of P. falciparum

[Figure: Dd2 / HB3 map by chromosome position for chromosomes 1 to 14; panels (A) 7C126 and (B) SC05]
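
Assuming the figure shows which parent (Dd2 or HB3) each stretch of a progeny chromosome was inherited from, a map of this kind could be derived roughly as follows. This is a sketch under that assumption, not the method used in the talk; the function `parental_blocks` and the marker data are invented for illustration.

```python
from itertools import groupby

def parental_blocks(markers, progeny_alleles):
    """Collapse per-marker parental calls into contiguous blocks.

    markers: list of (position, dd2_allele, hb3_allele), sorted by position.
    progeny_alleles: dict mapping position -> allele observed in the progeny.
    Returns a list of (parent, first_position, last_position).
    """
    calls = []
    for pos, dd2, hb3 in markers:
        allele = progeny_alleles.get(pos)
        if dd2 == hb3 or allele not in (dd2, hb3):
            continue  # uninformative marker, missing data, or unexpected allele
        calls.append((pos, "Dd2" if allele == dd2 else "HB3"))
    blocks = []
    # groupby merges consecutive markers that were assigned to the same parent.
    for parent, group in groupby(calls, key=lambda call: call[1]):
        positions = [pos for pos, _ in group]
        blocks.append((parent, positions[0], positions[-1]))
    return blocks

# Made-up markers on one chromosome (not real P. falciparum data).
markers = [(100, "A", "G"), (250, "C", "T"), (400, "G", "G"), (900, "T", "C"), (1500, "A", "T")]
progeny = {100: "A", 250: "C", 400: "G", 900: "C", 1500: "T"}
print(parental_blocks(markers, progeny))
```

In this toy example the switch from a Dd2 block to an HB3 block between positions 250 and 900 marks a putative crossover; denser markers narrow that interval.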

Page 15:

Association of 2La with clines of aridity in Nigeria…

Modified from Coluzzi et al. (1979)

24,000 mosquitoes

194 sampling localities

Page 16:

High-throughput sequencing

• Data from Besansky lab
• Illumina Genome Analyzer
• 4 population pools (S-form)
• SHRiMP alignment
• BWA works also

C. Cheng et al., unpublished

Page 17:
Page 18:

Differential mapping biases do exist

Page 19:
Page 20:

Population haplotyping

Page 21:

In situ error isolation

Has been shown to be important in ancient DNA-based ecology

Page 22:
Page 23:
Page 24:
Page 25:

Thanks to…

VectorBase (NIH/NIAID)
• Dr. Nora Besansky (ND)
• Dr. Frank Collins (ND)
• Rory Carmichael, Andrew Shehan, Nate Konopinski, Dave Campbell (ND), others…

Notre Dame Bioinformatics Lab, Summer 2010

Anopheles genome cluster group

i5K

Arthropod Genomics Consortium steering committee