committee_meeting_1031

The Story of My Research
developing a bottom-up computational approach to investigate microbial diversity
Qingpeng Zhang
Department of Computer Science and Engineering, Michigan State University
Supervisor: Dr. Titus Brown

Upload: qingpeng

Post on 14-Jun-2015


DESCRIPTION

My committee meeting slides on Oct 31st, 2014.

TRANSCRIPT

Page 1: committee_meeting_1031

The Story of My Research

developing a bottom-up computational approach to investigate microbial diversity

Qingpeng Zhang Department of Computer Science and Engineering

Michigan State University Supervisor: Dr. Titus Brown

Page 2: committee_meeting_1031

The Story of My Research

developing a bottom-up computational approach to investigate microbial diversity

Qingpeng Zhang Department of Computer Science and Engineering

Michigan State University Supervisor: Dr. Titus Brown

odyssey?

Page 3: committee_meeting_1031

Timeline, 2008-2014: developing a bottom-up computational approach to investigate microbial diversity

• khmer development
• start study/research metagenomics
• digital normalization
• diversity analysis on k-mer level
• Osedax symbionts
• diversity analysis on read level (IGS)
• GPGC soil sample

Page 4: committee_meeting_1031

2008: metagenomics

Page 5: committee_meeting_1031

2008: metagenomics

“Big Data!”

Page 6: committee_meeting_1031

Microbial diversity

similarity-based, composition-based

binning/annotation

assembly, reference

2009: microbial diversity

Page 7: committee_meeting_1031


2009: microbial diversity

How many things are there in the sample? - alpha diversity
How different are the samples? - beta diversity
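The two questions above can be made concrete with a small sketch. The slide does not say which metrics were used, so the choices here (richness and Shannon index for alpha diversity, Bray-Curtis dissimilarity for beta diversity) are assumptions, as are the taxon names and counts, which are made up for illustration.

```python
from math import log

def richness(counts):
    """Alpha diversity as richness: number of distinct taxa observed."""
    return sum(1 for c in counts.values() if c > 0)

def shannon(counts):
    """Alpha diversity as the Shannon index (natural log)."""
    total = sum(counts.values())
    return -sum((c / total) * log(c / total) for c in counts.values() if c > 0)

def bray_curtis(a, b):
    """Beta diversity as Bray-Curtis dissimilarity between two samples."""
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    return 1 - 2 * shared / total

# Hypothetical abundance tables (taxon -> count), not real data.
sample_a = {"taxon1": 10, "taxon2": 5, "taxon3": 5}
sample_b = {"taxon1": 10, "taxon4": 10}

print(richness(sample_a), richness(sample_b))
print(shannon(sample_a), shannon(sample_b))
print(bray_curtis(sample_a, sample_b))
```

Richness and the Shannon index each describe a single sample, while Bray-Curtis compares two samples and ranges from 0 (identical composition) to 1 (nothing shared).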

Page 8: committee_meeting_1031


2009: microbial diversity

"Nothing works, everything sucks."

Page 9: committee_meeting_1031


2009: microbial diversity

NO!

Page 10: committee_meeting_1031

2009: k-mer counting

Page 11: committee_meeting_1031

Timeline, 2008-2014: developing a bottom-up computational approach to investigate microbial diversity

Page 12: committee_meeting_1031

2010-now: GPGC

How many things are there in the sample? - alpha diversity
How does agricultural soil differ from native soil? - beta diversity

Page 13: committee_meeting_1031

Timeline, 2008-2014: developing a bottom-up computational approach to investigate microbial diversity

Page 14: committee_meeting_1031

2010-now: khmer

Page 15: committee_meeting_1031

2010-now: khmer

Page 16: committee_meeting_1031

2010-now: khmer

• My contributions:
  • algorithm design and analysis: exploring the underlying mathematics and the choice of optimal parameters
  • contributing code, including unique k-mer counting, overlap k-mer counting, optimal parameter choice, and other code related to my specific research project
  • benchmarking, testing, and actually using it
  • exploring applications such as error trimming, filtering low-abundance reads, and digital normalization; suggesting features
  • work on the khmer manuscript

Page 17: committee_meeting_1031

2010-now: khmer


Page 18: committee_meeting_1031

Timeline, 2008-2014: developing a bottom-up computational approach to investigate microbial diversity

Page 19: committee_meeting_1031

2010-2012: diversity analysis on k-mer level

Page 20: committee_meeting_1031

2010-2012: diversity analysis on k-mer level

Page 21: committee_meeting_1031

Timeline, 2008-2014: developing a bottom-up computational approach to investigate microbial diversity

Page 22: committee_meeting_1031

2011-2012: diginorm

Digital normalization

• the median k-mer frequency of a read represents the sequencing coverage of that read (useful for diversity analysis)
• removing redundant reads (useful for assembly)
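The idea on this slide can be sketched in a few lines: estimate each read's coverage as the median count of its k-mers seen so far, and drop the read as redundant once that estimate reaches a cutoff. This is a minimal illustration, not the khmer implementation. It uses an exact `Counter` in place of khmer's memory-efficient counting structure, and the function names, the values of `k` and `cutoff`, and the toy reads are all assumptions chosen for the example.

```python
from collections import Counter
from statistics import median

def kmers(seq, k):
    """Return all overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def digital_normalization(reads, k=4, cutoff=2):
    """Keep a read only if its estimated coverage is below the cutoff.

    Coverage is estimated as the median count of the read's k-mers
    among the reads kept so far (exact counts; an assumption made
    here for simplicity).
    """
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: no coverage estimate
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads contribute counts
    return kept

# Toy input: one read sequenced five times, another read seen once.
reads = ["ACGTTGCAAG"] * 5 + ["TTGACCATGA"]
kept = digital_normalization(reads, k=4, cutoff=2)
print(kept)
```

With a cutoff of 2, only the first two copies of the abundant read are kept, while the read seen once is always retained, so redundancy drops without losing low-coverage sequence.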

Page 23: committee_meeting_1031

2011-2012: diginorm


Page 24: committee_meeting_1031

Timeline, 2008-2014: developing a bottom-up computational approach to investigate microbial diversity

Page 25: committee_meeting_1031

2012-2013: symbionts

My contributions:
• diginorm/assembly/binning/annotation
• genome completeness estimation
  • 94% complete Rs1
  • 66-89% complete Rs2
• some transcriptome analysis
• other bioinformatics support

Page 26: committee_meeting_1031

Timeline, 2008-2014: developing a bottom-up computational approach to investigate microbial diversity

Page 27: committee_meeting_1031

2012-now: diversity analysis on read level

Page 28: committee_meeting_1031

2012-now: diversity analysis on read level

An IGS (informative genomic segment) can represent the novel information of a genome.

We can use all the data, not only the data we understand!

Page 29: committee_meeting_1031

AAABABCDAABC

ABCEFGHIAFGH

AAAB

AABC

ABCD ABCEFGHI AFGH

Page 30: committee_meeting_1031


Page 31: committee_meeting_1031

Improve the pipeline

khmer diginorm error correction

Page 32: committee_meeting_1031

Sorcerer II Global Ocean Sampling Expedition

Page 33: committee_meeting_1031
Page 34: committee_meeting_1031
Page 35: committee_meeting_1031

2010-now: GPGC

Page 36: committee_meeting_1031

Timeline, 2008-2014: developing a bottom-up computational approach to investigate microbial diversity

Page 37: committee_meeting_1031


Future work

• Finish the IGS-based diversity analysis paper
  • Refine pipeline/adjust statistical method to fit IGSs
  • More real data sets
    • MetaHIT (Metagenomics of the Human Intestinal Tract) (working..)
    • HMP (Human Microbiome Project) (working..)
    • GPGC (Soil) (working..)
    • Ballast water virome (working..)
• Finish a review of the methods and applications of k-mer counting in bioinformatics (will also be part of my dissertation)
• Expand the application of IGS
  • sequencing depth/effort estimation, genome size estimation
  • read binning/classification based on coverage profile across samples
  • relate IGS to phylogenetic info and function
  • extract IGS (reads) according to different coverage profile (shared by all

Page 38: committee_meeting_1031

Acknowledgement

● Dr. Titus Brown

● Lab members of GED

● Elijah Lowe

● Jiarong Guo

● Camille Scott

● Michael Crusoe

● Luiz Irber

● Dr. Sherine Awad

● Former members of GED

● Dr. Adina Howe

● Eric McDonald

● Dr. Jason Pell

● Dr. Likit Preeyanon

● RDP

● Dr. Jim Cole

● Jordan Fish