committee_meeting_1031

Posted on 14-Jun-2015

Category:

Education

DESCRIPTION

My committee meeting slides on Oct 31st, 2014.

TRANSCRIPT

The Story of My Research

developing a bottom-up computational approach to investigate microbial diversity

Qingpeng Zhang Department of Computer Science and Engineering

Michigan State University Supervisor: Dr. Titus Brown

odyssey?

[Timeline figure, 2008-2014: start study/research of metagenomics; khmer development; diversity analysis on k-mer level; digital normalization; Osedax symbionts; diversity analysis on read level (IGS); GPGC soil sample]

2008: metagenomics

“Big Data!”

Microbial diversity

similarity-based, composition-based

binning/annotation

assembly, reference

2009: microbial diversity

How many things are there in the sample? (alpha diversity)

How different are the samples? (beta diversity)
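
To make the two questions concrete, here is a minimal sketch of one way each could be computed from per-taxon counts. The slides do not name specific metrics; observed richness and the Shannon index (alpha) and Bray-Curtis dissimilarity (beta) are common choices assumed here for illustration, and the taxon names and counts are made up.

```python
import math
from collections import Counter


def richness(counts):
    """Alpha diversity as observed richness: number of taxa with a nonzero count."""
    return sum(1 for c in counts.values() if c > 0)


def shannon(counts):
    """Alpha diversity as the Shannon index H = -sum(p * ln p)."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total)
                for c in counts.values() if c > 0)


def bray_curtis(a, b):
    """Beta diversity as Bray-Curtis dissimilarity (0 = identical, 1 = nothing shared)."""
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    return 1 - 2 * shared / (sum(a.values()) + sum(b.values()))


# Hypothetical abundance tables for two samples.
sample_a = Counter({"taxon1": 50, "taxon2": 30, "taxon3": 20})
sample_b = Counter({"taxon1": 10, "taxon4": 90})

print(richness(sample_a), shannon(sample_a))
print(bray_curtis(sample_a, sample_b))
```

Both measures presuppose that sequences have already been assigned to taxa, which is the step the later slides try to avoid by working on k-mers and reads directly.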

"Nothing works, everything sucks."

NO!

2009: k-mer counting

2010-now: GPGC

How many things are there in the sample? (alpha diversity)

How does agricultural soil differ from native soil? (beta diversity)

2010-now: khmer

• My contributions:
  • algorithm design/analysis, exploring the underlying mathematics and the choice of optimal parameters
  • contributing code, including unique k-mer counting, overlap k-mer counting, optimal parameter choice, and other code related to my specific research project
  • benchmarking, testing, and actually using it
  • exploring applications such as error trimming, filtering low-abundance reads, digital normalization, etc., and suggesting features
  • work on the khmer manuscript
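
As a rough illustration of the counting tasks named in the list above, the sketch below counts k-mers exactly with an in-memory `Counter`. It is not khmer's implementation or API: the function names are hypothetical, "unique k-mer counting" is read here as the number of distinct k-mers, "overlap k-mer counting" as the number of distinct k-mers shared by two data sets, and reverse complements are ignored for brevity.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer of a sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads, k):
    """Exact k-mer abundance table for a collection of reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def unique_kmers(reads, k):
    """Number of distinct k-mers observed in the reads."""
    return len(count_kmers(reads, k))


def overlap_kmers(reads_a, reads_b, k):
    """Number of distinct k-mers present in both read sets."""
    return len(set(count_kmers(reads_a, k)) & set(count_kmers(reads_b, k)))


# Toy reads, far shorter than real sequencing reads.
print(unique_kmers(["ACGTAC"], 3))
print(overlap_kmers(["ACGTAC"], ["GTACCA"], 3))
```

An exact table like this grows with the number of distinct k-mers, so it only suits small inputs; for metagenome-scale data the choice of data structure and its parameters matters, which is where the parameter work in the list comes in.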

2010-2012: diversity analysis on k-mer level

2011-2012: diginorm

Digital normalization

• median k-mer frequency represents the sequencing coverage of a read (useful for diversity analysis)
• removing redundant reads (useful for assembly)
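
The idea can be sketched in a few lines: estimate each read's coverage as the median abundance of its k-mers among the reads kept so far, and drop the read if that estimate has already reached a cutoff. This is a minimal sketch, not the khmer code. It uses an exact `Counter`, the function names are hypothetical, and the default values for `k` and `cutoff` are illustrative assumptions.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return the list of overlapping k-mers in a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalization(reads, k=20, cutoff=20):
    """Keep a read only while its estimated coverage is below the cutoff.

    Coverage is estimated as the median count of the read's k-mers
    in the table built from the reads kept so far.
    """
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read is shorter than k, so no estimate is possible
        coverage = median(counts[km] for km in read_kmers)
        if coverage < cutoff:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads contribute counts
    return kept


# Toy input: one read sequenced ten times plus one read seen once.
reads = ["ACGTTGCA"] * 10 + ["GGGGCCCC"]
kept = digital_normalization(reads, k=4, cutoff=3)
print(len(kept), "of", len(reads), "reads kept")
```

Using the median rather than the mean keeps a few erroneous, low-abundance k-mers in a read from dragging the coverage estimate down.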

2012-2013: symbionts

My contributions:
• diginorm/assembly/binning/annotation
• genome completeness estimation
  • 94% complete Rs1
  • 66-89% complete Rs2
• some transcriptome analysis
• other bioinformatics support

2012-now: diversity analysis on read level

An IGS (informative genomic segment) can represent the novel information of a genome.

We can use all the data, not only the data we understand!

AAABABCDAABC

ABCEFGHIAFGH

AAAB

AABC

ABCD ABCEFGHI AFGH

Improve the pipeline

khmer, diginorm, error correction

Sorcerer II Global Ocean Sampling Expedition

2010-now: GPGC

Future work

• Finish the IGS-based diversity analysis paper
  • Refine pipeline/adjust statistical method to fit IGSs
  • More real data sets
    • MetaHIT (Metagenomics of the Human Intestinal Tract) (working..)
    • HMP (Human Microbiome Project) (working..)
    • GPGC (soil) (working..)
    • Ballast water virome (working..)
• Finish a review of the methods and applications of k-mer counting in bioinformatics (will also be part of my dissertation)
• Expand the application of IGS
  • sequencing depth/effort estimation, genome size estimation
  • reads binning/classification based on coverage profile across samples
  • relate IGS to phylogenetic info and function
  • extract IGSs (reads) according to different coverage profiles (shared by all

Acknowledgement

● Dr. Titus Brown

● Lab members of GED

● Elijah Lowe

● Jiarong Guo

● Camille Scott

● Michael Crusoe

● Luiz Irber

● Dr. Sherine Awad

● Former members of GED

● Dr. Adina Howe

● Eric McDonald

● Dr. Jason Pell

● Dr. Likit Preeyanon

● RDP

● Dr. Jim Cole

● Jordan Fish
