Global Catalogue of Microorganisms(GCM) 2.0: Sequencing for Type Strains Juncai MA, Linhuan WU World Data Center for Microorganisms(WDCM) The Microbial Resource and Big Data Center, IMCAS


Post on 27-Sep-2020


Page 1: Global Catalogue of Microorganisms(GCM) 2.0: Sequencing ... GCM2.0 11.22.pdf · WDCM cover the costs for sequencing services, database system and data analysis Raw data and analysis

Global Catalogue of Microorganisms (GCM) 2.0:

Sequencing for Type Strains

Juncai MA, Linhuan WU

World Data Center for Microorganisms (WDCM)

The Microbial Resource and Big Data Center, IMCAS

Page 2

CONTENTS

Background

Scientific Targets

Roadmap

Progress

Cooperation

Page 3

Sequencing type strains

Identification of strains and quality control;

Improving identification of protein families and ortholog groups across species, and hence annotation of other microbial genomes;

Providing phylogenetic anchoring of metagenomic data;

Improving gene discovery by selecting phylogenetically novel organisms;

Understanding of the processes underlying the evolution of microbes and correlations of phenotype and genotype in microbes.

Page 4

1,003 reference genomes of bacterial and archaeal isolates expand coverage of the tree of life.

974 bacterial and 29 archaeal genomes (from 579 genera in 21 phyla and 43 classes) were sequenced as part of the GEBA Initiative (GEBA-I), using a phylogeny-based scoring system for strain selection.

Provide a phylogenetically balanced genomic representation.

[Figure legend] Blue denotes the genetic diversity covered by 828 genomes of type strains before GEBA-I, red denotes the diversity covered by the GEBA-I genomes and gray denotes the remaining type strains lacking a genome sequence.

BACTERIA & ARCHAEA

4,000 / 12,239 type strains sequenced (Whitman et al., 2015)

7,048 / 14,895 type strains sequenced (WDCM statistics in 2018)
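
The two "type strains sequenced" counts on this slide can be turned into coverage percentages. The short Python sketch below only restates the slide's own figures; the function and variable names are illustrative and not part of the presentation.

```python
# Type-strain genome coverage, using only the two counts quoted on the slide.
counts = {
    "Whitman et al., 2015": (4_000, 12_239),
    "WDCM statistics in 2018": (7_048, 14_895),
}


def coverage_percent(sequenced: int, total: int) -> float:
    """Share of type strains that have a genome sequence, in percent."""
    return 100.0 * sequenced / total


coverage = {source: coverage_percent(s, t) for source, (s, t) in counts.items()}

for source, pct in coverage.items():
    print(f"{source}: {pct:.1f}% of type strains sequenced")
```

On these figures, roughly a third of bacterial and archaeal type strains had a genome in the 2015 count and a little under half in the 2018 count.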

Page 5

DISTRIBUTION OF GLOBAL TYPE STRAINS

52,151 type strains distributed in 134 culture collections

93.8% from the top 30

Page 6

Validated species and type strains

A total of 1678 type strains of 819 novel (sub)species validated in 2014

Culture collection   No. of type strains deposited
DSMZ                 310
JCM                  239
KCTC                 214
LMG                  131
NBRC                 112
CGMCC                109
CECT                 102
KACC                 92
CCTCC                70
CCUG                 56
…                    …

Asian culture collections received deposits of a total of 940 type strains (56.2%).

A total of 1874 type strains of 866 novel (sub)species validated in 2017

Culture collection   No. of type strains deposited
KCTC                 328
DSMZ                 255
JCM                  222
LMG                  165
NBRC                 112
KACC                 104
CGMCC                96
MCCC                 90
CCTCC                61
CECT                 49
BCRC                 33
…                    …

Asian culture collections received deposits of a total of 940 type strains (61.7%).

Page 7

A great activity of describing novel microbial species in Asia

[Pie chart: Authors' country in 2014. Categories: Japan, Korea, China, Other Asia, Others. Labelled shares: 29.7%, 21.0%, 11.7%.]

75% of IJSEM papers are published by Asian researchers.

[Pie chart: Country of origin in 2017. Categories: China, Korea, Japan, Spain, Germany, Antarctica, Others.]

Country of origin of the type strains in the IJSEM papers in 2017: 60% were isolated from China, Korea and Japan.

Page 8

Global Catalogue of Microorganisms

GCM IS COMING

Page 9

GCM 2.0: Sequencing for Type Strains

Theme: Sequencing for existing type strains

Outputs in 5 years:

• 10,000 bacteria, archaea and fungi type strains

WDCM covers the costs of sequencing services, the database system and data analysis.

Raw data and analysis results are published online for free access.

Call for strains and samples from culture collections and scientists, with targets for research on specific subjects.

Page 10

Time table

               Pilot     Y2         Y3         Y4         Y5
Organization
SOP
Database       Functional database
Sequencing     300/100   2000/300   2000/500   2000/600   2000/500
Subproject               3-5        3-5        3-5        3-5
Training       20 participants each year
Meta data
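
As a quick consistency check, the paired figures in the time table (300/100 for the pilot, then 2000/300, 2000/500, 2000/600 and 2000/500 for Y2 to Y5) can be summed and compared with the stated output of 10,000 type strains in 5 years. The sketch below assumes these pairs are per-stage sequencing targets; the slide does not say what the first and second figure of each pair count, so they are kept as unlabeled "first" and "second" values.

```python
# (first figure, second figure) per stage, copied from the time table.
# Assumption: these are sequencing targets; the slide does not label the two figures.
plan = {
    "Pilot": (300, 100),
    "Y2": (2000, 300),
    "Y3": (2000, 500),
    "Y4": (2000, 600),
    "Y5": (2000, 500),
}

first_total = sum(first for first, _ in plan.values())
second_total = sum(second for _, second in plan.values())
grand_total = first_total + second_total

print(f"first figures: {first_total}, second figures: {second_total}, all: {grand_total}")
```

Under that assumption the plan adds up to 10,300, which is in line with the 10,000 type strains named as the 5-year output.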

Page 11

Progress

Organization

Participants

Progress

Page 12

CULTURE COLLECTIONS CONFIRMED

• ATCC, USA

• BCCM/LMG, Belgium

• BCRC, Chinese Taipei

• CAIM, Mexico

• CBS, Netherlands

• CCM, Czech Republic

• CCUG, Sweden

• CECT, Spain

• CICC, China

• CIP, France

• CGMCC, China

• FGSC, USA

• ICMP, New Zealand

• JCM, Japan

• KCTC, Korea

• KMM, Russia

• MUM, Portugal

• NCTC, UK

• NBRC, Japan

• NCAIM, Hungary

• PCU, Thailand

• TBRC, Thailand

• TISTR, Thailand

• UCD-FST, USA

• VKM, Russia

25 collections from 18 countries and regions

Page 13

GCM 2.0 PROJECT PROGRESS

Scientific committee: Open for recommendations

Working Groups:

1. Bacteria Selection

2. Fungi Selection

3. SOPs

4. Database

5. Intellectual Property Rights and Legal Issues

Page 14

STRAIN SELECTION STRATEGIES

Phylogenetic Diversity Priority

Scientific targets relevance

Availability of the resources

Nagoya Protocol Safety

Quality consideration: two type strains in different collections
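
One way to picture how the criteria on this slide could be combined is a filter followed by a ranking. The Python sketch below is purely illustrative: the `Candidate` fields, the numeric scores and the `shortlist` function are hypothetical and are not the working groups' actual procedure. It treats resource availability and Nagoya Protocol clearance as hard requirements, then ranks by phylogenetic novelty first and relevance to the scientific targets second.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    strain_id: str           # hypothetical identifier
    phylo_novelty: float     # hypothetical score; higher means a less-covered lineage
    target_relevance: float  # hypothetical score for fit to the scientific targets
    available: bool          # a collection can supply the culture or DNA
    nagoya_cleared: bool     # no Nagoya Protocol obstacle to sharing


def shortlist(candidates, limit):
    """Drop ineligible strains, then rank by novelty and relevance (descending)."""
    eligible = [c for c in candidates if c.available and c.nagoya_cleared]
    eligible.sort(key=lambda c: (c.phylo_novelty, c.target_relevance), reverse=True)
    return eligible[:limit]


# Made-up demo data, not real strains.
demo = [
    Candidate("A", 0.9, 0.2, True, True),
    Candidate("B", 0.9, 0.7, True, True),
    Candidate("C", 1.0, 0.9, False, True),  # not available
    Candidate("D", 0.4, 0.9, True, True),
    Candidate("E", 0.8, 0.5, True, False),  # not cleared
]

picked = [c.strain_id for c in shortlist(demo, 2)]
print(picked)
```

Ranking on a tuple keeps phylogenetic diversity as the first priority, as on the slide, and uses target relevance only to break ties.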

Page 15

[Workflow diagram]

Scientific Committee & Working group: strain selections, subproject selection

Culture Collections: cultures or DNA

Steps: sample preparation, sequencing, QC & assembly, annotation, database, data analysis

Results: joint publication, open to the public

Page 16

List of type species and type strains

Page 17

List of type species and type strains

Page 18

List of type species and type strains

Page 19

MOU & MTA template

Nagoya Protocol safety

WDCM shall use the cultures, DNA samples and associated data only for the following purposes: sequencing, data exploration, and integration of the data into the data platform for microbial resources.

WDCM shall ensure that the cultures and DNA samples are destroyed after the sequencing project.

Page 20

STRAIN LIST FOR PILOT STAGE

Collection Bacterial Fungi

KCTC 312

ATCC 20

CGMCC 290

NBRC 92

JCM 51

NCTC 25

TBRC 56

TISTR 28

BCCM 10 75

CICC 5

VKM 59

CAIM 28

NCAIM 43

CBS 90

Total 927 224

Page 21

SOP for Sample preparation and submission

Page 22

STRAIN LIST FOR PILOT STAGE

Culture collection   No. of strains   Sample type
CAIM                 37               Inoculum
CGMCC                17               Live culture
ICMP                 39               Freeze-dried
JCM                  15               DNA
KCTC                 54               Cell mass
NBRC                 92               DNA
NCAIM                43               Freeze-dried
CICC                 5                DNA
TBRC                 15               DNA
Total                317
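
The rows of this list can be checked against the stated total and grouped by sample type. The sketch below simply re-enters the slide's rows; the variable names are illustrative.

```python
from collections import Counter

# (collection, number of strains, sample type) as listed on the slide.
received = [
    ("CAIM", 37, "inoculum"),
    ("CGMCC", 17, "live culture"),
    ("ICMP", 39, "freeze-dried"),
    ("JCM", 15, "DNA"),
    ("KCTC", 54, "cell mass"),
    ("NBRC", 92, "DNA"),
    ("NCAIM", 43, "freeze-dried"),
    ("CICC", 5, "DNA"),
    ("TBRC", 15, "DNA"),
]

total = sum(count for _, count, _ in received)

# Strains per sample type, summed across collections.
by_type = Counter()
for _, count, sample_type in received:
    by_type[sample_type] += count

print(f"total strains: {total}")
for sample_type, count in by_type.most_common():
    print(f"{sample_type}: {count}")
```

The rows sum to the 317 given on the slide, and DNA is the most common submission form on these figures.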

Page 23

[Workflow diagram]

Culture Collections: DNA samples, live cultures, freeze-dried cultures

Sequencing (7-10 days)

Data annotation in BGI (3 days)

Data annotation in IMCAS (7 days)

Data results for culture collections

40-50 strains/month; 3 days sample report

30 working days for DNA samples

2-3 months for cultures

Quality Control Steps: cultures, DNA samples, raw data, assembled genomes

317 strains: Contamination 3, Incorrect 15

Further analysis
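
The figures on this slide allow a few back-of-the-envelope calculations. The sketch below assumes that the "Contamination" and "Incorrect" counts do not overlap and that the three timed steps run one after another; neither assumption is stated on the slide, and the variable names are illustrative.

```python
# Quality-control outcome for the 317 pilot strains.
strains = 317
contaminated = 3
incorrect = 15

# Assumption: the two categories are disjoint.
flagged = contaminated + incorrect
flagged_pct = 100.0 * flagged / strains

# Per-step durations in days as (minimum, maximum), taken from the slide.
steps_days = {
    "sequencing": (7, 10),
    "data annotation in BGI": (3, 3),
    "data annotation in IMCAS": (7, 7),
}

# Assumption: the steps run sequentially, so durations add up.
min_days = sum(low for low, _ in steps_days.values())
max_days = sum(high for _, high in steps_days.values())

# Months needed for all pilot strains at the quoted 40-50 strains/month.
months_fast = strains / 50
months_slow = strains / 40

print(f"flagged by QC: {flagged} of {strains} ({flagged_pct:.1f}%)")
print(f"timed steps: {min_days}-{max_days} days")
print(f"pilot at 40-50 strains/month: {months_fast:.1f}-{months_slow:.1f} months")
```

Under these assumptions the timed steps account for 17 to 20 days, which fits within the 30 working days quoted for DNA samples.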

Page 24

Sequencing Capacity

Sequencers (295+): BGISeq-500, Illumina/HiSeq, Illumina/MiSeq, AB/3730xl, Roche/454, PacBio RS II, Sequel, Bionano Irys System, Life Tech/Ion Torrent

Sequencing capacity: >30 Tb/day

BGI has the largest sequencing capacity in the world.

Page 25

IN-HOUSE DATA MANAGEMENT SYSTEM

Page 26

Standard data analysis pipeline

Page 27
Page 28
Page 29

Cooperation

Cooperation mechanism

Coordinate with other projects

Page 30

EXPECTED OUTPUTS

Sampling

Sequencing

Standards & SOP

Data reports

Reference database

Genes, proteins, pathways

Improvement of annotation

Functions and evolution

Recruitment of metagenomic reads

Identification of new species

Complete the prokaryotic tree of life

Page 31

Cooperation Mechanisms

WDCM

Culture collections

Scientists

Sequencing

Data sharing

Network

Page 32

Thanks for your attention!

Let us do our best for cooperation!