church gmod2012 pt2

Post on 11-May-2015

Category:

Technology

DESCRIPTION

Part 2 of my talk at GMOD 2012

TRANSCRIPT

@deannachurch

Navigating Genome Resources at NCBI

Deanna M. Church, NCBI

The Evolution of the Reference Human Genome

Part 2

GenBank

Data Archives

• Data in a common format
• Data in a single location (and mirrored)
• Most quality checked prior to deposition
• Robust data tracking mechanism (accession.version)
• Data owned by submitter

Data tracking

ABC14-1065514J1

Gaps  Phase  Length  Date

FP565796.1  1  1  21-Oct-2009
FP565796.2  1  0  14-Oct-2010
FP565796.3  3  0  07-Nov-2010
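The accession.version scheme on this slide can be illustrated with a small parser. This is a minimal sketch, not NCBI code: the class `VersionedAccession` and the helper `latest` are hypothetical names chosen for illustration, and the only assumption is that the version is the integer after the final dot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionedAccession:
    """An identifier of the form ACCESSION.VERSION, e.g. FP565796.3."""
    accession: str
    version: int

    @classmethod
    def parse(cls, text):
        # Split on the last dot: the left side is the stable accession,
        # the right side is the version that increments on each update.
        accession, sep, version = text.strip().rpartition(".")
        if not sep or not accession or not version.isdigit():
            raise ValueError(f"not an accession.version string: {text!r}")
        return cls(accession, int(version))

    def __str__(self):
        return f"{self.accession}.{self.version}"


def latest(records):
    """Return the highest version among records that share one accession."""
    parsed = [VersionedAccession.parse(r) for r in records]
    if len({p.accession for p in parsed}) != 1:
        raise ValueError("expected records for exactly one accession")
    return max(parsed, key=lambda p: p.version)


if __name__ == "__main__":
    print(latest(["FP565796.1", "FP565796.3", "FP565796.2"]))
```

Keeping the accession and version as separate fields makes it easy to ask either "is this the same record?" or "is this the same sequence?", which is the distinction the tracking scheme is meant to preserve.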

Mouse chrX: 34,800,000-34,890,000

NC_000086.123456 CM001013.17 2

Mouse chrX: 35,000,000-36,000,000

X

MGSCv3 MGSCv36

hg19 GRCh37

mm8 MGSCv37

NCBIM37

danRer5 Zv7

What’s in a name?

By any other name…

chr21:8,913,216-9,246,964

Zv7 chr21:8,913,216-9,246,964 X Mouse Build 36 chrX
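One way to read this slide is that a coordinate string is only meaningful together with an assembly name. The sketch below makes that explicit. The `Location` type, the `ALIASES` table, and `parse_location` are hypothetical, and the alias table holds only the two name pairs that appear in this talk (hg19/GRCh37 and danRer5/Zv7).

```python
from dataclasses import dataclass

# Browser-style name -> assembly name. Illustrative only, limited to
# the pairs shown on the slides.
ALIASES = {"hg19": "GRCh37", "danRer5": "Zv7"}


@dataclass(frozen=True)
class Location:
    assembly: str
    chrom: str
    start: int
    end: int


def parse_location(text):
    """Parse 'ASSEMBLY chrN:start-end'; reject strings with no assembly."""
    parts = text.split()
    if len(parts) != 2:
        raise ValueError("a location needs an assembly name and a region")
    name, region = parts
    chrom, colon, span = region.partition(":")
    start, dash, end = span.partition("-")
    if not colon or not dash:
        raise ValueError(f"malformed region: {region!r}")
    # Thousands separators are for human readers; strip them before parsing.
    return Location(
        assembly=ALIASES.get(name, name),
        chrom=chrom,
        start=int(start.replace(",", "")),
        end=int(end.replace(",", "")),
    )


if __name__ == "__main__":
    print(parse_location("Zv7 chr21:8,913,216-9,246,964"))
```

Rejecting a bare `chr21:8,913,216-9,246,964` is a deliberate choice here: the same string points at different sequence in different assemblies.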

By any other name…

Genome Browser Agreement

Submitter deposits assembly to GenBank/EMBL/DDBJ

Assembly QA

Submitter updates assembly based on QA results

Browsers pick up assembly from GenBank/EMBL/DDBJ

Assemblies must be in GenBank/EMBL/DDBJ

http://www.ncbi.nlm.nih.gov/genome/assembly

GRCh37 hg19

Assembly (e.g. GRCh37.p5)

GCA_000001405.6/GCF_000001405.17

Primary Assembly

GCA_000001305.1/GCF_000001305.13

ALT 1

GCA_000001315.1/GCF_000001315.1

ALT 2

GCA_000001325.1/GCF_000001325.2

ALT 3

GCA_000001335.1/GCF_000001335.1

ALT 4

GCA_000001345.1/GCF_000001345.1

ALT 5

GCA_000001355.1/GCF_000001355.1

ALT 6

GCA_000001365.1/GCF_000001365.2

ALT 7

GCA_000001375.1/GCF_000001375.1

ALT 8

GCA_000001385.1/GCF_000001385.1

ALT 9

GCA_000001395.1/GCF_000001395.1

Patches

GCA_000005045.5/GCF_000005045.4

Non-nuclear assembly unit

(e.g. MT)

GCA_000006015.1/GCF_000006015.1
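The paired accessions above follow a regular pattern: a GCA_ (GenBank) or GCF_ (RefSeq) prefix, a nine-digit number, and a version. A minimal sketch of reading such a pair follows. The function names are hypothetical and the regular expression is an assumption based on the accessions shown on this slide.

```python
import re
from typing import NamedTuple

# Prefix, nine-digit identifier, then a version after the dot.
_PATTERN = re.compile(r"^(GCA|GCF)_(\d{9})\.(\d+)$")


class AssemblyAccession(NamedTuple):
    source: str   # "GenBank" for GCA_, "RefSeq" for GCF_
    number: str   # the nine-digit part, kept as text to preserve zeros
    version: int


def parse_assembly_accession(text):
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not an assembly accession: {text!r}")
    prefix, number, version = match.groups()
    source = "GenBank" if prefix == "GCA" else "RefSeq"
    return AssemblyAccession(source, number, int(version))


def split_pair(text):
    """Split 'GCA_.../GCF_...' into its GenBank and RefSeq halves."""
    left, right = (parse_assembly_accession(p) for p in text.split("/"))
    return left, right


if __name__ == "__main__":
    genbank, refseq = split_pair("GCA_000001405.6/GCF_000001405.17")
    print(genbank, refseq)
```

In the pairs listed here the numeric part matches between the two halves while the versions differ, so comparing `number` is a quick check that two accessions describe the same assembly unit.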

GenBank vs RefSeq

GenBank             RefSeq
Submitter owned     RefSeq owned
Redundancy          Non-redundant
Updated rarely      Curated
INSDC               Not INSDC

BRCA1
GenBank: 83 genomic records, 31 mRNA records, 27 protein records
RefSeq: 3 genomic records, 5 mRNA records, 1 RNA record, 5 protein records

RefSeq for Assemblies

Typical assembly edits

Addition of non-nuclear (e.g. MT) assembly units

Removal of contamination

Drop unlocalized/unplaced scaffolds

Mask contamination that is placed on chromosome

http://www.ncbi.nlm.nih.gov/genome

Understanding relationships between assemblies using alignments

First Pass

Second Pass

Reciprocal best hit

Non-reciprocal, duplicative hits

No second pass alignments in GRCh37.p5
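The split between reciprocal best hits and non-reciprocal, duplicative hits can be illustrated generically. The sketch below is not the NCBI alignment pipeline: it assumes a hypothetical input of pairwise alignment scores between regions of two assemblies and simply separates pairs that are each other's best hit from everything else.

```python
def best_hits(scores):
    """Map each query to its highest-scoring target.

    scores: dict of (query, target) -> numeric score.
    On a tie, the pair seen first is kept.
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(scores):
    """Split scored pairs into reciprocal best hits and the remainder."""
    forward = best_hits(scores)
    # Swap the roles of query and target to get the reverse direction.
    reverse = best_hits({(t, q): s for (q, t), s in scores.items()})
    reciprocal = {(q, t) for q, t in forward.items() if reverse.get(t) == q}
    non_reciprocal = set(scores) - reciprocal
    return reciprocal, non_reciprocal


if __name__ == "__main__":
    # Region names and scores are invented for illustration.
    toy = {
        ("old_a", "new_a"): 100,
        ("old_b", "new_b"): 90,
        ("old_c", "new_a"): 60,  # a second, weaker hit to new_a
    }
    print(reciprocal_best_hits(toy))
```

In the toy data, `old_c` hits `new_a` but `new_a` prefers `old_a`, so that pair lands in the non-reciprocal set, the kind of duplicative hit the second pass is described as covering.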

NCBI36

GRCh37.p5

http://www.ncbi.nlm.nih.gov/tools/gbench/

Assemblies Transcripts Proteins

Set of genes

Other decoration

Annotation pipeline

Francoise Thibaud-Nissen

Content of the final annotation product

Description | In sequence database | In a BLAST database | On FTP site

• Chromosomes (NC_ or AC_)
• Scaffolds (NW_ or NT_)
• Curated transcripts/proteins (NM_, NR_/NP_)
• Predicted transcripts/proteins (fully or partially supported) (XM_, XR_/XP_)
• Non-transcribed pseudogenes
• tRNA (annotated with tRNAscan-SE)
• Ab initio Gnomon models
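The accession prefixes listed above can be turned into a simple lookup. This is a minimal sketch covering only the prefixes named on the slide; the `classify` helper and the category labels are hypothetical.

```python
# Prefix -> product category, limited to the prefixes on the slide.
REFSEQ_PREFIXES = {
    "NC_": "chromosome",
    "AC_": "chromosome",
    "NW_": "scaffold",
    "NT_": "scaffold",
    "NM_": "curated transcript",
    "NR_": "curated transcript",
    "NP_": "curated protein",
    "XM_": "predicted transcript",
    "XR_": "predicted transcript",
    "XP_": "predicted protein",
}


def classify(accession):
    """Return the product category for a RefSeq accession, or 'unknown'."""
    # Every prefix in the table is two letters plus an underscore.
    return REFSEQ_PREFIXES.get(accession.strip()[:3].upper(), "unknown")


if __name__ == "__main__":
    print(classify("NC_000086.6"))
```

Anything outside the table, such as an INSDC accession like FP565796.3, falls through to `unknown` rather than being guessed at.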

Annotation Pipeline RefSeq

Where to find the annotation products?

• Nucleotide/Protein databases
• Gene
• Map Viewer
• BLAST databases
• FTP site

http://www.ncbi.nlm.nih.gov/gene

http://www.ncbi.nlm.nih.gov/mapview

Annotating multiple assemblies

Group 1

Transcript

• Consistent placement of transcripts
• Consistent labelling of the genes
• Consistent annotation on all assemblies

Assembly 1

Assembly 2

• Assembly-assembly alignments

Available at http://www.ncbi.nlm.nih.gov/genome/tools/remap
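To show how assembly-assembly alignments allow a position on one assembly to be projected onto another, here is a toy sketch. It is not the NCBI Remap service or its API: the `Block` type and `remap_position` are hypothetical, and the model is restricted to ungapped, same-strand alignment blocks.

```python
from typing import NamedTuple, Optional, Sequence


class Block(NamedTuple):
    """One ungapped, same-strand alignment block between two assemblies."""
    src_start: int  # first aligned position on assembly 1
    dst_start: int  # matching position on assembly 2
    length: int     # number of aligned bases


def remap_position(pos: int, blocks: Sequence[Block]) -> Optional[int]:
    """Project pos from assembly 1 to assembly 2, or None if unaligned."""
    for block in blocks:
        offset = pos - block.src_start
        if 0 <= offset < block.length:
            # Inside an ungapped block the offset from the block start
            # is the same on both assemblies.
            return block.dst_start + offset
    return None


if __name__ == "__main__":
    # Coordinates are invented for illustration.
    blocks = [Block(100, 250, 50), Block(200, 400, 30)]
    print(remap_position(120, blocks))
    print(remap_position(199, blocks))
```

Positions that fall between blocks return `None`, which mirrors the practical point that regions with no alignment between two assemblies cannot be carried over.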

Group 2

Annotating multiple assemblies(2)

Btau_4.6.1

UMD_3.1

Same Gene symbol

Interacting with the community

FlyBase GenBank RefSeq

Thanks!

For slides: Francoise Thibaud-Nissen, Evan Eichler, Steve Sherry

The Genome Reference Consortium
• The Genome Center at Washington University
• The Wellcome Trust Sanger Institute
• The European Bioinformatics Institute
• The National Center for Biotechnology Information

Church group at NCBI
• Valerie Schneider
• Nathan Bouk
• Hsiu-Chuan Chen
• Peter Meric
• Victor Ananiev
• Chao Chen
• John Lopez
• John Garner
• Tim Hefferon
• Cliff Clausen

NCBI
