church_ncbivariation2013

Post on 24-Jun-2015


DESCRIPTION

NCBI Variation resources for CSHL Genome Access Course.

TRANSCRIPT

Deanna M. Church, Staff Scientist, NCBI

@deannachurch

Variation Resources at NCBI 

Variation Resources Team at NCBI

Ming Ward, Lon Phan, Brad Holmes, Anna Glodek, Michael Kholodov, Rama Maiti, Juliana Sampson, David Shao, Eugene Shekhtman, Qiang Wang, Hua Zhang

Donna Maglott, Melissa Landrum, Jennifer Lee, George Riley, Ray Tully, Craig Wallin, Shanmuga Chitipiralla, Douglas Hoffman, Wonhee Jang, Ken Katz, Michael Ovetsky, Ricardo Villamarin

Tim Hefferon, John Lopez, John Garner, Chao Chen

Key Collaborators

Heidi Rehm, Harvard Partners
Christa Lese Martin, Geisinger
Sherri Bale, GeneDx
Lisa Kalman, CDC
Birgit Funke, Harvard Partners
Madhuri Hegde, Emory

Figure credit: http://itknowledgeexchange.techtarget.com/

dbSNP, dbVar

ClinVar, GTR

Quality Control, Ref variants, References

Annotations

Visualization, Tools

Data from external sources

Variant Definitions: Location, Evidence, Methodology (dbSNP, dbVar)

Variant Annotations: Phenotypes, Consequences, Tests, Other Biology (ClinVar, GTR, dbSNP)

GenBank vs RefSeq

GenBank: Submitter owned; redundancy; updated rarely; INSDC

RefSeq: RefSeq owned; non-redundant; curated; not INSDC

BRCA1 in GenBank: 83 genomic records, 31 mRNA records, 27 protein records

BRCA1 in RefSeq: 3 genomic records, 5 mRNA records, 1 RNA record, 5 protein records

Genome Res. 1999. 9: 677-679

http://www.ncbi.nlm.nih.gov/snp

>gnl|dbSNP|ss76078129|allelePos=17|len=33|alleles='A/G'
GTGGCAGAGA CTGAAT R AAGGGTTGAC CCAGGG

SNPs defined by flanking sequence
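To make the header fields concrete, here is a minimal sketch that splits the ss76078129 header shown above and checks that the base at allelePos is the IUPAC code for the listed alleles. The function name `parse_ss_record` and the returned dictionary layout are illustrative assumptions, not a dbSNP API; the field names (allelePos, len, alleles) come from the header itself.

```python
# IUPAC ambiguity codes for two-allele substitutions (standard nomenclature).
IUPAC_BIALLELIC = {
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"},
    "W": {"A", "T"}, "K": {"G", "T"}, "M": {"A", "C"},
}


def parse_ss_record(header, sequence):
    """Hypothetical helper: split a dbSNP-style FASTA header and locate the variant base."""
    fields = header.lstrip(">").split("|")
    ss_id = fields[2]
    # The remaining fields are key=value pairs: allelePos, len, alleles.
    attrs = dict(field.split("=", 1) for field in fields[3:])
    # Strip straight or curly quotes around the allele list before splitting.
    alleles = set(attrs["alleles"].strip("'\u2019").split("/"))
    seq = "".join(sequence.split()).upper()  # remove display spacing
    pos = int(attrs["allelePos"])            # 1-based position of the variant
    return {
        "id": ss_id,
        "alleles": alleles,
        "length_ok": len(seq) == int(attrs["len"]),
        "code": seq[pos - 1],
        "left_flank": seq[:pos - 1],
        "right_flank": seq[pos:],
    }


record = parse_ss_record(
    ">gnl|dbSNP|ss76078129|allelePos=17|len=33|alleles='A/G'",
    "GTGGCAGAGA CTGAAT R AAGGGTTGAC CCAGGG",
)
print(record["id"], record["code"], sorted(record["alleles"]))
```

The point of the sketch is that the record carries no genomic coordinate: the variant is identified only by its position inside the submitted flanking sequence.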

>gnl|dbSNP|ss3354770|allelePos=499|len=661|alleles='T/C'
actattcaca atagcaaaga cttggaacca acccaaatgt ccaacaatga tagactggat taagaaaatg tggcacatat acaccatgga atactaggca TTCCATTCTA CTGTGCACGA GTCACTGCAA ACTCAAGCAT TTCCAGAGTT CTGAAAGCTC AACTAAGAAC CAAGCCTACT CATTCAACAT CAACACACAC AGCACCCTGA GCGTCCAAAA CCACGGGGGT TATGTTCTAG ACCACAGGAC TGGCTACCTG GCCCTGCTCA AGGCGGCAGG ATCAATGGGC AAGAATGTGC AAGAATTTAC CACAACTCAG CCTTGCTGTG TCAACCACAG AGGCCAAGTA CCCCTAACAC CCAGATAGAG TAATTGTGCC TTACTTCTTT GTTCATTCCC ACCATTACAT TTTGTAAATT GGAACTTCTA GGAGGTTAGA AGGATATGCT GATCAAAAAA AGGGGACATA TTCAAGGAGT GTCCCTGGGT CAACCCTT Y ATTCAGTCTC TGCCACATGT CTAGTAACTG TGAGTGATGG GTGCATCAGT ATAATCCTGA GCCTCCCAAG GTACAGCCTT TCACTACTAT TCATCATATT GGCTAAGGTA TTCATCATAT TGGCTAAGGT ATTCACCAAC AGGGCTCATT TTCTATCAGA CC

ss76078129 (aligns to plus strand)

ss76078129 'A/G'

ss3354770 'T/C'

ss3354770 (aligns to minus strand)
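The two submissions describe the same site from opposite strands, which can be checked directly from the sequences on the slides. The sketch below reverse-complements the 33 bp ss76078129 record (R becomes Y, so A/G becomes T/C) and looks for it inside an excerpt of the ss3354770 flank. The excerpt is copied from the region around the Y in the 661 bp record with spacing removed; the variable and function names are illustrative.

```python
# Complement table covering the bases and the two IUPAC codes used on the slides.
COMPLEMENT = str.maketrans("ACGTRYacgtry", "TGCAYRtgcayr")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


ss76078129 = "GTGGCAGAGACTGAATRAAGGGTTGACCCAGGG"
# Excerpt of the ss3354770 flank surrounding its Y (T/C) position.
ss3354770_excerpt = "GTCCCTGGGTCAACCCTTYATTCAGTCTCTGCCACATGT"

rc = reverse_complement(ss76078129)
print(rc)
print(rc in ss3354770_excerpt)
```

Because the match only appears after reverse-complementing, comparing flanking sequences requires testing both orientations before two submissions can be recognized as the same variant.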

ss76078129 (33bp)

ss3354770 (661bp)

rs397515413

rs397515413

NC_000016.9 (chr16)

NW_003871055.3 (chr1 fix patch)

Hydin

Hydin2

Defines variant by location rather than flanking sequence

VCF (Variant Call Format)
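For contrast with the flank-based records, a location-based record can be sketched as a single VCF data line. The eight fixed, tab-separated columns below follow the VCF specification; the example line itself is a placeholder (the chromosome name and position are hypothetical, not real dbSNP data), and `parse_vcf_line` is an illustrative helper rather than part of any NCBI tool.

```python
# The eight fixed columns of a VCF data line, in order.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_line(line):
    """Hypothetical helper: turn one VCF data line into a dictionary."""
    values = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_COLUMNS, values[:8]))
    record["POS"] = int(record["POS"])        # 1-based reference position
    record["ALT"] = record["ALT"].split(",")  # one or more alternate alleles
    return record


# Placeholder record: "chrN" and 12345 are made up for illustration only.
example = "chrN\t12345\t.\tA\tG\t.\t.\t."
rec = parse_vcf_line(example)
print(rec["CHROM"], rec["POS"], rec["REF"], rec["ALT"])
```

Here the variant is fully described by sequence name, position, reference allele, and alternate allele, with no flanking sequence involved.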

Clustering microsatellites

rs62645748

To be replaced by a Variation Viewer

To be replaced by a link to ClinVar

rs62645748 (NCBI Homo sapiens annotation run 104)

http://www.ncbi.nlm.nih.gov/dbvar

Submitter Information: Contact and author information

Study Information: Study meta-data (description, PMID, ProjectID, etc.)

Sample/Sampleset data: Sample IDs (if samples are consented); Sampleset ID for pooled samples (case vs. control sets)

Experiment data: Assay method (sequencing, array); Platform and analysis information

Variants: Variant definitions

Variant Call Ambiguity: start / stop; Inner start / Inner stop; Outer start / Outer stop

Variant Call Ambiguity (probe-based call): Probes with decreased signal intensity; Probes with expected signal intensity; breakpoint, breakpoint; Inner start / Inner stop; Outer start / Outer stop
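One way to read the probe diagram is sketched below, under the assumption that the inner coordinates are the outermost probes with decreased signal and the outer coordinates are the nearest flanking probes with expected signal, so each breakpoint lies somewhere between an outer and an inner coordinate. The function name, the input layout, and the probe positions are all hypothetical.

```python
def call_boundaries(probes):
    """Derive inner/outer coordinates for one call from array probes.

    probes: list of (position, decreased) tuples sorted by position, where
    decreased is True for probes with decreased signal intensity. This
    sketch assumes a single contiguous run of decreased probes.
    """
    decreased = [i for i, (_, low) in enumerate(probes) if low]
    if not decreased:
        raise ValueError("no probes with decreased signal")
    first, last = decreased[0], decreased[-1]
    return {
        # Minimal region: bounded by probes that show the signal change.
        "inner_start": probes[first][0],
        "inner_stop": probes[last][0],
        # Maximal region: nearest probes with expected signal, or None
        # when the call runs to the edge of the probed interval.
        "outer_start": probes[first - 1][0] if first > 0 else None,
        "outer_stop": probes[last + 1][0] if last + 1 < len(probes) else None,
    }


# Hypothetical probe positions; True marks decreased signal intensity.
probes = [(1000, False), (2000, False), (3000, True),
          (4000, True), (5000, True), (6000, False)]
bounds = call_boundaries(probes)
print(bounds)
```

With these made-up probes the left breakpoint falls between 2000 and 3000 and the right breakpoint between 5000 and 6000, which is the ambiguity that paired inner and outer coordinates are meant to record.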

Fosmid clone (40 Kb +/- 1 Kb)

20 Kb: Clone has an insertion relative to the genome

60 Kb: Clone has a deletion relative to the genome
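The clone-end logic can be sketched as a comparison between the distance at which the two fosmid ends map on the reference and the expected insert size. The 40 Kb size and the 1 Kb spread come from the slide; using that spread as a hard cutoff, and the function name `classify_clone`, are simplifying assumptions for illustration.

```python
def classify_clone(mapped_span_kb, expected_kb=40.0, tolerance_kb=1.0):
    """Compare the reference span between clone ends with the expected insert size."""
    if mapped_span_kb < expected_kb - tolerance_kb:
        # Ends map closer together than the clone is long:
        # the clone carries sequence the reference lacks.
        return "insertion"
    if mapped_span_kb > expected_kb + tolerance_kb:
        # Ends map farther apart than the clone is long:
        # the clone is missing sequence the reference has.
        return "deletion"
    return "concordant"


for span in (20, 40.5, 60):
    print(span, classify_clone(span))
```

Note that the clone ends only bracket the event, so the exact breakpoints remain uncertain within the mapped interval.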

http://www.ncbi.nlm.nih.gov/clinvar

ClinVar data model and display

SCV

RCV

SCV

RCV

Variant, Phenotype, Submitter

Allele, Variant

Variant Phenotype

SCV SCV SCV SCV
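The relationship in the diagram can be sketched as a grouping step: each SCV pairs a variant and a phenotype with a submitter, and SCVs that share the same variant and phenotype roll up into one RCV. The accessions, variant names, and phenotype names below are placeholders, and `aggregate_scvs` is an illustrative helper, not ClinVar's implementation.

```python
from collections import defaultdict


def aggregate_scvs(scvs):
    """Group submitted records (SCV) by their (variant, phenotype) pair."""
    groups = defaultdict(list)
    for scv in scvs:
        groups[(scv["variant"], scv["phenotype"])].append(scv["accession"])
    return dict(groups)


# Placeholder submissions; none of these identifiers are real ClinVar records.
submissions = [
    {"accession": "SCV_A", "variant": "variant 1", "phenotype": "phenotype X", "submitter": "lab 1"},
    {"accession": "SCV_B", "variant": "variant 1", "phenotype": "phenotype X", "submitter": "lab 2"},
    {"accession": "SCV_C", "variant": "variant 1", "phenotype": "phenotype Y", "submitter": "lab 1"},
    {"accession": "SCV_D", "variant": "variant 2", "phenotype": "phenotype X", "submitter": "lab 3"},
]
rcv_groups = aggregate_scvs(submissions)
for key, accessions in rcv_groups.items():
    print(key, accessions)
```

In this toy input, four SCVs collapse into three variant-phenotype groups, with the two labs that agree on variant 1 and phenotype X sharing one group.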

Allele summary
• Gene
• Variant type
• Genomic location
• HGVS expressions*
• Molecular consequence*
• Links*
• Frequency*

Phenotype summary
• Names
• Links*
• Age of onset*
• Prevalence*

Interpretation
• Significance
• Review status*
• Accession.version*

* May be provided by NCBI

ClinVar RCV report - Overview

ClinVar RCV report – Summary of assertions

• Each submission is accessioned and versioned
• Terms provided by the submitter are mapped to controlled values
• Method of review is clearly reported so primary data can be distinguished from that reported in the literature

ClinVar RCV report - Evidence

Under active review

Allele report – available December

http://www.ncbi.nlm.nih.gov/refseq/rsg

http://www.lrg-sequence.org/

RefSeq Gene


http://www.ncbi.nlm.nih.gov/genome/tools/remap

From Assembly 1 <-> Assembly 2
Assembly <-> RefSeqGene/LRG
Primary Assembly <-> Alternate loci

1:215844373

http://www.ncbi.nlm.nih.gov/variation/tools/reporter

This new look coming next month

http://www.ncbi.nlm.nih.gov/variation/view

http://www.ncbi.nlm.nih.gov/variation/tools/get-rm

Calls

Tests

cSRA

Concordant, Discordant, NA

Target audience: Clinical testing labs
Submissions from: Clinical and Research labs

Twelve submitting labs to date

Twelve custom scripts to regularize data

Defined formats here: http://www.ncbi.nlm.nih.gov/projects/variation/get-rm

Platforms

NA12878 Tests by Platform (bar chart): HiSeq 2000, HiSeq 2500, MiSeq, Ion Torrent, Sanger

Lab Provided Validation

Variants validated in this sample using another platform
Variants validated in another sample using another platform
Variants seen in other samples from submitting lab using this platform
Variants seen in public data set
Variants that are novel
Variants that were not assessed

Based on May 2013 Data release


http://www.ncbi.nlm.nih.gov/variation/tools/get-rm

Gene level concordance

$\text{concordance} = \dfrac{\sum_{v} \max_i x_{v,i}}{\sum_{v} T_v}$

i = genotype call
x = count per call for each variant
T = total genotype calls per variant

Sums are taken over all variants (v) in a gene.
Tested regions taken into account
Phasing ignored
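A small worked example may help with the concordance definition. The sketch below assumes each variant is represented as a mapping from genotype call to the number of tests reporting that call; it sums the largest count per variant and divides by the total number of calls across the gene. The input layout, the function name, and the counts are hypothetical, and the handling of tested regions mentioned on the slide is not modeled.

```python
def gene_concordance(variant_calls):
    """Fraction of genotype calls that agree with the most common call per variant.

    variant_calls: one dict per variant in the gene, mapping a genotype
    call (for example "0/1") to the number of tests reporting it.
    """
    agreeing = sum(max(counts.values()) for counts in variant_calls)
    total = sum(sum(counts.values()) for counts in variant_calls)
    return agreeing / total if total else float("nan")


# Hypothetical gene with two variants, each assayed by twelve tests.
calls = [
    {"0/1": 10, "0/0": 2},  # ten tests agree, two disagree
    {"1/1": 12},            # all twelve tests agree
]
score = gene_concordance(calls)
print(round(score, 4))
```

Phasing is ignored here simply because genotype calls are compared as unphased strings, which matches the note on the slide.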
