SNP Resources: Finding SNPs
Databases and Data Extraction
Mark J. Rieder, PhD
Robert J. Livingston, PhD
NIEHS Variation Workshop
January 30-31, 2005
Genotype - Phenotype Studies
Typical Approach:
“I have candidate gene/region and samples ready to study. Tell me what SNPs to genotype.”
Other questions:
• How do I know I have *all* the SNPs?
• What is the validation/quality of the SNPs that are known?
• Are these SNPs informative in my population/sample?
• What do I need to know for selecting the “best” SNPs?
• How do I pick the “best” SNPs?
What information do I need to characterize a SNP for genotyping?
Minimal SNP information for genotyping/characterization
• What is the SNP? Flanking sequence and alleles, in FASTA format:
>snp_name
ACCGAGTAGCCAG[A/G]ACTGGGATAGAAC
• dbSNP reference SNP # (rs #)
• Where is the SNP mapped? Exon, promoter, UTR, etc. Picture of the gene with the SNP mapped to the gene structure.
• How was it discovered? Method
• What assurances do you have that it is real? Validated how?
• What population – African, European, etc?
• What is the allele frequency of each SNP? Common (>10%), rare
• Are other SNPs associated - redundant? Genotyping data!
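The FASTA-style record in the checklist above can be split into a name, two flanks, and an allele list. The following is a minimal sketch, assuming one bracketed site per record and uppercase A/C/G/T/N bases (with "-" allowed for a deletion allele). The names `SnpRecord` and `parse_snp_fasta` are hypothetical and do not come from any of the tools in this talk.

```python
import re
from typing import NamedTuple


class SnpRecord(NamedTuple):
    name: str
    left_flank: str
    alleles: tuple
    right_flank: str


# Left flank, one bracketed allele list such as [A/G], right flank.
_SNP_PATTERN = re.compile(
    r"^([ACGTN]*)\[([ACGTN-]+(?:/[ACGTN-]+)+)\]([ACGTN]*)$"
)


def parse_snp_fasta(text):
    """Parse FASTA-style records whose sequence marks the SNP as [A/G]."""
    raw = []  # each entry is [record name, list of sequence lines]
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            raw.append([line[1:].strip(), []])
        elif raw:
            raw[-1][1].append(line)

    records = []
    for name, chunks in raw:
        sequence = "".join(chunks).upper()
        match = _SNP_PATTERN.match(sequence)
        if match is None:
            raise ValueError(f"record {name!r} has no single [allele/allele] site")
        left, alleles, right = match.groups()
        records.append(SnpRecord(name, left, tuple(alleles.split("/")), right))
    return records


example = ">snp_name\nACCGAGTAGCCAG[A/G]ACTGGGATAGAAC\n"
for record in parse_snp_fasta(example):
    print(record.name, record.alleles, len(record.left_flank), len(record.right_flank))
```

Keeping the flanks separate from the alleles makes it easy to check that enough flanking sequence is available on each side before designing an assay.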
Finding SNPs: Databases and Extraction
How do I find and download SNP data for analysis/genotyping?
1. Entrez Gene
- dbSNP
- Entrez SNP
2. HapMap Genome Browser
3. NIEHS Environmental Genome Project (EGP) Candidate gene website
4. NIEHS web applications and other tools: GeneSNPs, PolyDoms, TraFac, PolyPhen, ECR Browser, GVS
NCBI - Database Resource
www.ncbi.nlm.nih.gov
NOS2A
Finding SNPs: Where do I start?
http://www.ncbi.nlm.nih.gov/gquery
NCBI - Entrez Gene
Finding SNPs: Entrez Gene
dbSNP Geneview
HapMap Verified
Finding SNPs: dbSNP validation (by 2hit-2allele)
Finding SNPs: dbSNP database
Entrez SNP - dbSNP genotype retrieval
Finding SNPs - Gene Genotype Report
Minimal SNP information for genotyping/characterization
dbSNP - data is there
Entrez Gene Entry - Entrez SNP
Entrez SNP - direct dbSNP querying
Entrez SNP - Parseable Multi-SNP reports
Entrez SNP - Search Limiting Capabilities
NOS2A
Entrez SNP - Search Limits
Entrez SNP - Query Term Capabilities
Entrez SNP - Search Terms Fields
2[CHR] AND "coding nonsynon"[FUNC]
More advanced queries:
2[CHR] AND "coding nonsynon"[FUNC] AND "EGP_SNPS"[HANDLE]
Note: Can also use wildcard (*) characters, AND, OR, and NOT operators
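The field-tagged queries above follow a regular value[TAG] pattern, so they can be assembled programmatically when many genes or chromosomes need the same search. This is a small sketch that only builds the query string. The helper names `field_term` and `and_query` are hypothetical, the tags CHR, FUNC, and HANDLE are the ones shown on the slides, and nothing here checks whether the Entrez service accepts a given tag or value.

```python
def field_term(value, tag):
    """Format one term as value[TAG]; string values are double-quoted."""
    if isinstance(value, int):
        return f"{value}[{tag}]"
    return f'"{value}"[{tag}]'


def and_query(*terms):
    """Join already-formatted terms with the AND operator."""
    return " AND ".join(terms)


# Rebuild the advanced query shown on the slide.
query = and_query(
    field_term(2, "CHR"),
    field_term("coding nonsynon", "FUNC"),
    field_term("EGP_SNPS", "HANDLE"),
)
print(query)
```

The resulting string can be pasted into the Entrez SNP search box; swapping the chromosome number or submitter handle gives the same query for a different subset.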
Entrez SNP - Advanced Queries
Minimal SNP information for genotyping/characterization
EntrezSNP - better!
Finding SNPs - Entrez SNP Summary
1. dbSNP is useful for investigating detailed information on a small number of SNPs, and it's good for a picture of the gene
2. Entrez SNP is a direct, fast database for querying SNP data
3. Data from Entrez SNP can be retrieved in batches for many SNPs
4. Entrez SNP data can be “limited” to specific subsets of SNPs and formatted in plain text for easy parsing and manipulation
5. More detailed queries can be formed using specific “field tags” for retrieving SNP data
www.hapmap.org
Finding SNPs: HapMap Browser
Finding SNPs: HapMap Genotypes
Finding SNPs: HapMap Browser
1. HapMap data sets are useful because individual genotype data can be used to determine optimal genotyping strategies (tagSNPs) or perform population genetic analyses (linkage disequilibrium)
2. Data are specific to those produced by the project (not all of dbSNP); HapMap data is also available in dbSNP
3. HapMap data (Phase II) can be accessed as a pre-release, prior to its availability in dbSNP
4. Easier visualization of data and direct access to SNP data, individual genotypes, and LD analysis
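Linkage disequilibrium between two SNPs can be estimated directly from individual genotypes like those HapMap provides. The sketch below is one simple approach under stated assumptions: each genotype is coded as a dosage (0, 1, or 2 copies of a chosen allele), individuals are in the same order in both lists, and the squared Pearson correlation of dosages stands in for the haplotype-based r². This is an approximation for unphased data, not the method used by the HapMap browser, and the function name `genotype_r2` is hypothetical.

```python
def genotype_r2(dosages_a, dosages_b):
    """Squared Pearson correlation between two per-individual dosage lists."""
    if len(dosages_a) != len(dosages_b):
        raise ValueError("both SNPs need genotypes for the same individuals")
    if not dosages_a:
        raise ValueError("at least one individual is required")

    n = len(dosages_a)
    mean_a = sum(dosages_a) / n
    mean_b = sum(dosages_b) / n

    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    var_a = sum((a - mean_a) * (a - mean_a) for a in dosages_a)
    var_b = sum((b - mean_b) * (b - mean_b) for b in dosages_b)

    # A monomorphic SNP has no variance; report no LD rather than divide by zero.
    if var_a == 0 or var_b == 0:
        return 0.0
    return (cov * cov) / (var_a * var_b)


# Two SNPs typed in the same five individuals (illustrative values only).
snp_a = [0, 1, 2, 1, 0]
snp_b = [0, 1, 2, 1, 0]
print(genotype_r2(snp_a, snp_b))
```

SNP pairs with a value near 1 carry largely redundant information, which is the basis for choosing one of them as a tagSNP.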
Finding SNPs: NIEHS SNPs Candidate Genes
egp.gs.washington.edu
African American
African YRI
European CEU
Hispanic
Asian CHB JPT
SNP_pos <tab> Ind_ID <tab> allele1 <tab> allele2
Repeat for all individuals
Repeat for next SNP
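The four-column genotype layout above is straightforward to tally into allele frequencies, which answers the earlier "common (>10%) or rare" question for a given sample. The following is a minimal sketch assuming exactly four tab-separated fields per row and that missing calls are coded as "N" or left empty. The missing-data codes, the sample IDs, and the function names are assumptions for illustration, not part of the EGP file specification.

```python
from collections import Counter, defaultdict

COMMON_THRESHOLD = 0.10  # "common" cutoff quoted on the checklist slide


def allele_counts(lines, missing=("N", "")):
    """Tally alleles per SNP position from SNP_pos/Ind_ID/allele1/allele2 rows."""
    counts = defaultdict(Counter)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        snp_pos, _ind_id, allele1, allele2 = line.split("\t")
        for allele in (allele1, allele2):
            if allele not in missing:
                counts[snp_pos][allele] += 1
    return counts


def minor_allele_frequency(counter):
    """Frequency of the second most common allele (0.0 if monomorphic)."""
    total = sum(counter.values())
    if total == 0 or len(counter) < 2:
        return 0.0
    return sorted(counter.values(), reverse=True)[1] / total


# Hypothetical rows: one SNP at position 100, another at position 250.
rows = [
    "100\tD001\tA\tG",
    "100\tD002\tA\tA",
    "100\tD003\tG\tG",
    "100\tD004\tA\tA",
    "250\tD001\tC\tC",
    "250\tD002\tC\tC",
]
for snp_pos, counter in allele_counts(rows).items():
    maf = minor_allele_frequency(counter)
    label = "common" if maf > COMMON_THRESHOLD else "rare or monomorphic"
    print(snp_pos, dict(counter), round(maf, 3), label)
```

Running the tally separately for each population panel shows whether a SNP that is common in one sample is informative in another.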
PolyPhen - Polymorphism Phenotyping
Structural protein characteristics and evolutionary comparison
SIFT = Sorting Intolerant From Tolerant
Evolutionary comparison of non-synonymous SNPs
GeneSNPs
Graphic view of SNPs in context of gene elements
All NIEHS genes presented
- organized by pathway/function
SNPs from dbSNP
- organized by submitter handle
Sequence context of SNPs presented in Color Fasta format
Link-outs to EntrezSNP pages
Summary “Genome SNPs” view for one-stop SNP shopping
http://www.genome.utah.edu/genesnps/
GeneSNPs: One stop shopping
Polydoms
A web-based application that maps synonymous and non-synonymous SNPs onto known functional protein domains
• SNPs are from dbSNP and GeneSNPs
• Domain structures from NCBI's Conserved Domain Database
• Functional predictions based on SIFT and PolyPhen
• 3 dimensional mapping of SNPs on protein structure using Chime viewer
http://polydoms.cchmc.org/polydoms/
Polydoms
Mapping of nsSNPs onto protein structure
ARG <-> 5 HIS
ARG <-> 107 HIS
TraFac: Transcription Factor Binding Site Comparison
A tool for validating cis regulatory elements conserved between human and mouse
• Aligns human and mouse sequences using BLASTZ
• Consensus transcription factor binding sequences from Transfac database
http://trafac.cchmc.org/trafac
All TFBS in common
TFBS in parallel
ECR Browser: Evolutionary Conserved Regions
Aligns sequences to Mouse, Rat, Dog, Opossum, Chicken, Fugu and Drosophila
Gene annotations from UCSC Genome Browser
Easy retrieval of ECR sequences and alignments
Pre-computed transcription factor binding sites
http://ecrbrowser.dcode.org
Human-mouse alignment
Fasta sequences
Transcription Factor Binding Sites from Transfac
PolyPhen: Polymorphism Phenotyping - prediction of functional effect of human nsSNPs
Physical and comparative analyses used to make predictions
Uses SwissProt annotations to identify known domains
Calculates a substitution probability from BLAST alignments of homologous and orthologous sequences
Ranks substitutions on scale of predicted functional effects from “benign” to “probably damaging”
http://tux.embl-heidelberg.de/ramensky/
GVS: Genome Variation Server
http://gvs.gs.washington.edu/GVS/
Provides rapid analysis of 4.3 million genotyped SNPs from dbSNP and the HapMap
Mapped to human genome build 35 (hg17)
Displays genotype data in text and image formats
Displays tagSNPs or clusters of informative SNPs in text and image formats
Displays linkage disequilibrium (LD) in text and image formats
NOS2A
EGP Yoruban population
UTR and Coding SNPs
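The "tagSNPs or clusters of informative SNPs" idea can be illustrated with a greedy binning pass over pairwise LD values. This sketch is one common style of approach and is not a description of how GVS itself computes its bins. It assumes pairwise r² values are already available in a dictionary keyed by SNP-name pairs, treats missing pairs as r² = 0, and uses a threshold of 0.8 purely as an illustrative default. The function name `greedy_tag_bins` and the SNP names are hypothetical.

```python
def greedy_tag_bins(snps, r2, threshold=0.8):
    """Group SNPs into bins; each bin's tag has r2 >= threshold with every member."""

    def pair_r2(a, b):
        # Pairs may be stored in either order; unlisted pairs count as no LD.
        return r2.get((a, b), r2.get((b, a), 0.0))

    remaining = list(snps)
    bins = []
    while remaining:
        best_tag, best_members = None, []
        for candidate in remaining:
            members = [
                s for s in remaining
                if s == candidate or pair_r2(candidate, s) >= threshold
            ]
            # Strict comparison keeps the earliest SNP when bin sizes tie.
            if len(members) > len(best_members):
                best_tag, best_members = candidate, members
        bins.append((best_tag, best_members))
        remaining = [s for s in remaining if s not in best_members]
    return bins


# Illustrative pairwise r2 values for four SNPs.
snps = ["snp1", "snp2", "snp3", "snp4"]
r2 = {
    ("snp1", "snp2"): 0.90,
    ("snp2", "snp3"): 0.85,
    ("snp1", "snp3"): 0.50,
    ("snp3", "snp4"): 0.10,
}
for tag, members in greedy_tag_bins(snps, r2):
    print(tag, members)
```

Genotyping only the tag from each bin captures the other members at the chosen r² level, which is how redundant SNPs are pruned from a genotyping panel.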
Finding SNPs: Databases and Extraction
One stop shopping
- NIEHS SNPs and GeneSNPs
Prediction of functional variations
- Polydoms and PolyPhen
Identification of transcription factor binding sites in Evolutionary Conserved Regions
- TraFac and the ECR browser
Visualization and analysis of LD and TagSNPs
- GVS