Visualization of genomic data
Genome browsers
• UCSC browser
• Ensembl browser
• Others?
Survey
UCSC genome browser
Basic functionalities used in the exercise
• Finding a gene
  • by name
  • by sequence
• Gene structure
• Orthologues, i.e. the corresponding gene in other organisms (homologs separated by speciation, usually with the same function)
• SNPs: Single Nucleotide Polymorphisms
• Several other functionalities
  • Gene Sorter: sort genes according to expression, homology, and in situ images of genes in different tissues
  • Custom tracks: upload your own data
Genome browsers
Visualization of a gene
>chr5:123.004.678-125.345.112
ATGAAGTTATGGGATGTCGTGGCTGTCTGCCTGGTGCTGCTCCACACCGCGTCCGCCTTCCCGCTGCCCGCCGGTAAGAGGCCTCCCGAGGCGCCCGCCGAAGACCGCTCCCTCGGCCGCCGCCGCGCGCCCTTCGCGCTGAGCAGTGACTGTAAGAACCGTTCCCTCCCCGCGGGGGGGCCGCCGGCGGACCCCCTCGCACCCCCACCCGCAGCCAGCCCCGCACGTACCCCAAGCCAGCCTGATGGCTGTGTGGCCTACCGACCCGTGGGCAAGGGGTGCGGGTGCTGAAGCCCCCAGGGGTGCCTGGCTGCCCACTGCTGCCCGCACGCCTGGCCTGAAAGTGACACGCGCTGGTTTGCCCAGCACAGAGGGGATGGAATTTTTATGCTGCTCCTTTAGCATTCTGATGAACAAATATCCTCCCCACCAGCACCACCACCTCAGTAA
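(Figure: the same region shown as a flat file / tab file and graphically as exons and an intron on Chr5, with boundaries at 123.004.678, 123.404.678, 124.987.012 and 125.345.112.)
Open Reading Frame (ORF): from the start codon to the stop codon.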
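A minimal Python sketch of the ORF definition (from a start codon to the first in-frame stop codon), assuming ATG as the only start codon, the standard stop codons TAA, TAG and TGA, and looking only at the given strand. The function name `find_orf` is hypothetical; it is not part of any genome browser.

```python
# Standard stop codons; ATG is assumed to be the only start codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orf(seq):
    """Return the ORF beginning at the first ATG that is followed by an
    in-frame stop codon (stop codon included), or None if there is none."""
    seq = seq.upper()
    start = seq.find("ATG")
    while start != -1:
        # Walk codon by codon in the reading frame set by this ATG.
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return seq[start:i + 3]
        # No stop codon in this frame: try the next ATG.
        start = seq.find("ATG", start + 1)
    return None


print(find_orf("CCATGAAATTTTAAGG"))
```

Real gene prediction also looks at the reverse strand and at splicing; this sketch only illustrates the start-to-stop definition on a contiguous sequence.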
Genome browsers
Why a graphic display?
Why is a graphic display better than flat files / tab files?
• A graphic display is compact
• Metadata are available, i.e. supporting information about a gene:
  • Experimental evidence such as ESTs (expressed sequence tags)
  • Predicted gene structures
  • SNP information
  • Links to many databases
In short, much data about a gene is gathered in one place and can be viewed easily.
Genome browsers
Visualization of a gene (Ensembl)
Visualization of a gene (UCSC)
(Figure legend: exon, intron and UTR, i.e. untranslated region)
• UCSC genome browser
  • http://genome.ucsc.edu/
  • Easy to use
  • Updated often, but not as often as Ensembl
  • Upload of personal tracks
• Ensembl browser
  • http://www.ensembl.org/index.html
  • Less easy to use
  • Maintained/updated by several people
• GBrowse
  • http://www.gmod.org/GBrowse
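BLAT: BLAST-Like Alignment Tool
• BLAT (2002)
• Very fast searches (an index of the whole genome is kept in memory; the browser's annotation data are stored in a MySQL database)
• Handles introns in RNA/DNA alignments
• Data for more than 30 genomes (human, mouse, rat, …)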
(Figure: an mRNA aligned to genomic DNA as exon, intron, exon; the intron boundaries are the splice sites.)
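BLAT's speed comes from keeping an index of the genome in memory: the published method indexes the non-overlapping k-mers of the genome and then looks up every overlapping k-mer of the query. The Python sketch below shows only that first lookup step, with a small k chosen for readability. The function names are hypothetical, and the later steps of real BLAT (clustering hits, extending them into alignments, and splitting alignments at introns) are omitted.

```python
from collections import defaultdict


def build_index(genome, k):
    """Map each non-overlapping k-mer (taken in steps of k) to its genome positions."""
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return index


def find_hits(query, index, k):
    """Look up every overlapping k-mer of the query; return (query_pos, genome_pos) pairs."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for gpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, gpos))
    return hits


genome = "AAAACCCCGGGGTTTT"
index = build_index(genome, 4)
print(find_hits("ACCCCG", index, 4))
```

Hits with the same difference (genome position minus query position) lie on the same diagonal and point to the same candidate alignment: here the single hit (1, 4) places the query at genome position 4 - 1 = 3. Querying with overlapping k-mers guarantees that any exact match of length at least 2k - 1 contains one indexed genome k-mer.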
BLAT genome browser: http://genome.ucsc.edu/
Using a search term or a position, e.g. Chr1:10,234-11,567
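A position is written as chromosome, colon, start, dash, end. A minimal Python sketch of splitting such a string into its three parts, assuming commas (or the dots used as thousands separators in the chr5 example above) are only digit grouping; `parse_position` is a hypothetical helper, not UCSC code.

```python
import re

# chromosome name, then start and end with optional "," or "." digit grouping
POSITION = re.compile(r"^(chr\w+):([\d,.]+)-([\d,.]+)$", re.IGNORECASE)


def parse_position(text):
    """Split e.g. 'Chr1:10,234-11,567' into ('Chr1', 10234, 11567)."""
    m = POSITION.match(text.strip())
    if m is None:
        raise ValueError(f"not a position: {text!r}")
    chrom, start, end = m.groups()
    # Drop the digit-grouping characters before converting to integers.
    start = int(start.replace(",", "").replace(".", ""))
    end = int(end.replace(",", "").replace(".", ""))
    if start > end:
        raise ValueError("start must not exceed end")
    return chrom, start, end


print(parse_position("Chr1:10,234-11,567"))
```

A gene name such as a search term does not match the pattern and raises `ValueError`; a browser would then fall back to a name search.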
Using a protein or DNA sequence
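A pasted query can be DNA or protein. The Python sketch below guesses the type from the letters used; the 90 % threshold and the function name are assumptions for illustration, not BLAT's actual rule, and the query type can also be chosen explicitly on the BLAT search page.

```python
# Letters expected in a nucleotide query (U for RNA, N for unknown base).
DNA_LETTERS = set("ACGTUN")


def guess_query_type(seq):
    """Hypothetical heuristic: call the query DNA if at least 90 % of its
    letters are A, C, G, T, U or N; otherwise treat it as protein."""
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        raise ValueError("empty sequence")
    fraction = sum(c in DNA_LETTERS for c in letters) / len(letters)
    return "DNA" if fraction >= 0.9 else "protein"


print(guess_query_type("ATGAAGTTATGG"))
print(guess_query_type("MKLWDVVAVCLVLLHTASA"))
```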
BLAT genome browser: "Details"
Correct splice site?
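Logo plot: information content
For DNA (alphabet of 4 letters): $IC = \log_2 4 - H(p) = 2 + \sum_a p_a \log_2 p_a$, where $H(p) = -\sum_a p_a \log_2 p_a$ is the Shannon entropy and $p_a$ is the frequency of letter $a$ at the position.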
The information content is calculated for each position (column) of a multiple sequence alignment.
The result is a graphical visualization of sequence conservation, where:
• the total height at a position is the information content
• the height of a single letter is proportional to the frequency of that letter
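Multiple alignment of 3 protein sequences:
Seq1: A L R K P Q R T
Seq2: A V R H I L L I
Seq3: A I K V H N N T
For proteins the alphabet has 20 letters, so $IC = \log_2 20 + \sum_a p_a \log_2 p_a$ with $\log_2 20 \approx 4.32$:
Pos1: $I = [1 \cdot \log_2 1] + 4.32 = 0 + 4.32 = 4.32$
Pos2: $I = [\tfrac{1}{3}\log_2\tfrac{1}{3} + \tfrac{1}{3}\log_2\tfrac{1}{3} + \tfrac{1}{3}\log_2\tfrac{1}{3}] + 4.32 = -1.58 + 4.32 = 2.74$
Pos3: $I = [\tfrac{2}{3}\log_2\tfrac{2}{3} + \tfrac{1}{3}\log_2\tfrac{1}{3}] + 4.32 = -0.92 + 4.32 = 3.40$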
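The calculation above can be reproduced in Python. This sketch computes, for each alignment column, IC = log2(alphabet size) + sum over letters a of p_a * log2(p_a), and the letter heights p_a * IC used to draw a logo. It assumes the plain formula without the small-sample correction that logo tools often apply; the function names are hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet_size):
    """IC = log2(alphabet_size) - H, where H = -sum(p_a * log2(p_a))."""
    n = len(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(column).values())
    return math.log2(alphabet_size) - entropy


def logo_heights(alignment, alphabet_size=20):
    """Per column: (total height = IC, {letter: frequency * IC})."""
    result = []
    for column in zip(*alignment):        # iterate over alignment columns
        ic = column_information(column, alphabet_size)
        n = len(column)
        heights = {a: (c / n) * ic for a, c in Counter(column).items()}
        result.append((ic, heights))
    return result


alignment = ["ALRKPQRT", "AVRHILLI", "AIKVHNNT"]
for pos, (ic, heights) in enumerate(logo_heights(alignment), start=1):
    print(pos, round(ic, 2), {a: round(h, 2) for a, h in heights.items()})
```

For the three protein sequences above, the first three columns give about 4.32, 2.74 and 3.40 bits. For DNA, call it with `alphabet_size=4`.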
Logo plot
(Figure: sequence logo; label "Exon")
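BLAT genome browser: "Details"
exon ...G | GT... intron ...AG | exon...
• Donor site: the exon/intron boundary, where the intron starts with GT
• Acceptor site: the intron/exon boundary, where the intron ends with AG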
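The GT-AG rule can be checked mechanically. A minimal Python sketch, assuming the intron is given as 0-based, end-exclusive coordinates in a genomic sequence read on the gene's strand; `check_gt_ag` and the example sequence are hypothetical. Non-canonical introns (for example GC-AG) exist, so a failed check flags a splice site for a closer look rather than proving it wrong.

```python
def check_gt_ag(genomic, intron_start, intron_end):
    """Return (donor_ok, acceptor_ok): does the intron start with GT and end with AG?
    intron_start/intron_end are 0-based, end-exclusive coordinates."""
    intron = genomic[intron_start:intron_end].upper()
    return intron.startswith("GT"), intron.endswith("AG")


# hypothetical example: exon "AAG" + intron "GTAAGTCTTTTCAG" + exon "GTC"
seq = "AAG" + "GTAAGTCTTTTCAG" + "GTC"
print(check_gt_ag(seq, 3, 17))   # intron placed correctly
print(check_gt_ag(seq, 4, 18))   # same intron shifted by one base
```

Shifting the proposed intron by one base breaks both checks, which is one way an alignment can place a splice site incorrectly.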
BLAT genome browser: "Browser" view
(Screenshot labels: Base, Center & Zoom; tracks: Known genes, Predictions, RNA, EST, Conservation, Expression)
BLAT genome browser: Center & zoom
• Forward/reverse direction
• Selected number of tracks
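Reading the minus strand 5'→3' gives the reverse complement of the plus strand. A short Python sketch with a hypothetical function name:

```python
# Complement table for DNA bases; N (unknown) stays N. Case is preserved.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGAAGTT"))
```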
BLAT genome browser: Sequence orthologs
(Screenshot annotation: "click")
SNPs
Single Nucleotide Polymorphism (SNP)
• SNPs can be located anywhere in the genome
• Non-synonymous SNP (nsSNP): the amino acid is changed (shown below)
• Synonymous SNP: the amino acid is not changed, so the protein sequence is not affected
An amino acid is coded by 3 nucleotides (a codon), e.g. Valine (V): GTC
(Figure: amino acid sequence V I T, with P as the changed amino acid)
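Whether a SNP in a coding region is synonymous or non-synonymous follows from the genetic code. The Python sketch below uses the standard genetic code and assumes a coding sequence that starts in frame; the function names and the example sequence GTC ATC ACC (V I T) are hypothetical illustrations, not the data on the slide.

```python
# Standard genetic code; codons ordered by bases T, C, A, G at positions 1, 2, 3.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate_codon(codon):
    """One-letter amino acid for a DNA codon ('*' = stop)."""
    i, j, k = (BASES.index(b) for b in codon.upper())
    return AMINO_ACIDS[16 * i + 4 * j + k]


def translate(cds):
    """Translate an in-frame coding sequence, ignoring a trailing partial codon."""
    return "".join(translate_codon(cds[p:p + 3]) for p in range(0, len(cds) - 2, 3))


def classify_snp(cds, pos, alt):
    """Classify a substitution at 0-based position `pos` of the coding sequence."""
    offset = pos % 3                 # position within the codon
    start = pos - offset             # first base of the affected codon
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = translate_codon(ref_codon), translate_codon(alt_codon)
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return kind, ref_aa, alt_aa


cds = "GTCATCACC"                    # hypothetical: codes for V I T
print(translate(cds))
print(classify_snp(cds, 2, "T"))     # GTC -> GTT
print(classify_snp(cds, 6, "C"))     # ACC -> CCC
```

Changing the third base of GTC gives GTT, still Valine (synonymous), while changing the first base of ACC to C gives CCC, Proline, so Threonine is replaced (non-synonymous).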
Humans are diploid: cells have 2 homologous copies of each chromosome, i.e. 2 × 23 = 46 chromosomes. Haploid cells (sex cells) have only 23 chromosomes.
Diploid organisms: e.g. most mammals
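(Figure: one chromosome from the mother and one from the father, an example of the two homologous copies of, e.g., chromosome 9 within a cell; the two DNA strands are drawn in red and green.)
If the red strand is the plus strand, the genotype is C;T (or T;C, but we write it in alphabetical order).
If the green strand is the minus strand, the genotype is G;A, but we write it alphabetically as A;G.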
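The genotype notation (complement the alleles when reporting the minus strand, then write them in alphabetical order, separated by ";") can be written as a small Python function; `genotype` is a hypothetical helper that assumes the two alleles are given as plus-strand bases.

```python
# Base pairing used to read the same alleles on the minus strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def genotype(alleles, strand="+"):
    """Write a diploid genotype as 'X;Y' in alphabetical order.
    `alleles` are plus-strand bases; strand='-' reports them on the minus strand."""
    if strand == "-":
        alleles = [COMPLEMENT[a] for a in alleles]
    elif strand != "+":
        raise ValueError("strand must be '+' or '-'")
    return ";".join(sorted(alleles))


print(genotype(("T", "C")))
print(genotype(("T", "C"), "-"))
```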
Exercise
1. Basic understanding of the graphics
2. Effect of Single Nucleotide Polymorphisms (SNPs)
3. Finding orthologous genes
4. Identifying the chromosomal locus of a gene