Visualization of genomic data

Upload: sutton

Post on 21-Feb-2016

TRANSCRIPT

Page 1: Visualization of genomic data

Visualization of genomic data

Genome browsers

Page 2: Visualization of genomic data

Survey

• UCSC browser

• Ensembl browser

• Others?

Page 3: Visualization of genomic data

UCSC genome browser

Basic functionalities used in exercise

• Finding a gene

  • by name

  • by sequence

• Gene structure

• Orthologues – i.e. functional homologs in other organisms

• SNPs - Single Nucleotide Polymorphisms

• Several other functionalities

  • Gene Sorter - sort according to expression, homology, in situ images of genes in different tissues

  • Custom tracks – upload your own data

Page 4: Visualization of genomic data

Visualization of genomic data

Genome browsers

Page 5: Visualization of genomic data

Genome browsers

Visualization of a gene

>chr5:123.004.678-125.345.112
ATGAAGTTATGGGATGTCGTGGCTGTCTGCCTGGTGCTGCTCCACACCGCGTCCGCCTTCCCGCTGCCCGCCGGTAAGAGGCCTCCCGAGGCGCCCGCCGAAGACCGCTCCCTCGGCCGCCGCCGCGCGCCCTTCGCGCTGAGCAGTGACTGTAAGAACCGTTCCCTCCCCGCGGGGGGGCCGCCGGCGGACCCCCTCGCACCCCCACCCGCAGCCAGCCCCGCACGTACCCCAAGCCAGCCTGATGGCTGTGTGGCCTACCGACCCGTGGGCAAGGGGTGCGGGTGCTGAAGCCCCCAGGGGTGCCTGGCTGCCCACTGCTGCCCGCACGCCTGGCCTGAAAGTGACACGCGCTGGTTTGCCCAGCACAGAGGGGATGGAATTTTTATGCTGCTCCTTTAGCATTCTGATGAACAAATATCCTCCCCACCAGCACCACCACCTCAGTAA

Flat files / tab files

[Figure: the gene on Chr5 drawn along a coordinate axis (123.004.678, 123.404.678, 124.987.012, 125.345.112), with blocks labelled Exon, Intron, Exon]

Open Reading Frame (ORF) – from start codon to stop codon
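The ORF definition on this slide (from start codon to stop codon) can be sketched in a few lines of Python. This is only an illustration: the function name `find_orf` and the toy sequence are assumptions made for the example, and the search looks at the forward strand only and ignores introns.

```python
from typing import Optional

# The three stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orf(seq: str) -> Optional[str]:
    """Return the first ORF (ATG through the first in-frame stop codon), or None."""
    seq = seq.upper()
    start = seq.find("ATG")
    while start != -1:
        # Walk codon by codon in the reading frame set by this ATG.
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return seq[start:i + 3]
        # No in-frame stop found: try the next ATG.
        start = seq.find("ATG", start + 1)
    return None


# Hypothetical toy sequence, not taken from any genome.
toy = "CCATGAAGTTATGGTAAGC"
print(find_orf(toy))
```

In a real gene the ORF is read from the spliced transcript, so the raw genomic sequence would first need its introns removed.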

Page 6: Visualization of genomic data

Genome browsers

Why graphic display?

Why is a graphic display better than flat files / tab files?

• A graphic display is compact

• Metadata is available, i.e. supporting information about a gene

• Experimental evidence such as ESTs

• Predicted gene structures

• SNP information

• Links to many databases

In short, much data about a gene is gathered in one place and can be viewed easily.

Page 7: Visualization of genomic data

Genome browsers

Visualization of a gene (Ensembl)

Page 8: Visualization of genomic data

Genome browsers

Visualization of a gene (UCSC)

Exon Intron UTR

Page 9: Visualization of genomic data

Genome browsers

• UCSC genome browser

  • http://genome.ucsc.edu/

  • Easy to use

  • Updated often, but not as often as Ensembl

  • Upload of personal tracks

• Ensembl browser

  • http://www.ensembl.org/index.html

  • Less easy to use

  • Maintained/updated by several people

• GBrowse

  • http://www.gmod.org/GBrowse

Page 10: Visualization of genomic data

BLAT

BLAST-Like Alignment Tool

• BLAT (2002)

• Very fast searches (MySQL database)

• Handles introns in RNA/DNA alignments

• Data for more than 30 genomes (human, mouse, rat…)

Exon Intron Exon

Splice sites
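A spliced alignment of an RNA to the genome comes back as a series of aligned blocks (the exons); the gaps between consecutive blocks on the genome are the introns. In BLAT's PSL output these blocks are given by the `tStarts` and `blockSizes` fields. The sketch below assumes those two lists have already been parsed into integers; the function name and the coordinates are made up for illustration.

```python
def introns_from_blocks(t_starts, block_sizes):
    """Return (start, end) genomic gaps between consecutive aligned blocks.

    Coordinates are 0-based and half-open, as in PSL.
    """
    introns = []
    for i in range(len(t_starts) - 1):
        gap_start = t_starts[i] + block_sizes[i]  # end of this exon block
        gap_end = t_starts[i + 1]                 # start of the next exon block
        if gap_end > gap_start:
            introns.append((gap_start, gap_end))
    return introns


# Hypothetical alignment: two exon blocks of 200 and 300 bases.
print(introns_from_blocks([1000, 1500], [200, 300]))
```

A gap in the genome with no matching gap in the RNA is what marks an intron; very short gaps are more likely small insertions or deletions, which this sketch does not distinguish.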

Page 11: Visualization of genomic data

BLAT genome Browser

http://genome.ucsc.edu/

Page 12: Visualization of genomic data

BLAT genome Browser

Using a search term or position, e.g. Chr1:10,234-11,567
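A position string of this form can be split into chromosome, start and end with a short parser. This is a minimal sketch: `parse_position` and the regular expression are assumptions for illustration, not the browser's own parsing rules.

```python
import re

# Chromosome name, then start and end; thousands separators are allowed.
POSITION_RE = re.compile(r"^(chr\w+):([\d,]+)-([\d,]+)$", re.IGNORECASE)


def parse_position(text):
    """Parse 'Chr1:10,234-11,567' into (chromosome, start, end)."""
    match = POSITION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a position string: {text!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start > end:
        raise ValueError("start must not be greater than end")
    return chrom, start, end


print(parse_position("Chr1:10,234-11,567"))
```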

Page 13: Visualization of genomic data

BLAT genome Browser

http://genome.ucsc.edu/

Page 14: Visualization of genomic data

BLAT genome Browser

Using a protein or DNA sequence

Page 15: Visualization of genomic data

BLAT genome Browser

Page 16: Visualization of genomic data

BLAT genome Browser

"Details"

Correct splice site?

Page 17: Visualization of genomic data

Logo Plot

Information Content

$IC = -H(p) + \log_2(4) = \sum_a p_a \log_2 p_a + 2$

Here $p_a$ is the frequency of letter $a$ at a position. The constant is the maximum possible entropy: $\log_2(4) = 2$ for DNA (4 nucleotides) and $\log_2(20) = 4.32$ for proteins (20 amino acids).

The information content is calculated from a multiple sequence alignment.

The result is a graphical visualization of sequence conservation where:

• Total height at a position is the information content

• Height of a single letter is proportional to the frequency of that letter

Multiple alignment of 3 protein sequences:

Seq1: A L R K P Q R T

Seq2: A V R H I L L I

Seq3: A I K V H N N T

Pos1: I = [1*log2(1)] + 4.32 = log2(20) = 4.32

Pos2: I = [1/3*log2(1/3) + 1/3*log2(1/3) + 1/3*log2(1/3)] + 4.32 = 2.74

Pos3: I = [2/3*log2(2/3) + 1/3*log2(1/3)] + 4.32 = 3.40
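The per-position calculation can be reproduced with a short script: information content is the maximum entropy for the alphabet minus the observed entropy of the column. This is a sketch under simple assumptions; the function name `information_content` is chosen for illustration, and no small-sample correction is applied (logo tools often add one).

```python
import math
from collections import Counter


def information_content(column, alphabet_size):
    """Information content (bits) of one alignment column."""
    counts = Counter(column)
    n = len(column)
    # Shannon entropy H(p) = -sum_a p_a * log2(p_a) over observed letters.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(alphabet_size) - entropy


# The three protein sequences from the slide, written without spaces.
alignment = ["ALRKPQRT", "AVRHILLI", "AIKVHNNT"]

# zip(*alignment) yields the alignment column by column.
for pos, column in enumerate(zip(*alignment), start=1):
    print(f"Pos{pos}: {information_content(column, 20):.2f}")
```

Passing `alphabet_size=4` instead of 20 gives the DNA version of the same measure.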

Page 18: Visualization of genomic data

Logo Plot

Exon

Page 19: Visualization of genomic data

BLAT genome Browser

"Details"

Correct splice site?

Page 20: Visualization of genomic data

BLAT genome Browser

"Details"

Donor site: exon ...G | GT... intron

Acceptor site: intron ...AG | exon...
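The "correct splice site?" check amounts to asking whether the intron begins with GT (donor site) and ends with AG (acceptor site). A minimal sketch, assuming 0-based half-open intron coordinates on the plus strand; the function name and the toy sequence are made up for illustration.

```python
def has_canonical_splice_sites(genome, intron_start, intron_end):
    """True if genome[intron_start:intron_end] starts with GT and ends with AG."""
    intron = genome[intron_start:intron_end].upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


# Hypothetical toy gene: exon, intron (GT...AG), exon.
toy = "ATGAAG" + "GTAAGTCCTTTCAG" + "GCTTAA"

print(has_canonical_splice_sites(toy, 6, 20))   # intron boundaries as constructed
print(has_canonical_splice_sites(toy, 7, 21))   # boundaries shifted by one base
```

An alignment whose gap is shifted by a base or two fails this check, which is a hint that the intron boundaries reported by the alignment should be inspected by hand.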

Page 21: Visualization of genomic data

BLAT genome Browser

Page 22: Visualization of genomic data

BLAT genome Browser

"Browser"

Base, Center & Zoom

Known genes

Predictions

RNA

EST

Conservation

Expression

Page 23: Visualization of genomic data

Genome browsers

Page 24: Visualization of genomic data

Genome browsers

Page 25: Visualization of genomic data

BLAT genome Browser

Center & zoom

Page 26: Visualization of genomic data

BLAT genome Browser

Center & zoom

Forward/reverse direction

Selected number of tracks

Page 27: Visualization of genomic data

BLAT genome Browser

Sequence Orthologs

Page 28: Visualization of genomic data

BLAT genome Browser

Sequence Orthologs

“click”

Page 29: Visualization of genomic data

BLAT genome Browser

Sequence Orthologs

Page 30: Visualization of genomic data

BLAT genome Browser

Sequence Orthologs

Page 31: Visualization of genomic data

BLAT genome Browser

Sequence Orthologs

Page 32: Visualization of genomic data

SNPs

Page 33: Visualization of genomic data

Single Nucleotide Polymorphism

SNP

• SNPs can be located anywhere in the genome

• Non-synonymous SNP (nsSNP): the amino acid is changed (shown below)

• Synonymous SNP: does not affect the protein

An amino acid is coded by 3 nucleotides

Valine (V): GTC

V I T

P

Humans are diploid: cells have 2 homologous copies of each chromosome, i.e. 2*23 chromosomes. Haploid cells (sex cells) have only 23 chromosomes.
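The difference between a synonymous and a non-synonymous SNP on this slide can be checked by translating the codon before and after the change. The sketch below builds the standard genetic code from its usual compact TCAG-ordered string; the function name `snp_effect` and the two example substitutions are assumptions for illustration (only the codon GTC for valine comes from the slide).

```python
# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # codons starting with T
    "LLLLPPPPHHQQRRRR"   # codons starting with C
    "IIIMTTTTNNKKSSRR"   # codons starting with A
    "VVVVAAAADDEEGGGG"   # codons starting with G
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def snp_effect(codon, position, new_base):
    """Return (amino acid before, amino acid after, kind of SNP)."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    kind = "synonymous" if before == after else "non-synonymous"
    return before, after, kind


print(snp_effect("GTC", 2, "T"))  # third base changed: GTC -> GTT
print(snp_effect("GTC", 0, "A"))  # first base changed: GTC -> ATC
```

A change that creates a stop codon (`*`) is reported here as non-synonymous as well; it is usually given its own name (nonsense).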

Page 34: Visualization of genomic data

Diploid organism - most mammals

A chromosome from mother

If the red strand is the plus strand: C;T (or T;C, but we write it alphabetically)

If the green strand is the minus strand: G;A, but we write it as A;G

A chromosome from father

An example of two homologous copies of e.g. chromosome 9 within a cell
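Converting a genotype from one strand to the other means complementing each allele and then writing the pair in alphabetical order. A minimal sketch, assuming genotypes are written as two alleles separated by a semicolon as on the slide; the function name is made up for illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def genotype_on_other_strand(genotype):
    """Complement both alleles and write them alphabetically, e.g. 'C;T' -> 'A;G'."""
    alleles = genotype.split(";")
    return ";".join(sorted(COMPLEMENT[allele] for allele in alleles))


print(genotype_on_other_strand("C;T"))
```

Applying the function twice returns the original genotype, which is a quick sanity check when comparing SNP calls reported on different strands.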

Page 35: Visualization of genomic data

SNPs

Page 36: Visualization of genomic data

SNPs

Page 37: Visualization of genomic data

Exercise

1. Basic understanding of the graphics

2. Effect of Single Nucleotide Polymorphisms (SNPs)

3. Finding orthologous genes

4. Identify the chromosomal locus for a gene