Genome Browsers: UCSC (Santa Cruz, California) and Ensembl (EBI, UK)
Posted on 21-Dec-2015
• UCSC: http://genome.ucsc.edu/
• Ensembl: http://www.ensembl.org/
Eukaryotic Genomes: Not only collections of genes
• Protein-coding genes
• RNA genes (rRNA, snRNA, snoRNA, miRNA, tRNA)
• Structural DNA (centromeres, telomeres)
• Regulatory sequences (promoters, enhancers, silencers, insulators)
• Parasitic sequences (transposons)
• Pseudogenes (non-functional gene-like sequences)
• Simple sequence repeats
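Eukaryotic Genomes: High fraction of non-coding DNA
[Figure: proportion of non-coding DNA in the genome across organisms. Legend: blue = prokaryotes; black = unicellular eukaryotes; other colors = multicellular eukaryotes (red = vertebrates)]
Source: Mattick, Nature Reviews Genetics (NRG), 2004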
Human Genome
• 3 billion base pairs (3 Gb)
• 22 chromosome pairs plus the X and Y chromosomes
• Chromosome length varies from ~50 Mb to ~250 Mb
• About 22,000 protein-coding genes
– compare with ~14,000 for the fruit fly and ~19,000 for the nematode C. elegans
Human Genome
Source: Molecular Biology of the Cell (4th edition) (Alberts et al., 2002)
• Only 1.2% codes for proteins; 3.5-5% is under selection
• Long introns, short exons
• Large spaces between genes
• More than half consists of repetitive DNA
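The percentages above translate into absolute sizes with simple arithmetic. A small Python sketch using only the rounded figures from these slides (3 Gb, 1.2%, 3.5-5%, ~22,000 genes); the helper names are hypothetical and the results are rough orders of magnitude, not measured values:

```python
# Rough sizes implied by the slide figures (rounded inputs, illustrative only).
GENOME_SIZE_BP = 3_000_000_000  # ~3 Gb
PROTEIN_CODING_GENES = 22_000   # slide estimate


def fraction_to_mb(fraction: float, genome_bp: int = GENOME_SIZE_BP) -> float:
    """Convert a fraction of the genome into megabases."""
    return fraction * genome_bp / 1e6


def mean_coding_bp_per_gene(coding_fraction: float = 0.012) -> float:
    """Average protein-coding sequence per gene, assuming all coding bp belong to these genes."""
    return coding_fraction * GENOME_SIZE_BP / PROTEIN_CODING_GENES


if __name__ == "__main__":
    print(f"protein-coding: ~{fraction_to_mb(0.012):.0f} Mb")
    print(f"under selection: ~{fraction_to_mb(0.035):.0f}-{fraction_to_mb(0.05):.0f} Mb")
    print(f"coding sequence per gene: ~{mean_coding_bp_per_gene():.0f} bp")
```

So roughly 36 Mb is protein-coding, against 105-150 Mb under selection, which is why most constrained sequence must be non-coding. The ~1.6 kb per gene is only a ratio of two rounded numbers and ignores alternative splicing.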
Variation Along the Genome Sequence
• Nucleotide usage varies along chromosomes
– Protein-coding regions tend to have high GC content
• Genes are not equally distributed across the chromosomes
– Housekeeping genes are generally located in gene-dense areas
– Gene-poor areas tend to contain many tissue-specific genes
Source: Ensembl
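Variation in GC content along a sequence is usually computed in sliding windows. A minimal Python sketch, assuming a plain DNA string as input; the function names, window and step sizes are illustrative choices, not the settings used by Ensembl or UCSC. Ambiguous bases (N) are left out of the denominator.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T); N and other codes are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0  # 0.0 for windows without any called bases


def sliding_gc(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """Return (0-based window start, GC fraction) for every full-length window."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # Toy sequence: AT-rich left half, GC-rich right half.
    for start, gc in sliding_gc("ATATATATGCGCGCGC", window=8, step=4):
        print(start, round(gc, 2))
```

Plotted along a real chromosome with windows of tens to hundreds of kilobases, such a profile shows the GC-rich and GC-poor stretches mentioned above.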
Chromosome organisation
Source: Lodish (4th edition)
• DNA is packed in chromatin
• Active genes lie in less densely packed chromatin (beads-on-a-string)
• Inactive genes are often in densely packed chromatin (30-nm fiber)
• Genes are regulated by changing chromatin density and by methylation/acetylation of the histones
• Limited availability of chromatin information in genome browsers (post-translational histone modifications are currently being investigated with ChIP-on-chip experiments)
Genomic elements
• Genome browsers can also be used to examine other features:
– Genomic sequence conservation
– Pseudogenes
– Duplications and deletions of chromosome segments (Copy Number Variations, CNVs)
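Underneath, a genome browser stores such elements as annotation tracks of intervals and, for the region on screen, draws the features that overlap it. A minimal Python sketch of that overlap query, assuming BED-style coordinates (0-based start, exclusive end); the `Feature` type and the example features are hypothetical, and real browsers read indexed file formats (such as bigBed) rather than scanning a list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive (half-open, as in BED)
    kind: str   # e.g. "pseudogene", "cnv", "conserved"
    name: str


def features_in_view(track: list[Feature], chrom: str, start: int, end: int) -> list[Feature]:
    """Features overlapping the half-open view [start, end), sorted by position."""
    hits = [f for f in track if f.chrom == chrom and f.start < end and start < f.end]
    return sorted(hits, key=lambda f: (f.start, f.end))


if __name__ == "__main__":
    track = [
        Feature("chr1", 100, 200, "pseudogene", "P1"),
        Feature("chr1", 150, 400, "cnv", "C1"),
        Feature("chr2", 100, 200, "cnv", "C2"),
        Feature("chr1", 400, 500, "conserved", "E1"),
    ]
    # P1 ends exactly at 200 and E1 starts exactly at 400: neither overlaps [200, 400).
    print([f.name for f in features_in_view(track, "chr1", 200, 400)])
```

The half-open convention makes adjacent features touch without overlapping, which is why the boundary cases in the example are excluded.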
Genomic Sequence Conservation
• Not only protein-coding regions are conserved during evolution
• Conserved non-coding genomic sequences can be involved in gene regulation (enhancers, silencers, insulators)
• The UCSC browser can be used to examine genomic conservation
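One simple way to quantify conservation is per-column identity in a multiple alignment, averaged over sliding windows. The Python sketch below assumes the alignment is already computed and given as equal-length strings with "-" for gaps; it is a plain identity score, not the phylogenetic models (such as phastCons or phyloP) behind the UCSC conservation tracks, and the function names, threshold and sequences are hypothetical.

```python
from collections import Counter


def column_identity(column: str) -> float:
    """Share of sequences carrying the most common base; gaps and N count against conservation."""
    bases = [b for b in column.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return Counter(bases).most_common(1)[0][1] / len(column)


def conserved_windows(alignment: list[str], window: int, threshold: float) -> list[tuple[int, int, float]]:
    """(start, end, mean identity) for every window whose mean identity reaches the threshold."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    scores = [column_identity("".join(seq[i] for seq in alignment)) for i in range(length)]
    hits = []
    for start in range(length - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean >= threshold:
            hits.append((start, start + window, mean))
    return hits


if __name__ == "__main__":
    # Hypothetical three-species alignment: identical for 7 columns, then divergent.
    aln = ["ACGTACGTTT", "ACGTACGAAA", "ACGTACGCCC"]
    for start, end, score in conserved_windows(aln, window=4, threshold=0.9):
        print(start, end, round(score, 2))
```

A window that scores high but lies outside any annotated exon is the kind of candidate regulatory element described above.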
Pseudogenes
• Pseudogenes “look” like (are homologous to) protein-coding genes, but are non-functional
• Two types:
– Unprocessed pseudogenes (gene copies that have lost their function)
– Processed pseudogenes (mRNAs that were reverse-transcribed and reinserted into the genome; they lack introns and sometimes carry a poly(A) tail)
• The UCSC browser contains several pseudogene databases:
– Yale pseudogenes (both types of pseudogenes)
– Vega pseudogenes (both types of pseudogenes)
– Retroposed genes (only processed pseudogenes)
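The two hallmarks of processed pseudogenes listed above (no introns, sometimes a poly(A) tail) can be turned into a rough check on an mRNA-to-genome alignment. A hypothetical Python heuristic, assuming the alignment is given as sorted, half-open genomic blocks on the plus strand and that the genomic sequence just downstream of the last block is available; the thresholds (`min_intron`, `min_polya`) are illustrative, and this is not how the Yale, Vega or Retroposed Genes sets were built.

```python
def introns(blocks: list[tuple[int, int]], min_intron: int = 50) -> list[tuple[int, int]]:
    """Gaps between consecutive aligned blocks long enough to be treated as introns."""
    return [
        (end1, start2)
        for (_, end1), (start2, _) in zip(blocks, blocks[1:])
        if start2 - end1 >= min_intron
    ]


def has_polya(downstream: str, min_polya: int = 10) -> bool:
    """True if the genomic sequence right after the alignment starts with a run of A's."""
    return downstream.upper().startswith("A" * min_polya)


def classify_copy(blocks: list[tuple[int, int]], downstream: str) -> str:
    """Very rough label for a gene-like copy (plus strand only)."""
    if introns(blocks):
        return "intron-containing"
    if has_polya(downstream):
        return "processed-like (intronless, poly(A))"
    return "processed-like (intronless)"


if __name__ == "__main__":
    parent = [(1000, 1100), (2000, 2150), (3000, 3080)]  # three exons
    copy = [(5000, 5330)]                                  # one contiguous block
    print(classify_copy(parent, "GTCAGT"))
    print(classify_copy(copy, "AAAAAAAAAAAAGTC"))
```

Genuine single-exon genes are also intronless, so this check only flags candidates; an "intron-containing" copy may be either the parent gene or an unprocessed pseudogene.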
Copy Number Variation
• People vary not only at the nucleotide level (SNPs): segments of the genome can also be present in a varying number of copies (Copy Number Polymorphisms (CNPs) or Copy Number Variants (CNVs))
• When CNV regions contain genes, the number of gene copies can differ between individuals
• CNVs can be examined with the UCSC browser
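With sequencing data, one common way to detect CNVs is read depth: a region present in more (or fewer) copies collects proportionally more (or fewer) reads. A minimal Python sketch of that idea, not the method behind any specific UCSC track; it assumes per-window mean depths are already computed, a diploid genome, and no GC or mappability correction, and all names and numbers are hypothetical.

```python
def call_cnvs(window_depths: list[float], genome_mean_depth: float,
              window_size: int, ploidy: int = 2) -> list[tuple[int, int, int]]:
    """Return (start, end, copy number) for merged runs of windows deviating from `ploidy`."""
    if genome_mean_depth <= 0:
        raise ValueError("genome_mean_depth must be positive")
    calls: list[tuple[int, int, int]] = []
    for i, depth in enumerate(window_depths):
        copies = round(ploidy * depth / genome_mean_depth)  # depth ratio scaled to ploidy
        if copies == ploidy:
            continue
        start, end = i * window_size, (i + 1) * window_size
        # Extend the previous call if it is adjacent and has the same copy number.
        if calls and calls[-1][1] == start and calls[-1][2] == copies:
            calls[-1] = (calls[-1][0], end, copies)
        else:
            calls.append((start, end, copies))
    return calls


if __name__ == "__main__":
    depths = [30, 31, 15, 14, 60, 29]  # toy per-window depths
    print(call_cnvs(depths, genome_mean_depth=30, window_size=1000))
    # → [(2000, 4000, 1), (4000, 5000, 4)]
```

Here a one-copy loss spans two windows and a four-copy gain follows it; if a gene lay in the gain, this individual would carry extra copies of it.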
Single Nucleotide Polymorphisms (SNPs)
• Sequence variations within a species
• Similar to mutations, but commonly present in the population, and generally with little effect
• Used as genetic markers (e.g. a genetic disease can be associated with a SNP)
• Ensembl offers a convenient SNP view
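Using a SNP as a marker starts with counting its alleles in a group of people, for example in patients versus controls. A minimal Python sketch, assuming unphased diploid genotypes written as two-letter strings such as "AG"; the function names and genotype data are made up for illustration, and a real association study would add a statistical test.

```python
from collections import Counter


def allele_frequencies(genotypes: list[str]) -> dict[str, float]:
    """Allele frequencies from diploid genotypes like 'AA', 'AG', 'GG'."""
    counts = Counter()
    for genotype in genotypes:
        if len(genotype) != 2:
            raise ValueError(f"expected a diploid genotype, got {genotype!r}")
        counts.update(genotype.upper())  # count both alleles of the individual
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def minor_allele(genotypes: list[str]) -> tuple[str, float]:
    """(allele, frequency) of the less common allele."""
    freqs = allele_frequencies(genotypes)
    return min(freqs.items(), key=lambda item: item[1])


if __name__ == "__main__":
    cases = ["AG", "GG", "AG", "GG", "AA"]     # hypothetical patients
    controls = ["AA", "AG", "AA", "AA", "AG"]  # hypothetical controls
    print("cases:", allele_frequencies(cases))
    print("controls:", allele_frequencies(controls))
```

A clearly higher G frequency in cases than in controls is the kind of signal that links a SNP to a disease; with five people per group it is of course meaningless.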
UCSC vs. Ensembl: Which is better?
• They contain more or less the same information
• UCSC is somewhat easier to use
• Ensembl gives more detailed information and more flexible data export
• Other small differences in data (e.g. UCSC has more extensive genomic conservation data)
• Use whichever you are familiar with!
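Both browsers also offer programmatic access, and one practical difference when moving data between them is the coordinate convention: Ensembl regions are 1-based with an inclusive end, while the UCSC API (like BED files) uses a 0-based start with an exclusive end, and UCSC prefixes chromosome names with "chr". A minimal Python sketch that only builds request URLs for the UCSC REST API (api.genome.ucsc.edu, `getData/sequence`) and the Ensembl REST API (rest.ensembl.org, `sequence/region`); the helper names are hypothetical, and endpoint details should be checked against each API's current documentation before use.

```python
UCSC_API = "https://api.genome.ucsc.edu/getData/sequence"
ENSEMBL_REST = "https://rest.ensembl.org/sequence/region"


def ensembl_to_ucsc(chrom: str, start: int, end: int) -> tuple[str, int, int]:
    """Convert an Ensembl region (1-based, inclusive) to UCSC style (0-based start, exclusive end)."""
    if chrom == "MT":
        ucsc_chrom = "chrM"  # mitochondrial genome is 'MT' in Ensembl, 'chrM' in UCSC
    elif chrom.startswith("chr"):
        ucsc_chrom = chrom
    else:
        ucsc_chrom = f"chr{chrom}"
    return ucsc_chrom, start - 1, end


def ucsc_sequence_url(genome: str, chrom: str, start0: int, end: int) -> str:
    """URL for the UCSC API sequence endpoint (0-based start, exclusive end)."""
    return f"{UCSC_API}?genome={genome};chrom={chrom};start={start0};end={end}"


def ensembl_sequence_url(species: str, chrom: str, start: int, end: int, strand: int = 1) -> str:
    """URL for the Ensembl REST sequence/region endpoint (1-based, inclusive), as plain text."""
    return f"{ENSEMBL_REST}/{species}/{chrom}:{start}..{end}:{strand}?content-type=text/plain"


if __name__ == "__main__":
    # The same 101-bp stretch of chromosome X; hg38 is UCSC's name for the GRCh38 assembly.
    print(ensembl_sequence_url("human", "X", 1_000_000, 1_000_100))
    chrom, start0, end = ensembl_to_ucsc("X", 1_000_000, 1_000_100)
    print(ucsc_sequence_url("hg38", chrom, start0, end))
```

Fetching either URL (for example with `urllib.request.urlopen`) should return the same bases when both sides use the same assembly; forgetting the off-by-one shift is a common source of mismatches.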