human genome sequence and variability

Post on 05-Jan-2016

Human Genome Sequence

and Variability

Gabor T. Marth, D.Sc.

Department of Biology, Boston College

marth@bc.edu

Medical Genomics Course – Debrecen, Hungary, May 2006

Lecture overview

1. Genome sequencing strategies, sequencing informatics

2. Genome annotation, functional and structural features in the human genome

3. Genome variability, DNA nucleotide, structural, and epigenetic variations

1. The Human genome sequence

The nuclear genome (chromosomes)

The genome sequence

• the primary template on which to outline functional features of our genetic code (genes, regulatory elements, secondary structure, tertiary structure, etc.)

Completed genomes

~1 Mb

~100 Mb

>100 Mb

~3,000 Mb

Main genome sequencing strategies

Clone-based shotgun sequencing (Human Genome Project)

Whole-genome shotgun sequencing (Celera Genomics, Inc.)

Hierarchical genome sequencing

BAC library construction

clone mapping

shotgun subclone library construction

sequencing

sequence reconstruction (sequence assembly)

Lander et al. Nature 2001

Clone mapping – “sequence ready” map

Shotgun subclone library construction

BAC primary clone

cloning vector

sequencing vector

subclone insert

Sequencing

Robotic automation

Lander et al. Nature 2001

Base calling

PHRED

base = A, Q = 40
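A Phred quality value Q is conventionally defined as Q = -10 · log10(P), where P is the estimated probability that the base call is wrong. The short sketch below converts in both directions; the function names are illustrative and are not part of the PHRED program itself.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality value Q into an estimated error probability P."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an estimated error probability P into a Phred quality value Q."""
    return -10 * math.log10(p)


# The slide's example: base = A with Q = 40,
# i.e. roughly 1 expected miscall in 10,000 such calls.
print(phred_to_error_prob(40))
```

Under this definition Q = 20 corresponds to a 1 in 100 chance of error and Q = 30 to 1 in 1,000.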

Vector clipping
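Vector clipping removes the part of a read that comes from the sequencing vector rather than from the subclone insert. The sketch below is a deliberately simplified, exact-match version: it looks for the longest tail of a known vector sequence at the start of the read and trims it. The vector sequence, the read, and the `min_overlap` threshold are hypothetical; production pipelines typically use error-tolerant alignment instead of exact matching.

```python
def clip_vector(read: str, vector_end: str, min_overlap: int = 6) -> str:
    """Trim a leading stretch of the read that exactly matches the tail of the vector.

    Tries the longest possible match first and gives up below min_overlap,
    so short chance matches are not clipped.
    """
    max_len = min(len(read), len(vector_end))
    for k in range(max_len, min_overlap - 1, -1):
        if read.startswith(vector_end[-k:]):
            return read[k:]
    return read


# Hypothetical vector tail and a read that begins with 7 vector bases.
vector_end = "GGATCCTCTAGA"
read = "CTCTAGA" + "ACGTTGCA"
print(clip_vector(read, vector_end))
```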

Sequence assembly

PHRAP

Repetitive DNA may confuse assembly
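Why repeats confuse assembly can be seen with a toy suffix-prefix overlap test. This is only an illustration of the overlap idea, not PHRAP's actual algorithm; the reads and the `min_len` threshold are made up.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def successors(read: str, reads: list, min_len: int = 3) -> list:
    """All other reads that could extend `read` to the right."""
    return [r for r in reads if r != read and overlap(read, r, min_len) > 0]


# Hypothetical reads: the element GATTACA occurs at two genomic locations,
# once followed by TTG and once followed by GGT.
r1 = "AACGATTACA"
r2 = "GATTACATTG"
r3 = "GATTACAGGT"
reads = [r1, r2, r3]

# r1 ends inside the repeat, so both r2 and r3 look like valid extensions.
print(successors(r1, reads))
```

Because both candidate extensions overlap `r1` equally well, an assembler needs extra information (longer reads, clone maps, or paired ends) to choose the correct path.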

Sequence completion (finishing)

CONSED, AUTOFINISH

gap

region of low sequence coverage and/or quality
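Finishing targets gaps and thinly covered regions. As a rough sketch, assuming each read has already been placed on the consensus as a half-open interval `(start, end)`, the regions that need attention can be listed as follows. The coordinates and the depth threshold are hypothetical, and this does not reproduce how CONSED or AUTOFINISH select finishing reads.

```python
def coverage_profile(length: int, reads: list) -> list:
    """Per-base read depth for a consensus of the given length."""
    depth = [0] * length
    for start, end in reads:
        for i in range(max(start, 0), min(end, length)):
            depth[i] += 1
    return depth


def low_coverage_regions(depth: list, min_depth: int = 1) -> list:
    """Half-open intervals where depth falls below min_depth."""
    regions = []
    start = None
    for i, d in enumerate(depth):
        if d < min_depth:
            if start is None:
                start = i
        elif start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(depth)))
    return regions


# Hypothetical read placements on a 15-base consensus.
reads = [(0, 5), (3, 8), (12, 15)]
depth = coverage_profile(15, reads)
print(low_coverage_regions(depth, min_depth=1))  # uncovered gap
print(low_coverage_regions(depth, min_depth=2))  # covered by fewer than 2 reads
```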

2. Human genome annotation

Genome annotation – Goals

protein coding genes

RNA genes

repetitive elements

GC content
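Of these annotation goals, GC content is the simplest to compute directly from the sequence. A minimal sketch follows; the window size is an arbitrary illustrative choice, and ambiguous characters such as N are ignored.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def gc_windows(seq: str, window: int) -> list:
    """GC content in consecutive, non-overlapping windows of the given size."""
    return [gc_content(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, window)]


print(gc_content("GGCCAATT"))
print(gc_windows("GGCCAATT", 4))
```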

The starting material

AGCGTGGTAGCGCGAGTTTGCGAGCTAGCTAGGCTCCGGATGCGACCAGCTTTGATAGATGAATATAGTGTGCGCGACTAGCTGTGTGTTGAATATATAGTGTGTCTCTCGATATGTAGTCTGGATCTAGTGTTGGTGTAGATGGAGATCGCGTAGCGTGGTAGCGCGAGTTTGCGAGCTAGCTAGGCTCCGGATGCGACCAGCTTTGATAGATGAATATAGTGTGCGCGACTAGCTGTGTGTTGAATATATAGTGTGTCTCTCGATATGTAGTCTGGATCTAGTGTTGGTGTAGATGGAGATCGCGTGCTTGAGTCGTTCGTTTTTTTATGCTGATGATATAAATATATAGTGTTGGTGGGGGGTACTCTACTCTCTCTAGAGAGAGCCTCTCAAAAAAAAAGCTCGGGGATCGGGTTCGAAGAAGTGAGATGTACGCGCTAGXTAGTATATCTCTTTCTCTGTCGTGCTGCTTGAGATCGTTCGTTTTTTTATGCTGATGATATAAATATATAGTGTTGGTGGGGGGTACTCTACTCTCTCTAGAGAGAGCCTCTCAAAAAAAAAGCTCGGGGATCGGGTTCGAAGAAGTGAGATGTACGCGCTAGXTAGTATATCTCTTTCTCTGTCGTGCT

Coding genes – ab initio predictions

ATGGCACCACCGATGTCTACGTGGTAGGGGACTATAAAAAAAAAAA

Open Reading Frame = ORF

Start codon

Stop codon

PolyA signal
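The ORF idea on this slide can be sketched as a scan for an ATG start codon followed, in the same reading frame, by the first stop codon. This minimal version looks only at the three forward-strand frames and ignores introns, so it is far simpler than a real ab initio gene finder; the `min_codons` cutoff is an illustrative assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list:
    """Return (start, end, orf) for forward-strand ORFs; end is exclusive and includes the stop."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk codon by codon until the first in-frame stop codon.
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        if (j + 3 - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, seq[i:j + 3]))
                        i = j  # resume scanning after this stop codon
                        break
            i += 3
    return orfs


# The example sequence from the slide.
slide_seq = "ATGGCACCACCGATGTCTACGTGGTAGGGGACTATAAAAAAAAAAA"
for start, end, orf in find_orfs(slide_seq):
    print(start, end, orf)  # 0 27 ATGGCACCACCGATGTCTACGTGGTAG
```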

Ab initio predictions

Gene structure

Ab initio predictions

…AGAATAGGGCGCGTACCTTCCAACGAAGACTGGG…

splice donor site splice acceptor site
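Canonical introns begin with GT (splice donor) and end with AG (splice acceptor). The sketch below lists every such dinucleotide in the fragment shown on the slide (without the elided flanks) and pairs them into candidate introns. Real gene finders score the surrounding sequence context as well, so most dinucleotide hits are not true splice sites; the `min_len` cutoff is an illustrative assumption.

```python
def dinucleotide_positions(seq: str, dinuc: str) -> list:
    """0-based start positions of every occurrence of a dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc]


def candidate_introns(seq: str, min_len: int = 4) -> list:
    """Half-open (start, end) spans that start with GT and end with AG."""
    donors = dinucleotide_positions(seq, "GT")
    acceptors = dinucleotide_positions(seq, "AG")
    return [(d, a + 2) for d in donors for a in acceptors if a + 2 - d >= min_len]


fragment = "AGAATAGGGCGCGTACCTTCCAACGAAGACTGGG"
print(dinucleotide_positions(fragment, "GT"))  # candidate donor sites
print(dinucleotide_positions(fragment, "AG"))  # candidate acceptor sites
for start, end in candidate_introns(fragment):
    print(start, end, fragment[start:end])
```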

Ab initio predictions

Genscan, Grail, Genie, GeneFinder, Glimmer, etc…

EST_genome, Sim4, Spidey, EXALIN

Homology based predictions

ATGGCACCACCGATGTCTACGTGGTAGGGGACTATAAAAAAAAAAA

ACGGAAGTCT

known coding sequence from another organism

GGACTATAAA

expressed sequence

genes predicted by homology

Genomescan, Twinscan, etc…
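Homology-based prediction places a known sequence (a coding sequence from another organism, or an expressed sequence) onto the genome. As a toy illustration only, the sketch below slides a query along the genomic sequence without gaps and reports the window with the most matching bases. The tools named above use gapped and splice-aware alignment, which this does not attempt.

```python
def best_ungapped_hit(query: str, target: str) -> tuple:
    """Return (position, matches) of the best ungapped placement of query on target."""
    best_pos, best_matches = -1, -1
    for pos in range(len(target) - len(query) + 1):
        window = target[pos:pos + len(query)]
        matches = sum(q == t for q, t in zip(query, window))
        if matches > best_matches:
            best_pos, best_matches = pos, matches
    return best_pos, best_matches


# Sequences taken from the slide.
genomic = "ATGGCACCACCGATGTCTACGTGGTAGGGGACTATAAAAAAAAAAA"
est = "GGACTATAAA"            # expressed sequence: matches the genome exactly
other_species = "ACGGAAGTCT"  # coding sequence from another organism: partial match

print(best_ungapped_hit(est, genomic))
print(best_ungapped_hit(other_species, genomic))
```

The expressed sequence comes from the same genome and aligns perfectly, while the cross-species sequence is expected to match only partially, which is why similarity scoring rather than exact search is needed.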

Consolidation – gene prediction systems

Otto

Ensembl

FgenesH

Genscan

Grail

Genewise

Sim4

dbEST

ncRNA genes

prediction based on structure (e.g. tRNAs)

for other novel ncRNAs, only homology-based predictions have been successful

Repeat annotations

Repeat annotations are based on sequence similarity to known repetitive elements in a repeat sequence library
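A heavily simplified sketch of library-based repeat annotation: every exact occurrence of a library element is marked in lowercase (soft-masked). The library entry and the test sequence are hypothetical, and exact matching stands in for the similarity search that real repeat annotation relies on.

```python
def soft_mask(seq: str, repeat_library: dict) -> str:
    """Lowercase every exact occurrence of a library element; other bases stay uppercase."""
    seq = seq.upper()
    masked = list(seq)
    for name, element in repeat_library.items():
        element = element.upper()
        start = seq.find(element)
        while start != -1:
            for i in range(start, start + len(element)):
                masked[i] = masked[i].lower()
            # Step by one base so overlapping copies are also found.
            start = seq.find(element, start + 1)
    return "".join(masked)


# Hypothetical one-entry repeat library.
library = {"example_repeat": "TTAGGG"}
print(soft_mask("ACGTTTAGGGTTAGGGCA", library))
```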

The landscape of the human genome

Gene annotations – # of coding genes

Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001

Gene annotations – gene length

Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001

Gene annotations – gene function

Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001

GC content and coding potential

Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001

ncRNAs

Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001

Segmental duplications

Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001

Repeat elements

Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001

Genes and repeats

Physical vs. genetic map (Mb/cM)

0.4 cM 1.3 cM 0.7 cM

0.4 Mb 0.7 Mb 0.3 Mb

3. Human genome variability

DNA sequence variations

• the reference human genome sequence is about 99.9% identical to the genome of any individual human

• sequence variations make our genetic makeup unique

SNP

• the most abundant human variations are single-nucleotide polymorphisms (SNPs) – 10 million SNPs are currently known
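Given two sequences that are already aligned base for base, single-nucleotide differences can be listed with a direct comparison, as sketched below. The second helper does the arithmetic implied by the 99.9% figure, using the ~3,000 Mb genome size quoted earlier in the lecture. Both function names and the toy sequences are illustrative; real SNP discovery also has to weigh base quality and alignment errors.

```python
def find_snps(ref: str, sample: str) -> list:
    """Return (position, ref_base, sample_base) for each single-base difference.

    Assumes the two sequences are already aligned and of equal length.
    """
    if len(ref) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, r, s)
            for i, (r, s) in enumerate(zip(ref.upper(), sample.upper()))
            if r != s and r in "ACGT" and s in "ACGT"]


def expected_differences(genome_size_bp: int, differing_per_thousand: int = 1) -> int:
    """Number of differing bases if the given number per 1,000 bases differ."""
    return genome_size_bp * differing_per_thousand // 1000


print(find_snps("ACGTACGT", "ACGTTCGT"))
# 99.9% identity over ~3,000 Mb leaves roughly 3 million differing positions.
print(expected_differences(3_000_000_000))
```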

DNA sequence variations

insertion-deletion (INDEL) polymorphisms

Structural variations

Speicher & Carter, NRG 2005

Structural variations

Feuk et al. Nature Reviews Genetics 7, 85–97 (February 2006) | doi:10.1038/nrg1767

Detection of structural variants

Feuk et al. Nature Reviews Genetics 7, 85–97 (February 2006) | doi:10.1038/nrg1767

Epigenetic changes: chromatin structure

Sproul, NRG 2005

Epigenetic changes: DNA methylation

Laird, NRC 2003
