Major insights from the HGP on

Uploaded by: hector

Posted on 05-Jan-2016

Page 1

Nature (2001) 15th Feb Vol 409 special issue; pgs 814 & 875-914.

1) Gene content

2) Proteome content

3) SNP identification

4) Distribution of GC content

5) CpG islands

6) Recombination rates

7) Repeat content

Page 2

1) Gene content

An estimated 30,000 to 40,000 protein-coding genes, based on known genes and gene predictions

Gene estimates (IHGSC vs Celera):
Definite genes: IHGSC 24,500; Celera 26,383
Possible genes: IHGSC 5,000; Celera 12,000

Genes encode either proteins or noncoding RNAs (e.g. rRNA, tRNA, snRNA, snoRNA)

Nature (2001) 15th Feb Vol 409 special issue; pgs 814-816 and 860-914.
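A quick arithmetic check relates the two groups' figures to the 30,000 to 40,000 estimate. It assumes, purely for illustration, that adding the "possible" genes to the "definite" ones gives an upper bound; the dictionary layout and function name are hypothetical.

```python
# Gene-count estimates quoted on the slide (IHGSC vs Celera).
estimates = {
    "IHGSC": {"definite": 24_500, "possible": 5_000},
    "Celera": {"definite": 26_383, "possible": 12_000},
}


def total_genes(est):
    """Assumed upper bound: definite genes plus every possible gene."""
    return est["definite"] + est["possible"]


for group, est in estimates.items():
    print(f"{group}: {est['definite']:,} definite, up to {total_genes(est):,} in total")
# IHGSC: 24,500 definite, up to 29,500 in total
# Celera: 26,383 definite, up to 38,383 in total
```

Under this assumption the combined totals run from roughly 29,500 to 38,400, close to the quoted range.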

Page 3

More genes: about twice as many as Drosophila or C. elegans

Uneven gene distribution: Gene-rich and gene-poor regions

More paralogs: some gene families have expanded their number of paralogs, e.g. the olfactory receptor gene family has about 1,000 genes

More alternative transcripts: more RNA splice variants are produced, expanding the number of primary protein products about five-fold (e.g. neurexin genes)

Nature (2001) 409: pp 892

Gene content…

Page 4

Gene-rich regions: e.g. the MHC region on chromosome 6 has 60 genes and a GC content of 54%

Gene-poor regions: 82 "gene deserts" identified. Do they contain large or as-yet unidentified genes?

What is the functional significance of these variations?

Uneven gene distribution

Genetics by Hartwell: pp 341-347

Gene content

Page 5

2) Proteome content

The human proteome is more complex than those of invertebrates

Nature (2001) 15th Feb Vol 409 special issue; pg 847

Protein domains (regions with an identifiable shape/function)

Domain arrangements in humans:

• Largest total number of domains in a protein is 130

• Largest number of domain types per protein is 9

• Mostly identical arrangement of domains

[Figure: domain arrangement of Protein X, built from domain types A, B and C]
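The distinction between the total number of domains in a protein (up to 130) and the number of distinct domain types (up to 9) can be sketched by treating a protein as an ordered list of domain labels. The order of domains in `protein_x` below is hypothetical; only the labels A, B and C come from the slide's figure.

```python
def domain_summary(arrangement):
    """Return (total number of domains, number of distinct domain types)."""
    return len(arrangement), len(set(arrangement))


# Hypothetical Protein X: ten domains drawn from three domain types.
protein_x = ["A", "A", "B", "B", "C", "B", "C", "C", "C", "C"]
print(domain_summary(protein_x))  # (10, 3)
```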

Page 6

Proteome more complex than invertebrates…

Nature (2001) 15th Feb Vol 409 special issue; pg 847

No large difference in domain number in humans, BUT the frequency of domain sharing is very high in human proteins (structural proteins and proteins involved in signal transduction and immune function)

However, there are only 3 cases in which a combination of 3 domain types is shared by human and yeast proteins.

e.g. carbamoyl-phosphate synthase (involved in the first 3 steps of de novo pyrimidine biosynthesis) has 7 domain types; this arrangement occurs once in human and yeast but twice in Drosophila

2) Proteome content…
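Comparing domain arrangements between species amounts to counting how often each ordered combination of domain types (its "architecture") occurs in each proteome. The sketch below uses toy proteomes with placeholder domain names (`D1` to `D8`, all hypothetical) chosen only to reproduce the once/once/twice pattern described for carbamoyl-phosphate synthase; it is not real domain data.

```python
from collections import Counter


def architecture_counts(proteome):
    """Count proteins per architecture (ordered tuple of domain types)."""
    return Counter(tuple(domains) for domains in proteome)


def shared_architectures(*proteomes):
    """Architectures present in every proteome given."""
    sets = [set(architecture_counts(p)) for p in proteomes]
    return set.intersection(*sets)


# Hypothetical 7-domain-type arrangement plus a few unrelated proteins.
arch = ("D1", "D2", "D3", "D4", "D5", "D6", "D7")
human = [arch, ("D1", "D2"), ("D8",)]
yeast = [arch, ("D8",)]
fly = [arch, arch, ("D1", "D2")]

for name, proteome in [("human", human), ("yeast", yeast), ("fly", fly)]:
    print(name, architecture_counts(proteome)[arch])
# human 1
# yeast 1
# fly 2
```

A `Counter` returns 0 for architectures a proteome lacks, so absent combinations need no special handling.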

Page 7

3) SNPs (single nucleotide polymorphisms)

More than 1.4 million SNPs identified

On average, one SNP every 1.9 kb

Densities vary across regions and chromosomes

e.g. HLA region has a high SNP density, reflecting maintenance of diverse haplotypes over many millions of years

Nature (2001) 15th Feb Vol 409 special issue; pgs 821-823 & 928
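The average spacing and the regional variation in SNP density can both be computed directly. This sketch assumes mean spacing is simply sequence length divided by SNP count, and uses hypothetical SNP positions (0-based) for the window counts; the function names are illustrative.

```python
def mean_spacing_kb(sequence_length_bp, n_snps):
    """Average distance between adjacent SNPs, in kilobases."""
    return sequence_length_bp / n_snps / 1_000


def snps_per_window(positions, sequence_length_bp, window_bp):
    """Count SNPs in consecutive, non-overlapping windows.

    positions are 0-based coordinates, each smaller than sequence_length_bp.
    """
    n_windows = -(-sequence_length_bp // window_bp)  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        counts[pos // window_bp] += 1
    return counts


# Sequence length implied by the slide's figures: 1.4 million SNPs x 1.9 kb.
implied_length_bp = 1_400_000 * 1_900
print(implied_length_bp)                              # 2660000000
print(mean_spacing_kb(implied_length_bp, 1_400_000))  # 1.9

# Hypothetical SNP positions in a 5 kb stretch, counted per 1 kb window.
print(snps_per_window([120, 950, 1_010, 1_500, 1_900, 4_200], 5_000, 1_000))
# [2, 3, 0, 0, 1]
```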

Page 8

How does one distinguish sequence errors from polymorphisms?

Sequence errors: each piece of the genome is sequenced at least 10 times to reduce the error rate (to 0.01%)

Polymorphisms: sequence variation between individuals is 0.1%

To be defined as a polymorphism, the altered sequence must be present in a significant proportion of the population

The rate of polymorphism in the diploid human genome is about 1 in 500 bp

Nature (2001) 15th Feb Vol 409 special issue; pgs 821-823 & 928
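The two rates above can be converted into "one event every N bp": residual errors (0.01%) are about ten times rarer than true differences between individuals (0.1%), and a variant that recurs across many individuals is unlikely to be a random error. The frequency cut-off in `is_polymorphism` (1%) is an assumed convention, not a figure from the slide, and the function names are hypothetical.

```python
def one_in_n_bp(percent):
    """Convert a per-base rate in percent into 'one event every N bp'."""
    return round(100 / percent)


def is_polymorphism(allele_count, chromosomes_sampled, min_freq=0.01):
    """Call a variant a polymorphism if its allele frequency reaches min_freq.

    min_freq = 0.01 (1%) is an assumed, conventional cut-off; the slide only
    asks for presence in a significant proportion of the population.
    """
    return allele_count / chromosomes_sampled >= min_freq


ERROR_RATE_PCT = 0.01      # residual sequencing error after >=10x coverage
VARIATION_RATE_PCT = 0.1   # sequence variation between two individuals

print(one_in_n_bp(ERROR_RATE_PCT))      # 10000
print(one_in_n_bp(VARIATION_RATE_PCT))  # 1000
print(is_polymorphism(3, 200))  # True  (allele on 1.5% of chromosomes)
print(is_polymorphism(1, 200))  # False (0.5%: below the assumed cut-off)
```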

Page 9

3) SNPs……

Sites that result from point mutations in individual base pairs

Usually biallelic

~60,000 SNPs lie within exons and untranslated regions (85% of exons lie within 5 kb of a SNP)

May or may not affect the ORF; most SNPs may be regulatory

Nature (2001) 15th Feb Vol 409 special issue; pg 821 & 928

http://www.genetics.gsk.com/kids/medicine01.htm
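Whether a SNP inside an ORF changes the protein can be checked by translating the affected codon before and after the change. The sketch below assumes an in-frame coding sequence and uses the standard genetic code (NCBI translation table 1, encoded as a 64-letter string in TCAG order); the test sequence and function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate_codon(codon):
    """Translate one uppercase DNA codon into a one-letter amino acid."""
    i, j, k = (BASES.index(b) for b in codon)
    return AMINO_ACIDS[16 * i + 4 * j + k]


def classify_snp(cds, pos, alt):
    """Classify a SNP at 0-based position pos of an in-frame coding sequence."""
    start = pos - pos % 3
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:pos % 3] + alt + ref_codon[pos % 3 + 1:]
    ref_aa, alt_aa = translate_codon(ref_codon), translate_codon(alt_codon)
    if ref_aa == alt_aa:
        return "synonymous"      # ORF unaffected at the protein level
    if alt_aa == "*":
        return "nonsense"        # premature stop codon
    if ref_aa == "*":
        return "stop-loss"
    return "missense"            # one amino acid replaced by another


cds = "ATGCTGTGGTAA"  # hypothetical ORF: Met-Leu-Trp-stop
print(classify_snp(cds, 5, "A"))  # CTG -> CTA (Leu -> Leu): synonymous
print(classify_snp(cds, 4, "A"))  # CTG -> CAG (Leu -> Gln): missense
print(classify_snp(cds, 7, "A"))  # TGG -> TAG (Trp -> stop): nonsense
```

SNPs outside the ORF (for example in regulatory regions) fall outside this check entirely.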

Page 10

3) SNPs……and disease

Page 11

3) SNPs……and risk of disease

Page 12

3) SNPs……and drug prescription

Page 13

4) Distribution of GC content

Genome-wide average of 41%; huge regional variations exist

E.g. the distal 48 Mb of chromosome 1p has 47%, but chromosome 13 has only 36%

Confirms cytogenetic staining with G-bands (Giemsa):

dark G-bands: low GC content (37%)

light G-bands: high GC content (45%)

Nature (2001) 15th Feb Vol 409 special issue; pg 876-877
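Regional GC variation is measured by computing GC percentage in windows along a sequence. As a crude illustration only, the sketch labels each window "light" or "dark" relative to the 41% genome average; using that average as a cut-off is an assumption (real G-bands are defined by staining), and the sequence is hypothetical.

```python
GENOME_AVERAGE_GC = 41.0  # percent, from the slide


def gc_percent(seq):
    """Percentage of G + C bases in a DNA sequence."""
    seq = seq.upper()
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_profile(seq, window):
    """GC percentage in consecutive windows (the last one may be shorter)."""
    return [gc_percent(seq[i:i + window]) for i in range(0, len(seq), window)]


def band_like(gc, threshold=GENOME_AVERAGE_GC):
    """Crude label: 'light' (GC-rich) or 'dark' (GC-poor) relative to threshold."""
    return "light" if gc >= threshold else "dark"


seq = "GCGCATAT" + "ATATATGC"  # hypothetical 16 bp sequence
profile = gc_profile(seq, 8)
print(profile)                            # [50.0, 25.0]
print([band_like(gc) for gc in profile])  # ['light', 'dark']
```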

Page 14

5) CpG islands

Significance of CpG islands:

1) Non-methylated CpG islands are associated with the 5’ ends of genes

2) Aberrant methylation of CpG islands is one mechanism of inactivating tumor suppressor genes (TSGs) in neoplasia

http://www.sanger.ac.uk/HGP/cgi.shtml

CpG → methyl-CpG (methylated at C) → TpG (after deamination)

CpG islands show no methylation
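Deamination of 5-methylcytosine yields thymine, so methylated CpGs mutate to TpG (CpA on the opposite strand) and are gradually lost outside unmethylated islands. The sketch measures this depletion with the observed/expected CpG ratio, (CpG count x length) / (C count x G count), and models deamination crudely by assuming every CpG in the hypothetical sequence is methylated and converted at once.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (CpG count x length) / (C count x G count)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def deaminate_methyl_cpg(seq):
    """Turn every CpG into TpG, modelling deamination of 5-methylcytosine.

    Simplifying assumption: every CpG in seq is methylated (as expected
    outside CpG islands); real mutation is random and gradual.
    """
    return seq.upper().replace("CG", "TG")


seq = "ACGTCGATCG"  # hypothetical CpG-rich sequence
print(round(cpg_obs_exp(seq), 2))    # 3.33
mutated = deaminate_methyl_cpg(seq)
print(mutated, cpg_obs_exp(mutated))  # ATGTTGATTG 0.0
```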

Page 15

CpG islands

CpG dinucleotides are greatly under-represented in the human genome

• ~28,890 CpG islands in number

• Variable density: e.g. chromosome Y has 2.9/Mb, but chromosomes 16, 17 and 22 have 19-22/Mb; the average is 10.5/Mb

Nature (2001) 15th Feb Vol 409 special issue; pg 877-888
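A CpG island test and the per-Mb density can be sketched as follows. The thresholds (length at least 200 bp, GC at least 50%, observed/expected CpG at least 0.6) are the commonly used Gardiner-Garden and Frommer criteria, assumed here and not necessarily the exact definition behind the slide's counts; the test sequences are hypothetical.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (CpG count x length) / (C count x G count)."""
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def is_cpg_island(seq, min_len=200, min_gc=50.0, min_obs_exp=0.6):
    """Apply assumed length, GC and CpG obs/exp thresholds to one sequence."""
    seq = seq.upper()
    if len(seq) < min_len:
        return False
    gc = 100 * (seq.count("G") + seq.count("C")) / len(seq)
    return gc >= min_gc and cpg_obs_exp(seq) >= min_obs_exp


def islands_per_mb(n_islands, length_bp):
    """CpG island density per megabase."""
    return n_islands / (length_bp / 1_000_000)


print(is_cpg_island("CG" * 100))  # True
print(is_cpg_island("AT" * 100))  # False (GC too low)
print(is_cpg_island("CG" * 50))   # False (shorter than 200 bp)
print(islands_per_mb(21, 2_000_000))  # 10.5
# Sequence implied by ~28,890 islands at an average of 10.5/Mb, in Mb:
print(round(28_890 / 10.5))  # 2751
```

Genome-scale tools scan sliding windows and merge qualifying ones; this predicate only shows the per-region test.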

Page 16

6) Recombination rates

Two main observations:

• Recombination rate increases with decreasing arm length

• Recombination rate is suppressed near the centromeres and increases towards the distal 20-35 Mb of the arms
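Recombination rate is usually expressed in cM/Mb: genetic distance between neighbouring markers divided by their physical distance. The markers below are hypothetical, chosen to mimic suppression near the centromere (at 0 Mb) and higher rates distally; the function name is illustrative.

```python
def recombination_rates(markers):
    """Rate in cM/Mb between consecutive markers.

    markers: (physical position in Mb, genetic position in cM) pairs,
    sorted by physical position.
    """
    rates = []
    for (mb1, cm1), (mb2, cm2) in zip(markers, markers[1:]):
        rates.append((cm2 - cm1) / (mb2 - mb1))
    return rates


# Hypothetical markers along one chromosome arm, centromere at 0 Mb.
markers = [(0.0, 0.0), (10.0, 2.0), (20.0, 6.0), (30.0, 16.0), (40.0, 36.0)]
print(recombination_rates(markers))  # [0.2, 0.4, 1.0, 2.0]
```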

Page 17

7) Repeat content

a) Age distribution

b) Comparison with other genomes

c) Variation in distribution of repeats

d) Distribution by GC content

e) Y chromosome

Nature (2001) 409: pp 881-891
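Comparing repeat content across regions, genomes or GC classes starts from the fraction of a sequence covered by annotated repeats. This sketch assumes repeat annotations are available as half-open [start, end) intervals (the coordinates are hypothetical) and merges overlaps so no base is counted twice.

```python
def repeat_fraction(intervals, sequence_length):
    """Fraction of a sequence covered by repeats, merging overlapping intervals."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the current merged block.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / sequence_length


# Hypothetical repeat annotations in a 1 kb region.
print(repeat_fraction([(0, 100), (50, 150), (300, 400)], 1_000))  # 0.25
```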