Genomics

Class: Molecular Biology, GIBMS 2004
Source: “Molecular Biology” by Robert F. Weaver, 2nd Edition, McGraw Hill Publishing, 2002

Subjects To Be Covered

Sequencing of Genomes
- The human genome project
- Vectors for large-scale genome projects
- The clone-by-clone strategy
- Shotgun sequencing
- Progress in sequencing the human genome

Genomics and Its Applications
- Techniques in functional genomics
- Positional cloning
- Applications of functional genomics
- Other applications
- Bioinformatics and proteomics

Sequencing of Genomes

1977; Fred Sanger; φX174 bacteriophage; 5,375 nt
- Concept of the ORF as a coding region
- Amino acid sequences of the phage proteins
- Overlapping genes [Figure 24-1], found only in viruses

1995; Craig Venter & Hamilton Smith
- Haemophilus influenzae (1,830,137 nt; first free-living organism sequenced)
- Mycoplasma genitalium (smallest free-living genome; 580,000 nt; 470 genes)

1996; Saccharomyces cerevisiae (first eukaryote); 12,068,000 nt
1997; Escherichia coli; 4,639,221 nt; genetically more important
Many firsts followed:
1999; Human chromosome 22; 53,000,000 nt
2000; Drosophila melanogaster; 180,000,000 nt
2001; Human; working draft; 3,200,000,000 nt

Sequencing of Genomes
Human Genome Project

International project

Controversial: proposed in 1990
- Size and cost (500,000 pages just to print; who has time to read them?)
- Social implications, even more so

Approaches
- Systematic and conservative; Francis Collins; expected to be done by 2005
- 1998; Craig Venter; Celera (VitaGenomics Taiwan); by 2000, using shotgun sequencing, which needs powerful computers

Rough draft of the human genome
- Announced June 26, 2000; 3,200,000,000 nt; 85%-99% complete

Sequencing of Genomes
Vectors for Large-Scale Genome Projects

Vectors needed: yeast & bacterial artificial chromosomes

Cloning capacity: cosmid ~50 kb

Yeast artificial chromosomes (YACs) [Fig. 24-2]
- Large capacity & self-replicating
- 1,000,000+ nt capacity
- Drawbacks: inefficient; difficult isolation; unstable (linear); cryptic

Bacterial artificial chromosomes (BACs) [Fig. 24-3]
- Based on F and F’ plasmids that conjugate between bacterial cells
- After inserting into the host chromosome, an F plasmid can mobilize the whole chromosome between cells
- 300,000 nt capacity
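
As a rough sense of scale for the capacities above, the sketch below estimates the fewest clones each vector type would need to tile the 3,200,000,000-nt human genome end to end. It assumes every insert is at full capacity and that clones never overlap, which real libraries do not achieve, so practical libraries are several times larger. The name `min_clones` is illustrative only.

```python
import math

GENOME_NT = 3_200_000_000  # human working-draft size quoted in these slides

# Approximate insert capacities (nt) quoted on this slide.
CAPACITY_NT = {"cosmid": 50_000, "BAC": 300_000, "YAC": 1_000_000}

def min_clones(genome_nt, insert_nt):
    """Fewest clones that could tile the genome end to end, with no overlap."""
    return math.ceil(genome_nt / insert_nt)

for vector, capacity in CAPACITY_NT.items():
    print(f"{vector:>6}: at least {min_clones(GENOME_NT, capacity):,} clones")
```

Under these assumptions cosmids need at least 64,000 clones, BACs 10,667 and YACs 3,200, which is why large-insert vectors matter for a genome of this size.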

Constructed in 1992
MCS: multiple cloning site, for cloning
CmR: chloramphenicol resistance, for selection

Sequencing of Genomes
The Clone-by-Clone Strategy

Mapping (genetic & physical) of the whole genome
- Use overlapping clones: the clone-by-clone sequencing strategy
- Look for “flag posts”

Tools for mapping of genes:

Restriction Fragment Length Polymorphisms (RFLPs) [Fig. 24-4]
- Used to determine the position/location of a gene or a stretch of DNA
- How to look for RFLPs?

Variable Number of Tandem Repeats (VNTRs)
- Tandemly repeated sequences derived from minisatellites

Sequence Tagged Sites (STSs) [Fig. 24-5]
- Short (60-1000 bp) sequences detectable by PCR

Microsatellites: repeats of very short sequences
- Highly polymorphic, thus genetic mapping is possible
- Useful in physical mapping or in locating a specific sequence in the genome

Two individuals are polymorphic with respect to a HindIII site (shown in red)

Primers for PCR were designed from sequences of small areas of DNA that were already known
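
One way to picture an STS test is as an in-silico PCR: a clone carries the STS if both primers find their sites in the correct orientation, and the product then has a predictable length. The sketch below assumes exact primer matches and uses an invented template and invented primers; `revcomp` and `sts_product_size` are hypothetical helpers, not part of any library.

```python
def revcomp(seq):
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def sts_product_size(template, fwd, rev):
    """Length of the PCR product, or None if either primer site is missing.

    The forward primer has the same sequence as the given strand; the reverse
    primer anneals to that strand, so its reverse complement appears in it.
    """
    start = template.find(fwd)
    if start == -1:
        return None
    rev_site = revcomp(rev)
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return end + len(rev_site) - start

# Invented sequences, for illustration only.
FWD = "GACCGTAAGC"
REV = "CTAGGCTAAC"
SPACER = "ATCGATCGGATTACAGGCTAAC" * 2
clone_with_sts = "TTGA" + FWD + SPACER + revcomp(REV) + "CGAT"
clone_without_sts = "ACGT" * 20

print(sts_product_size(clone_with_sts, FWD, REV))     # product length in nt
print(sts_product_size(clone_without_sts, FWD, REV))  # None: no primer sites
```

A real assay would also tolerate mismatches and check primer melting temperatures; this sketch only captures the presence/absence logic that makes an STS a usable landmark.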

Sequencing of Genomes
The Clone-by-Clone Strategy

Tools for gene mapping: landmarks that relate to gene positions

Construction of a physical map with sequencing data
- Mapping with STSs [Fig. 24-6]: very laborious due to the sizes of the BACs
- Radiation hybrid mapping
  - Ionizing radiation is used to create chromosome fragments
  - Form hybrid cells with hamster cells
  - Examine individually cloned cells

For mapping human chromosomes
- A set of landmarks or signposts is needed to relate the positions of genes
- 1998: STS-based maps were constructed that included 30,000+ genes

After a number of positive BACs have been identified, one can begin mapping by screening these BACs for STSs in a sequential manner
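
The screening step can be sketched as a small ordering problem: given which STSs each BAC tested positive for, sort the BACs along the map and check that neighbours overlap. The example assumes the STS order is already known and that each BAC covers a contiguous run of STSs; the BAC names, STS names and hit table are all invented.

```python
# Hypothetical screening results: which STSs each BAC tested positive for.
STS_ORDER = ["STS1", "STS2", "STS3", "STS4", "STS5"]  # assumed known map order
HITS = {
    "BAC_C": {"STS3", "STS4"},
    "BAC_A": {"STS1", "STS2"},
    "BAC_D": {"STS4", "STS5"},
    "BAC_B": {"STS2", "STS3"},
}

def order_bacs(hits, sts_order):
    """Sort BACs left to right by the map positions of the STSs they contain."""
    rank = {sts: i for i, sts in enumerate(sts_order)}

    def span(bac):
        positions = [rank[s] for s in hits[bac]]
        return (min(positions), max(positions))

    return sorted(hits, key=span)

def is_contig(ordered, hits):
    """True if every neighbouring pair of BACs shares at least one STS."""
    return all(hits[a] & hits[b] for a, b in zip(ordered, ordered[1:]))

tiling = order_bacs(HITS, STS_ORDER)
print(tiling, is_contig(tiling, HITS))
```

When the shared-STS check fails for some pair, that pair marks a gap where more BACs would have to be screened.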

Sequencing of Genomes
Shotgun Sequencing

The shotgun sequencing strategy [Fig. 24-7]
- Go directly to sequencing, without mapping
- 1996; Craig Venter, Hamilton Smith, Leroy Hood
- 500 nt/end x 2 ends x 300,000 BAC clones = 300 million nt = ~10% of the total human genome
- The 500-nt end sequences are dispersed about every 5 kb through the genome
- Each acts as a sequence-tagged connector (STC) for its BAC clone
- Each of the 300,000 clones connects via STCs to about 30 other clones

Fingerprinting of each clone

“BAC walking”
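
The STC figures on this slide follow from simple arithmetic, reproduced in the sketch below. The genome size, clone count and read length come from the slides; the 150,000-nt mean insert size is an assumption added here to show where "about 30 other clones" could come from.

```python
GENOME_NT = 3_200_000_000   # genome size quoted in these slides
BAC_CLONES = 300_000
READ_NT = 500               # sequenced from each end of every BAC insert
ENDS_PER_CLONE = 2
MEAN_INSERT_NT = 150_000    # assumption: typical BAC insert, not from the slides

stc_count = BAC_CLONES * ENDS_PER_CLONE          # number of end sequences
stc_total_nt = stc_count * READ_NT               # total sequence in STCs
fraction = stc_total_nt / GENOME_NT              # share of the genome covered
spacing_nt = GENOME_NT / stc_count               # average distance between STCs
stcs_per_insert = MEAN_INSERT_NT / spacing_nt    # STCs falling inside one insert

print(f"{stc_total_nt:,} nt in STCs ({fraction:.1%} of the genome)")
print(f"one STC about every {spacing_nt:,.0f} nt")
print(f"about {stcs_per_insert:.0f} STCs per {MEAN_INSERT_NT:,}-nt insert")
```

The totals come out at 300 million nt (a little under 10% of the genome), one STC roughly every 5 kb, and a little under 30 STCs per insert, in line with the rounded numbers on the slide.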

<1> BAC library
<2> Plasmid library
<3> Fingerprinting
<4> BAC walking
Powerful computer

Sequencing of Genomes
Progress in Sequencing the Human Genome

Progress:
- Working draft: 90% complete with 1% error
- Final draft: as complete as possible with less than 0.01% error (1 in 10,000)

“Functionally complete”
- 33,464,000 of the 34,491,000 nt (97.02%) were sequenced
- Error rate of 1 per 50,000 nt
- Primarily 22q

1999; Final draft of human chromosome 22
2000; Final draft of human chromosome 21
2001; Working draft of whole human chromosomes

What have we learned from chromosome 22?
<1> It still contains 11 gaps of “unclonable” and “unsequenceable” DNA
<2> 800 genes (679 known, related & pseudogenes, 100 predicted, 225 unknown)
<3> Exons account for 3% of the total length
<4> Recombination rates vary along the chromosome [Fig. 24-8]
<5> Local and long-range duplications
<6> Large regions of 22q are conserved in mouse [Fig. 24-9]

Sequencing of Genomes
Progress in Sequencing the Human Genome

1999; Final draft of human chromosome 22

2000; Final draft of human chromosome 21
- Involved in Down’s Syndrome (trisomy 21)
- Primarily from 21q, with minor contributions from 21p
- A total of 33,500,000+ nt were sequenced (99.7% of total length)
- Three gaps remain for which no sequence is available
- Relatively low gene density; 225 identified genes (127 known, 98 predicted)
- Total number of genes estimated in human:
  - 40,000 genes (based on chromosomes 21 & 22)
  - 30,000 genes (working draft of whole chromosomes)
- Large regions of conservation between human and mouse chromosomes
- Identity of the gene(s) responsible for Down’s Syndrome still unknown

2001; Working draft of whole human chromosomes

Sequencing of Genomes
Progress in Sequencing the Human Genome

1999; Final draft of human chromosome 22
2000; Final draft of human chromosome 21
2001; Working draft of whole human chromosomes
- 2.9 billion (Venter et al.) to 3.2 billion (Collins et al.) nt
- Gaps and inaccuracies, but nevertheless extremely informative
- 25,000–40,000 genes (another 12,000 possible genes)
  - Only 2x more than fruit flies
  - An organism's complexity is not proportional to its gene number
- Expression of the human genome is more complex
  - Alternative splicing? 40% of genes
  - Post-translational modifications?
- Source of human genes: importation (from bacteria?)
- About 50% of the human genome came from transposon action
  - Almost all known transposons in humans are now inactive

Genomics and Its Applications

Structural genomics: sequencing data

What can we use the genomic DNA sequences for?

Applications:
- Study the expression of large numbers of genes: “Functional Genomics”
- Find/identify the functions of genes, especially in diseases: “Positional Cloning”
- Others

Genomics and Its Applications
Techniques in Functional Genomics

Blotting analysis was used in the past; it has been miniaturized in order to study the expression patterns of many genes at once

DNA microarray
- 0.25-1 nL (billionths of a liter) per spot [Fig. 24-10]
- 5,808 DNA spots per microscope slide

DNA microchips
- Oligonucleotides are synthesized directly on glass chips [Fig. 24-11]

Oligonucleotide array
- How long must an oligonucleotide be to uniquely identify a human gene in a mixture of all other human genes?
- Hybridization analysis on a DNA chip [Fig. 24-12]
- 300,000 oligonucleotides in a 0.5” x 0.5” glass area
- The expression of every yeast gene has been determined at the same time
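
The question about oligonucleotide length has a simple lower bound if the genome is treated as a random sequence, which is an idealizing assumption: an n-mer can be unique only when the number of possible n-mers, 4^n, exceeds the number of positions in the genome. The sketch below finds that smallest n for the genome size quoted in these slides; `min_unique_length` is an illustrative name.

```python
GENOME_NT = 3_200_000_000  # human genome size quoted earlier in these slides

def min_unique_length(genome_nt):
    """Smallest n for which 4**n exceeds the number of positions in the genome."""
    n = 1
    while 4 ** n <= genome_nt:
        n += 1
    return n

n = min_unique_length(GENOME_NT)
print(n, 4 ** n)  # 4**15 is about 1.07 billion, 4**16 about 4.29 billion
```

The bound is 16 nt. Because real genomes contain repeats and gene families, this is a minimum rather than a recommended probe length.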

Serial analysis of gene expression (SAGE) [Fig. 24-13]
- Short cDNAs (tags) are synthesized from all mRNAs in a cell
- Tags are linked together in clones and sequenced to determine which genes are expressed and at what level
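
The counting step of SAGE can be sketched as follows: a sequenced concatemer is split back into tags, and the number of times each tag appears is taken as a measure of that transcript's abundance. This is a simplified model that assumes single fixed-length tags separated by a CATG anchoring site; real protocols join tags in pairs and need more careful parsing. The tag sequences are invented.

```python
from collections import Counter

ANCHOR = "CATG"  # NlaIII recognition site, used here as the tag separator

def count_tags(concatemer, tag_len=10):
    """Split a concatemer on the anchor site and count tags of the expected length."""
    pieces = concatemer.split(ANCHOR)
    return Counter(p for p in pieces if len(p) == tag_len)

# Invented tags standing in for three different mRNAs.
TAG_A, TAG_B, TAG_C = "AAGGTTCCAA", "GGCCTTAAGG", "TTTTCCCCGG"
concatemer = ANCHOR + ANCHOR.join([TAG_A, TAG_B, TAG_A, TAG_C, TAG_A, TAG_B]) + ANCHOR

for tag, n in count_tags(concatemer).most_common():
    print(tag, n)
```

Here the tag for the first transcript is seen three times, the second twice and the third once, so the first would be scored as the most highly expressed.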

1” x 3” glass microscope slide with 5,808 tiny spots of DNA

Circle: reactive groups
Red: photosensitive blocking agent
Blue: masking agent

Serum-starved: green (#3)
Serum-stimulated: red (#2, #4)

Genomics and Its Applications
Positional Cloning

Before the genomic era

Positional cloning is used to locate a gene responsible for a disease on the chromosome without knowing the function of its protein product

Strategies of positional cloning
- Obtain markers closely linked to the disease
- Scan the regions between markers for possible genes
  - Search for exons with the “exon trap” technique
  - Locate “CpG islands”, which tend to be associated with genes
  - Other tools

The Human Genome Project made the scanning much easier

Genomics and Its Applications
Positional Cloning

“Exon trap” or “exon amplification” technique [Fig. 24-14]
- Look for ORFs?
- More efficient with the “exon trap” technique
- The vector contains a chimeric gene under SV40 promoter control
- Look for exons in the amplified products after cloning of cDNA
- All exons or ORFs contain splice sites and thus survive propagation in cells

Locate “CpG islands”
- Active human genes tend to be associated with unmethylated CpG
- Inactive human genes mostly have methylated CpG
- HpaII cuts only unmethylated CCGG
  - HpaII will therefore cut only near active genes
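
The HpaII logic can be sketched as a site search with a methylation filter: every CCGG is a candidate site, but a site is skipped when the cytosine of its CpG is methylated. The sequence and the methylation positions below are invented, and `hpaii_cut_sites` is a hypothetical helper.

```python
HPAII_SITE = "CCGG"  # HpaII recognition sequence

def hpaii_cut_sites(seq, methylated_c=frozenset()):
    """0-based start positions of the CCGG sites that HpaII can still cut.

    methylated_c holds 0-based positions of methylated cytosines. A site is
    treated as blocked when its second C (the C of the CpG) is methylated.
    """
    sites = []
    pos = seq.find(HPAII_SITE)
    while pos != -1:
        if (pos + 1) not in methylated_c:
            sites.append(pos)
        pos = seq.find(HPAII_SITE, pos + 1)
    return sites

seq = "AACCGGTTACCGGTA"                    # invented; CCGG at positions 2 and 9
print(hpaii_cut_sites(seq))                # both sites unmethylated
print(hpaii_cut_sites(seq, {10}))          # CpG cytosine of the second site methylated
```

A region where many sites remain cuttable is a candidate CpG island, which is the property the positional cloning scan exploits.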

Genomics and Its Applications
Applications of Functional Genomics

Huntington’s Disease (“HD”)
- Progressive nerve disorder: emotional disturbances & adventitious movements
- Single dominant gene with a linked RFLP identified [Fig. 24-15]
- Two (2) polymorphic sites were present in affected families
- Four (4) haplotypes, or haploid genotypes, were possible [Fig. 24-16]
- Which haplotype is associated with Huntington’s Disease? [Fig. 24-17]
  - Answer: Haplotype “C” (those with both HindIII sites) is strongly associated with the disease
  - However, this haplotype association varies between families
- An RFLP can be used as a genetic marker, just like a gene
- The “HD” gene was mapped to a region on chromosome 4 with repeats of CAG
  - Normal individuals: 11-34 “CAG” repeats (98% have fewer than 24 repeats)
  - Affected patients: >42 “CAG” repeats

Cystic fibrosis (“CF”)

4 haplotypes (A, B, C, D) result from the combinations of the presence or absence of the 2 HindIII sites

Haplotype   Site 1    Site 2    Fragments (kb)
A           Absent    Present   17.5; 3.7; 1.2
B           Absent    Absent    17.5; 4.9
C           Present   Present   15.0; 3.7; 1.2
D           Present   Absent    15.0; 4.9
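
The table follows a simple rule that can be written out as code: site 1 decides whether the large fragment is 17.5 or 15.0, and site 2 decides whether the 4.9 fragment stays whole or is cut into 3.7 and 1.2. The sketch below encodes only what the table lists; any extra piece released by cutting at site 1 is not in the table and is not modeled. Function names are illustrative.

```python
# Site presence for each haplotype: (site 1, site 2), as in the table above.
HAPLOTYPES = {
    "A": (False, True),
    "B": (False, False),
    "C": (True, True),
    "D": (True, False),
}

def fragments_kb(site1_present, site2_present):
    """Fragment sizes expected for one haplotype, in the table's units."""
    first = [15.0] if site1_present else [17.5]
    second = [3.7, 1.2] if site2_present else [4.9]
    return first + second

def haplotype_from_fragments(observed):
    """Name of the haplotype whose fragment set matches, or None."""
    for name, sites in HAPLOTYPES.items():
        if sorted(fragments_kb(*sites)) == sorted(observed):
            return name
    return None

for name, sites in HAPLOTYPES.items():
    print(name, fragments_kb(*sites))
print(haplotype_from_fragments([1.2, 3.7, 15.0]))
```

Reading a blot is the reverse lookup: the observed band pattern identifies the haplotype, which is how an RFLP serves as a genetic marker.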

<1> Most individuals with the “C” haplotype already have the disease
<2> No disease sufferers lack the “C” haplotype

Genomics and Its Applications
Applications of Functional Genomics

Huntington’s Disease (“HD”)
- The “HD” gene was located to a region near the end of human chromosome 4
- Identification of the “HD” gene: number of “CAG” repeats in a putative gene
  - Normal: ranged from 11 to 34; 98% had <24
  - Diseased: all have >42, and up to 100
- Prospective studies using an animal (mouse) model
- Applications:
  - Genetic screening of potential patients
  - Gene therapy?
  - Normal function of the “HD” gene product (“huntingtin”)
  - How the expansion of “CAG” repeats causes disease: extra glutamines in the “huntingtin” protein?
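
Repeat counting of the kind used for screening can be sketched in a few lines: find the longest uninterrupted run of CAG and compare it with the ranges quoted on the slide. The test sequences are invented, the function names are illustrative, and this is a teaching sketch rather than a diagnostic procedure; repeat counts between the two quoted ranges are simply reported as such.

```python
import re

def longest_cag_run(seq):
    """Number of repeats in the longest uninterrupted run of CAG in seq."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)

def classify(repeats):
    """Apply the ranges quoted on the slide (11-34 normal, >42 affected)."""
    if repeats > 42:
        return "affected range"
    if 11 <= repeats <= 34:
        return "normal range"
    return "outside the ranges given on the slide"

# Invented sequences: a start codon, a CAG tract, and a short tail.
for n in (20, 38, 45):
    seq = "ATG" + "CAG" * n + "CCGCCA"
    count = longest_cag_run(seq)
    print(count, classify(count))
```

Since CAG encodes glutamine, a longer run in the coding sequence means a longer stretch of glutamines in the protein, which is the hypothesis raised on the slide.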

Cystic fibrosis (“CF”)

Genomics and Its Applications
Applications of Functional Genomics

Huntington’s Disease (“HD”)

Cystic fibrosis (“CF”)
- Most common “lethal” genetic disease affecting Caucasians
- Autosomal recessive mutation; carrier rate is 1/20
- Affects the secretory epithelia of 1 in 1,600 live births
- Accumulation of mucus leads to infections
- Linkage to known markers was established on 7q31
- Positional cloning & “chromosome walking” followed [Fig. 24-18]
- Unclonable regions: “chromosome jumping” (over unclonable regions) [Fig. 24-19]
- The “CF” gene spans 250 kb of DNA and includes at least 24 exons
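
The carrier rate and the birth incidence quoted above are consistent with each other under simple Mendelian assumptions, as the short calculation below shows. It assumes random mating and that an affected child arises only when both parents are carriers.

```python
from fractions import Fraction

carrier_rate = Fraction(1, 20)         # quoted on the slide
p_both_carriers = carrier_rate ** 2    # both parents carry one mutant allele
p_affected_child = Fraction(1, 4)      # recessive trait, two carrier parents
incidence = p_both_carriers * p_affected_child

print(incidence)  # fraction of live births expected to be affected
```

The result is 1/1600, matching the incidence given on the slide.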

Genomics and Its Applications
Applications of Functional Genomics

Huntington’s Disease (“HD”)
Cystic fibrosis (“CF”)

Identification & authentication of the “CF” gene
<1> Expressed in all tissues affected by CF
<2> Gene product contains a membrane-spanning domain
    - Regulates the passage of ions across the membrane
    - CFTR: cystic fibrosis transmembrane conductance regulator
<3> Most CF patients have a 3-bp deletion in the “CFTR” gene
    - A phenylalanine is missing

Applications: transgenic animal model
Applications: gene therapy; CFTR protein as a drug
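
How a 3-bp deletion can remove exactly one amino acid, even when it spans two codons, can be shown with a short fragment. The 15-nt sequence below is the stretch around the deleted phenylalanine as it is commonly quoted in textbooks; it is supplied here from memory as an assumption and does not come from these slides. The codon table covers only the codons in this example.

```python
# Minimal codon table: only the codons that occur in this short example.
CODONS = {"ATC": "I", "ATT": "I", "TTT": "F", "GGT": "G", "GTT": "V"}

def translate(dna):
    """Translate an in-frame DNA string using the small table above."""
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))

def delete(dna, start, length):
    """Remove `length` bases beginning at 0-based position `start`."""
    return dna[:start] + dna[start + length:]

wild_type = "ATCATCTTTGGTGTT"       # Ile Ile Phe Gly Val
mutant = delete(wild_type, 5, 3)    # removes CTT, which spans two codons

print(translate(wild_type))
print(translate(mutant))
```

The deletion turns ATC TTT into ATT, which still encodes isoleucine, so the reading frame is preserved and the only change in the protein is the missing phenylalanine.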

Genomics and Its Applications
Other Applications

Post-genomic era:

Single Nucleotide Polymorphisms (SNPs)
- SNPs could be linked to human diseases
- Associations with:
  - polygenic traits, such as intelligence
  - responses to drugs: pharmacogenomics
- The vast majority of SNPs lie outside genes
- Similarities and differences between RFLPs and SNPs in humans

Testing the function of each and every gene in microorganisms
- Intentional and targeted mutation

Protein-protein interactions and activities of gene products
- Yeast two-hybrid system

Genomics and Its Applications
Bioinformatics & Proteomics

To access, analyze and interpret sequences in databases

Bioinformatics:
- Combines biology & computerized data-processing knowledge
- Building and manipulating biological databases

Proteomics
- Gene: genome, genomics
- Transcripts: transcriptome, transcriptomics
- Protein: proteome, proteomics
- Separation of proteins: 2-D PAGE
- Analysis of proteins: mass spectrometry [Fig. 24-20]
- Protein (antibody) microchips

Matrix-assisted laser desorption-ionization time-of-flight (MALDI-TOF) mass spectrometry
