genomics 2011 lecture 2
C. elegans has 2 sexes: self-fertilizing hermaphrodites and males.
Sex is determined chromosomally: XX is hermaphrodite, XO (a single X) is male.
Diploid for 5 autosomes.
Standard classical genetic techniques can be applied.
Life cycle – Zygote to adult ~3 days.
Grow on petri dish – they eat bacteria.
Can store them frozen in liquid nitrogen indefinitely.
C. elegans Genetics
Why might the hermaphrodite sex be useful for genetics?
[Figure: genetic map of Chromosome I. Genes shown: bli-3, egl-30, mab-20, fog-1, unc-73, unc-57, dpy-5, dpy-14, fer-1, unc-29, lin-11, unc-75, unc-101, glp-4, unc-54. Axis: position in m.u., from -15 to 25. Regions labelled: left arm, central cluster, right arm.]
Genetic mapping.
m.u. = map unit.
Genetic mapping – recombination.
1 m.u. is 1% recombination per meiosis.
[Figure: parental and recombinant chromosomes for the fog-1 and glp-4 loci, with + marking the wild-type alleles.]
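The map unit definition can be turned into a small calculation. This is a minimal sketch, assuming you have already counted recombinant and total meiotic products; the function name and the counts are made up for illustration and are not data from the lecture.

```python
def map_distance(recombinant: int, total: int) -> float:
    """Return the map distance in m.u., i.e. percent recombinant products."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= recombinant <= total:
        raise ValueError("recombinant must lie between 0 and total")
    return 100.0 * recombinant / total


# Hypothetical counts: 50 recombinants among 1000 scored products.
print(map_distance(50, 1000), "m.u.")
```

This simple percentage is only a good estimate for closely linked genes, because double crossovers between distant genes go undetected.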
We want to understand how life works – at the molecular level.
We had mutant genes with informative phenotypes.
The mutated genes were mapped onto linkage groups – chromosomes.
What kinds of proteins do these genes encode and how do these proteins function?
In 1983, identifying the molecular sequence of a gene defined by mutation was a complicated and time-consuming business, even in the worm.
If we only knew the sequence of the genome!
As the term applies to recombinant DNA, what is a clone?
Cloned DNA insert
Vector
Starting with DNA extracted from any organism, how can you get one single fragment into a vector and grow billions of copies of that single “cloned” molecule?
C. elegans Genome Project
Identify DNA sequences corresponding to genes defined by mutation.
[Figure: the Chromosome I genetic map (mutants, function) connected through cloned DNA fragments to the chromosomal DNA sequence, AACGTTCCACG....... (genes and proteins).]
If you wanted to clone sections of chromosomes for sequencing, how many copies of each chromosome would you start with?
Of the order of millions of copies of each chromosome.
Purified genomic DNA
Fragment the chromosomal DNA – either restriction enzyme or mechanical shear.
Cosmid clones – ~ 40 Kb insert size – Genomic Library.
Cloning methods used by the C. elegans genome project
Cosmid cloning vector
• Drug resistance marker
• E. coli origin of replication
• cos site
• Useful restriction sites
Linearised cosmid vector
Random fragments of genomic DNA – millions of them.
Long concatemers of cosmid vectors interspersed with random fragments of genomic DNA.
DNA Ligase
In vitro lambda packaging extracts
Lambda Terminase
Other phage proteins
COS sites in cosmid vector
Phage “transfects” single cosmid into an E. coli cell.
E. coli
Critical step
Mixed population “inserts”
Cells are plated onto medium with antibiotic selection.
Cells grown up to form bacterial colonies.
Each colony is derived from a single transfected cell.
Each colony is a clonal population.
Solid medium on plates Liquid culture
E. coli - clonal population with a single cosmid clone – single genomic DNA fragment.
Billions of copies of one cloned insert.
Freeze it for storage.
Purify cosmid DNA.
Sequence the insert.
Sub-clone fragments etc.
CLONING
Insert X
This is a clone
Started with many millions of different fragments of chromosomal DNA in one tube.
End up with potentially millions of CLONED fragments, each in a different E. coli colony or culture.
We have got as far as random cloned fragments of genomic DNA.
What next?
Average cosmid insert size – 40 Kb
C. elegans genome ~100.3 Mb = 100,300 Kb
100,300 Kb / 40 Kb = 2,507.5
i.e. ~2,500 cosmid clones could contain the entire C. elegans genome – but WOULD they?
In principle, 2500 cosmid clones could contain all the DNA of the C. elegans genome.
Why not just start sequencing ~2500 clones picked at random?
Imagine this:
I give you a large and awkwardly shaped die with 2500 faces, with a single number on each face, the numbers 1-2500.
Roll the die and write down the number on top.
Repeat this, again and again and...
How many times would you have to roll the die so that every face would have been on top at least once?
~4 x 2500 rolls give at least ~95% probability of any one side, or DNA fragment, appearing.
~10 x 2500 raises the probability to above 99%.
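The die-rolling argument can be checked numerically. This is a minimal sketch, assuming an idealised fair die (every clone equally likely to be picked, which a real library only approximates); the function names are illustrative. Under this assumption 4 x 2500 rolls give roughly 98% for any one face, so the rounded figures quoted above are on the safe side.

```python
def prob_face_seen(n_faces: int, rolls: int) -> float:
    """Probability that one particular face has come up at least once."""
    return 1.0 - (1.0 - 1.0 / n_faces) ** rolls


def expected_rolls_to_see_all(n_faces: int) -> float:
    """Expected rolls until every face has appeared (coupon collector)."""
    return n_faces * sum(1.0 / k for k in range(1, n_faces + 1))


N_FACES = 2500  # one face per cosmid-sized piece of the genome
for fold in (1, 4, 10):
    p = prob_face_seen(N_FACES, fold * N_FACES)
    print(f"{fold}x ({fold * N_FACES} rolls): P(a given face seen) = {p:.4f}")

print("Expected rolls to see every face:", round(expected_rolls_to_see_all(N_FACES)))
```

Note the difference between the two questions: a given face is very likely to have appeared after a few thousand rolls, but seeing every one of the 2500 faces takes far longer on average.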
The Golden Path
What if you could identify clones that overlapped slightly with one another?
Cloned DNA fragments – moderate overlaps.
With this approach you could sequence the entire genome by sequencing fewer than 5000 cosmid clones (2 x 2500).
How can we get these clones?
Cosmid fingerprinting
1. Restriction digest of cosmid DNA.
2. Separate fragments according to size by gel electrophoresis.
3. Digitise the ladder of different sized DNA fragments obtained.
Multiple common fragments – clones probably overlap.
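The comparison of two fingerprints can be sketched as counting bands of matching size. This is an illustration only: the 2% size tolerance, the threshold of 3 shared bands, and the fragment sizes are assumptions made for the example, not the parameters used by the genome project.

```python
def count_shared_fragments(sizes_a, sizes_b, tolerance=0.02):
    """Count bands in A that pair with a distinct band in B of similar size."""
    a, b = sorted(sizes_a), sorted(sizes_b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance * max(a[i], b[j]):
            shared += 1  # sizes agree within the tolerance: call it a common band
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared


def probably_overlap(sizes_a, sizes_b, min_shared=3, tolerance=0.02):
    """Guess that two clones overlap if they share enough bands."""
    return count_shared_fragments(sizes_a, sizes_b, tolerance) >= min_shared


# Hypothetical fragment sizes in bp for three cosmid digests.
clone_a = [500, 1200, 2300, 4100, 7600]
clone_b = [510, 1190, 2310, 3300, 9000]
clone_c = [800, 1700, 2900, 5200]
print(probably_overlap(clone_a, clone_b))  # shares three bands with clone_a
print(probably_overlap(clone_a, clone_c))  # shares none
```

A higher threshold reduces false joins caused by fragments that match in size by chance, at the cost of missing small overlaps.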
C. elegans genome project, ~17,000 cosmid clones fingerprinted.
Assembled into “contigs” of overlapping clones.
[Figure: overlapping clones A, B, C and D joined into a “contig”.]
~17,000 random cosmid clones, after fingerprinting: ~700 contigs.
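Grouping clones into contigs from pairwise overlaps can be sketched with a union-find structure. This is a minimal sketch, assuming the overlaps have already been called from fingerprints; the clone names and the function are hypothetical, and the sketch only groups clones, it does not order them within a contig.

```python
def build_contigs(clones, overlapping_pairs):
    """Group clones into contigs: sets connected by chains of overlaps."""
    parent = {clone: clone for clone in clones}

    def find(clone):
        # Follow parent links to the group representative, shortening the path.
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]
            clone = parent[clone]
        return clone

    for a, b in overlapping_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for clone in clones:
        groups.setdefault(find(clone), []).append(clone)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical clones: A-B and B-C overlap, D-E overlap, nothing links C to D.
contigs = build_contigs(["A", "B", "C", "D", "E"],
                        [("A", "B"), ("B", "C"), ("D", "E")])
print(contigs)
```

Every overlap that joins two separate groups reduces the contig count by one, which is why adding more fingerprinted clones shrinks the number of contigs until only uncloned gaps remain.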
C. elegans genome 100 Mb, equivalent to ~2,500 cosmid clones.
700 contigs.
What is the minimum number of contigs the C. elegans genome could be contained in?
Or – how would we know when we had succeeded in joining all the contigs?
A method of filling the gaps – joining the contigs – was needed.
DNA inserts of ~100 kb – 2 Mb.
Grown in yeast.
Clonal growth of yeast colonies, much like cosmids in E. coli.
YAC DNA separated by pulsed-field gel electrophoresis.
YACs – Yeast Artificial Chromosomes
C. elegans genome is ~100 Mb.
Cosmid clones: approximately 40 kb inserts.
YAC clones: select average 500 kb inserts.
~2500 cosmid clones would permit 1x coverage of the genome.
~200 YAC clones would permit 1x coverage of the genome.
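The 1x coverage figures follow from dividing genome size by insert size. A minimal sketch of that arithmetic, using the rounded 100 Mb genome size and the insert sizes quoted above; the function name is illustrative.

```python
import math


def clones_for_1x(genome_kb: float, insert_kb: float) -> int:
    """Minimum number of non-overlapping clones needed to span the genome once."""
    return math.ceil(genome_kb / insert_kb)


GENOME_KB = 100_000  # ~100 Mb, as quoted for C. elegans

print("Cosmids (40 kb inserts):", clones_for_1x(GENOME_KB, 40))
print("YACs (500 kb inserts):", clones_for_1x(GENOME_KB, 500))
```

These are lower bounds: they assume perfectly tiled, non-overlapping inserts, which random cloning does not deliver.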
Cosmid clone contigs
[Figure: genetic map of Chromosome I and the 6 chromosomes (AACGTTCCACG.......), labelled “? ?”.]
~17,000 fingerprinted cosmid clones – ~700 unlinked contigs.
Joining up the contigs
~700 contigs: grids of representative cosmid clones.
• Large YAC clones (> 1 Mb).
• Purify YAC DNA (PFGE).
• Radio-label YAC DNA.
• Hybridise to cosmid grid.
• Expose to X-ray film.
[Figure: a YAC clone hybridising to cosmid clones in Contig X and Contig Y, linking the two contigs.]
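Reading the hybridisation results amounts to asking which contigs each YAC touches. This is a minimal sketch, assuming each gridded cosmid is already assigned to a contig; all clone and contig names are hypothetical.

```python
def contigs_linked_by_yacs(cosmid_to_contig, yac_hits):
    """For each YAC, list the contigs it hybridises to, if more than one."""
    links = {}
    for yac, cosmids in yac_hits.items():
        contigs = sorted({cosmid_to_contig[c] for c in cosmids})
        if len(contigs) >= 2:
            links[yac] = contigs  # this YAC bridges a gap between contigs
    return links


# Hypothetical grid: which contig each representative cosmid belongs to.
cosmid_to_contig = {"cos1": "X", "cos2": "X", "cos3": "Y", "cos4": "Z"}
# Hypothetical hybridisation results: cosmid spots lit up by each labelled YAC.
yac_hits = {"yacA": ["cos2", "cos3"], "yacB": ["cos4"]}

print(contigs_linked_by_yacs(cosmid_to_contig, yac_hits))
```

Here yacA lights up cosmids from contigs X and Y, so those two contigs are neighbours; yacB touches only contig Z and links nothing.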
A physical map of the genome - the “Golden Path” – chromosomes represented in ordered overlapping clones or “clone contigs”.
[Figure: genetic map of Chromosome I aligned with YACs, cosmids, and the sequence of the genome.]
Sequencing the C. elegans Genome
Individual cosmid clone.
Finishing – directed cloning to fill in any gaps.
Check for overlap of sequence with overlapping cosmids.
Randomly fragmented and shotgun cloned into sequencing vectors.
Generally a smaller insert size is best for primary sequence determination: 2-10 Kb.
Sequence of cosmid or YAC etc, determined and compiled in silico.
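Compiling shotgun reads in silico rests on finding where the end of one read matches the start of another. This is a toy sketch of that single step, not the assembly software the project used; it assumes error-free reads on the same strand, and the reads are invented for the example.

```python
def suffix_prefix_overlap(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    max_len = min(len(left), len(right))
    for length in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def merge_reads(left: str, right: str, min_overlap: int = 3):
    """Join two reads through their overlap, or return None if there is none."""
    overlap = suffix_prefix_overlap(left, right, min_overlap)
    if overlap == 0:
        return None
    return left + right[overlap:]


# Two hypothetical reads that overlap by the four bases TTCC.
print(merge_reads("AACGTTCC", "TTCCACG"))
```

Real assemblers also have to cope with sequencing errors, reads from either strand, and repeats, which is one reason a finishing stage is needed.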
YAC clones covering most of the gaps.
YAC DNA shotgun cloned into M13 or plasmid vectors.
Most of the DNA contained in these awkward regions was successfully sub-cloned into small insert size vectors, and sequenced.
The sequence as published in December 1998 was generated from:
2527 cosmids, 257 YACs, 113 fosmids, 44 PCR products.
Gaps between cosmid contigs ~20% of genome.
Most of these gaps were not random. They contained regions that could not be cloned in cosmids.
>CEK06A5acaagagagggcgcctcggccgtatgttgaatgggagatcgatggaaccgagacaacgagaaaaggaatagagacggagaaagagagagagagcgcgcgttgttggaaggatgaaaaagaaaaaagacatgagctgcttcacaagagcttggcgaaagcaaagggcaaagtgttgacagcttagtggtggtagttggatcttctctcctcgttctctgctcacaactcgtctatcactcatatcacatttatttcccaatatcattttaacaacatcttccgatgcatgttcgtcaatattgcgcaaccactttgcaatattgtcaaaacttttcgcatttgtgatatcgtaaaccagcataattcccattgctccgcggtaatatgatgttgtgattgtgtggaatcgttcttgtccagctgtgtcccagatttgtaatttaatcttttttccttttaattcgatagttttaattttgaagtcgattcctgaatgaaaaaagaaaattattttgaaatcactagattctgaataaaaactaaccaatagttgagatgaatgtggtgttaaaggcatcatccgaaaatctgtacagaatgcaagtttttccaactcctgagtcgcctattagcagcaatttgaagagcatgtcatacggtcggcgagccatttttcttctgaaatgagaaaaagttgagaactaaagttgcacaaaagtaagagaaaagcacttgagtcatggcaaatagaacgaacactttgagatttcgaagaagttatcaagagttgacaattggaagatatttggaagaactttctaatttttttctagttttccaaaattaggtttttgtcataaaatgttgtcaaagaaaaaacaggacaaaatagttaattgttgtttccattataacaaaaaaaaatttgaacggagctattaacgcgtgcatgcgcaaatcacatcgattagctgtttctgggaaattctcgggaaaaggtgaacagcagctgctggcttcctctgcgggtcacgaaaacacaaagagatcattataattgttatttggaaaggaagcgaatctaaaacgggtacaggtggacgtttattgatcgaaagtgctttttatttgaaattgaatggtgaactttgcaattttgtaatgcaaagtacgttatcagatggcatgagatgtgtgaagtgataaggaataaaatgtgaacgacatgttcaagaaactgtgatttttcaataatttgtgatgaaatattttaggaacagaaatgaacatattaattgatataaaaacaataggaacactaactcataattatgataggtgaatatcaaaatgtgctagattttttgaagttaaaaaatacatttctaatattttttcaaataataagtttcagctgaaatttcagggtgatttcagaaagctatgttttgataaattgttttgaaaattaaaagaagctacagcaaaaaaaaattaaagagaacatcgctccctcgtagtgtataatttttgattatcgaaaaaaatgagtcaatgatgaaaaggaagtcgcaatctcaaaacttcaaaaatcaaaagaagccgttgcctctgtcatcaaaaattcagaagacaaggttgttgacaagggtcaattctcagtggtggagggcattgggcgtggtgaaatttttgaaggctagtgtggttggacctctactagatagacaaaacccccgaaatagacgtttaatttgatgagatggtggagaaagaaaaggactcattctctagatgatagagagaccagagatacagacaagagagggcgcctcggccgtatgttgaatgggagatcgatggaaccgagacaacgagaaaaggaatagagacggagaaagagagagagagcgcgcgttgttggaaggatgaaaaagaaaaaagacatgagctgcttcacaagagcttggcgaaagcaaagggcaaagtgtt
gacagcttagtggtggtagttggatcatgtgtttttatgtttccggtgggagaaggttcaacaaaaaatgaaaagaaaaagttcaagcggcatgaatcattctgagtttaaaacaaaattattgcgaaaattaatattaaaaccttttcacaaaacttcaagctaatctgttcatgaaaatttgaataatagttttttcccacctatttagaattaacttcatattaacgaaattaattaacgaatcgaaaattatgacttttcagaatcatctgaagttttttcacattccatgctgcatggaataatttgatcctggaatcgatatgtttttatggtatactttttaaccttcaatttagctggaaaagtatggaataaataattcccgaagctatgtacatatatgtagaattattgaatgattgtgagaacaacttgactttagcttgagtaggaatcggaatggctatcgaccgatcaacacttaggattgtaagaatggcagtaagaatatattgaagaaagaatgtttgttcataggaagagaaagagtattgcgaaatcatcatcgcccactttagaatggacgggcggtgagcggacatagagaattgtgaatgactaatgcttttgcagaatctagggcaaaatcgtaggaacaaacaattgtaatacggagaaaacaatcatatcgatcgatgatcatggagaaaaatgtgatttaagtgagtagacttggaaaaattaataaaagcatgaattgtcgatatttttcatttattttcattataaagctctttaaaaacaaattaaatattgagaatggcttcgaagaatattgtttcaaatatgttcaatggtgacaccttgcggataaaattaatgtaaaaatcatggaacacagattcactgatatctcattatctcaagcagtgtaattagagattttttggaacaattattttataaaactataaataaaccgtttatactactcaaagccaaatattcaagctattaccattttttttctaactaattcttgagcaattaaagtattccccagtttttattttgcaacgactccaggcaaacacgctccgttgcacttgccgccaaggcgttgcattcaaatcagagagacatctcattccgatttctgtttttcttccaataaacggtattttatgcctaatgggtgatacggaaattgttcctcttcgagtacaaaatgtacttgatagcgaaatcattcgtctcaacttgtggtccatgaaggtaactgtctagtttttttaagttttcatgatttcaatatttttacagtttaacgcgaccagtttcaaactcgaaggttttgtgagaaatgaagaaggcactatgatgcagaaagtttgttccgaatttatttgtgtaagtcgagaaacatattcgtcaacaattttcattaaatattcagagacgcttcacttctacgttgcttttcgatgtttccggacgtttcttcgacttggtcggacagattgatcgggaatatcaacaaaaaatgggaatgcctagtagaattattgatgaattttcaaatggaattcctgaaaattgggccgaccttatctattcctgcatgtcagccaaccaaagaagcgcacttcgccctatccaacaggctccaaaagaaccaattagaactagaacagaaccaattgttacgttggcagatgaaaccgagctaactggaggatgccagaaaaattccgaaaacgagaaagaaaggaacagacgtgagcgtgaagaacagcaaacaaaggaacgtgagagaagattagaagaagaaaaacaacgacgagatgctgaagctgaggctgaaagaaggcgaaaagaagaggaagagctggaagaagctaattacacccttcgtgctccgaaatctcagaacggcgagccaatcactccgataaga
C. elegans cosmid K06A5, 24323 bp.
Flat sequence file: 3955 bp shown.
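A flat sequence file like this is easy to read programmatically. This is a minimal sketch, assuming the usual FASTA layout with the ">" header on its own line; the helper names are illustrative, and only the first 20 bases shown above are used as sample input.

```python
def parse_fasta(text: str) -> dict:
    """Parse FASTA text into {record name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            records[name].append(line.lower())
    return {n: "".join(parts) for n, parts in records.items()}


def at_fraction(seq: str) -> float:
    """Fraction of bases that are A or T."""
    if not seq:
        return 0.0
    return (seq.count("a") + seq.count("t")) / len(seq)


# First 20 bases of the cosmid sequence shown above, split over two lines.
sample = ">CEK06A5\nacaagagagg\ngcgcctcggc\n"
records = parse_fasta(sample)
print(len(records["CEK06A5"]), "bp, AT fraction", at_fraction(records["CEK06A5"]))
```

Base composition computed this way is one of the simplest signals used when looking for coding regions.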
Genome sequence of C. elegans.
Sequence of entire genome.
Sequence of cDNA clones.
Approximately 19,500 predicted protein coding gene sequences.
Large number of various kinds of functional RNAs, not discussed further.
For this lecture, focus on predicted proteins.
Gene prediction? How?
Science, December 1998.
Computer based predictions
GENEFINDER
Biases in coding sequence - in C. elegans non-coding is AT rich. Splice site signals, initiator methionines, termination codons.
Likely exons and probable/possible splice patterns.
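Two of the signals listed, initiator methionines and termination codons, can be illustrated with a toy open reading frame scan. This is not GENEFINDER and is far simpler than real gene prediction: it assumes the forward strand only and ignores introns, splice signals and coding bias. The sequence and the function name are invented for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3):
    """Return (start, end) of ATG...stop stretches in the three forward frames."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first initiator methionine in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # end is exclusive, stop included
                start = None
    return orfs


toy = "CCATGAAATTTTAGGG"  # hypothetical sequence
for start, end in find_orfs(toy):
    print(start, end, toy[start:end])
```

Because C. elegans genes contain introns, a scan like this finds candidate exons at best; joining them needs the splice site signals as well.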
• Evidence that a prediction is correct?
• Homology with genes in other organisms – homologues.
• Known protein families.
• Experimental evidence.