today
![Page 1: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/1.jpg)
Today
• Please read…
Science 291: 1304-1315
![Page 2: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/2.jpg)
Human Genome Project Dissenters
My Brush with Greatness?
• 1992: Two years into the HGP, two of the project's biggest critics were…
– Sydney Brenner: believed that the HGP should focus on human EST collections, and sequence the genome of a simple vertebrate (Fugu).
– Craig Venter: believed that the clone-by-clone approach was not the most efficient way to proceed, and suggested that shotgun approaches, even a whole-genome approach, were feasible.
…they were both right.
![Page 3: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/3.jpg)
Sydney Brenner
2002 Nobel Prize (Medicine/Physiology)
Sydney Brenner and John E. Sulston, Britain
H. Robert Horvitz, United States
– for discoveries concerning how genes regulate organ
development and a process of programmed cell death.
![Page 4: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/4.jpg)
Expressed Sequence Tags (ESTs)
End-sequenced cDNAs (complementary DNA)
cDNA: synthetic DNA transcribed from an mRNA template,
– through the action of an RNA-dependent DNA polymerase called reverse transcriptase.
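The definition above can be sketched in a few lines: the first-strand cDNA is the reverse complement of the mRNA, written in DNA letters. The function name and the example sequence are illustrative assumptions, not taken from the slides.

```python
# Illustrative sketch: first-strand cDNA as the reverse complement of an mRNA.
# The function name and the example sequence are assumptions for illustration.

RNA_TO_DNA_COMPLEMENT = {"a": "t", "u": "a", "g": "c", "c": "g"}


def first_strand_cdna(mrna: str) -> str:
    """Return the first-strand cDNA (5'->3') for an mRNA given 5'->3'."""
    # Walk the mRNA backwards and pair each RNA base with its DNA complement.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna.lower()))


if __name__ == "__main__":
    print(first_strand_cdna("auggcuuaa"))
```

An EST would then be a single sequencing read taken from one end of such a cDNA clone.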
Online Primer: est.html
Brenner was right….
![Page 5: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/5.jpg)
Still Sequencing cDNAs,
- first and easiest look into any genome,
- useful in understanding genomic sequence (gene finding),
- helps determine splice site variants,
- shorter than genomic clones, fits in plasmids,
- etc.
![Page 6: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/6.jpg)
…tissue specific ESTs are very useful.
Used for microarrays…
…an array of DNA that can be hybridized with probes to study patterns of gene expression.
![Page 7: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/7.jpg)
Whole Genome Assembly
• 1995: 1.8 Mbp Haemophilus influenzae genome sequenced,
• 1996 - on : Mycoplasma, E. coli and others*,
• 1999: Chromosome 2 of Arabidopsis,
• 2000: Drosophila (120 Mbp) genome,
…Human, Mosquito, etc…
• Lots of genomes, several applications...
*WGA of bacterial, viral populations...
Venter was right….
J. Craig Venter
![Page 8: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/8.jpg)
![Page 9: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/9.jpg)
• 1 year, 120 megabases,
• Assembly algorithms could generate accurate genomic sequences,
• Interim assemblies (or mapping) were not necessary.
24 MARCH 2000 VOL 287 SCIENCE
![Page 10: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/10.jpg)
Big Biology
![Page 11: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/11.jpg)
Think About This…
…the plasmid library construction is the first critical step in WGA sequencing,
– “if the DNA libraries are not uniform in size, non-chimeric, and do not randomly represent the genome, then the subsequent steps cannot accurately reconstruct the genome sequence.”
– “We used automated high-throughput DNA sequencing and the computational infrastructure to enable efficient tracking of enormous amounts of sequence information (27.3 million sequence reads; 14.9 billion bp of sequence).”
![Page 12: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/12.jpg)
Whose DNA?
• 21 enrolled donors,
– age, sex, ethnographic group,
– one African-American,
– one Asian-Chinese,
– one Hispanic-Mexican,
– two Caucasians*.
![Page 13: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/13.jpg)
Whose, Mostly?
J. Craig Venter
![Page 14: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/14.jpg)
![Page 15: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/15.jpg)
September 8, 1999 - June 25, 2000; 543 bp average sequence read
…back to humans…
What to know?
• Individuals,
• Libraries,
• Sequence coverage,
• Clone coverage,
• Other?
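Sequence coverage and clone coverage are both simple ratios, sketched below. The read count (27.3 million) and mean read length (543 bp) are the figures quoted in these slides; the genome size of 2.9 Gbp is an assumed round number, and the function names are hypothetical.

```python
# Sketch of the standard coverage arithmetic. The read count and read length
# come from the slides; the genome size is an assumed round figure.

ASSUMED_GENOME_BP = 2.9e9  # assumption: approximate human genome size


def sequence_coverage(n_reads: float, mean_read_bp: float, genome_bp: float) -> float:
    """Average number of times each base is sequenced (reads * length / genome)."""
    return n_reads * mean_read_bp / genome_bp


def clone_coverage(n_clones: float, mean_insert_bp: float, genome_bp: float) -> float:
    """Average number of cloned inserts spanning each base."""
    return n_clones * mean_insert_bp / genome_bp


if __name__ == "__main__":
    seq_cov = sequence_coverage(27.3e6, 543, ASSUMED_GENOME_BP)
    print(f"sequence coverage ~ {seq_cov:.1f}X")
```

Clone coverage is normally much higher than sequence coverage, because each insert is far longer than the two reads taken from its ends. The estimate here counts only the reads quoted in the slides against an assumed genome size, so it need not match coverage figures quoted elsewhere in the lecture.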
![Page 16: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/16.jpg)
![Page 17: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/17.jpg)
WGA Outline
Online Primer:snps.html
![Page 18: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/18.jpg)
DNA in sized libraries… DNA sequence in mate-pairs… (cartoons)
– vector, insert, sequencing primers,
– sequenced ends: ~543 bp each, e.g. 5’- actgtacgtgtagctgaca… - 3’ and 5’- tagcgtagttattttgc… - 3’,
– unsequenced insert: ~known size.
![Page 19: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/19.jpg)
![Page 20: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/20.jpg)
Whole Genome Assembly
What does Shredder do? Why?
1. Screener
2. Overlapper
3. Unitigger/Discriminator,
4. Scaffolder,
5. Repeat Resolver.
![Page 21: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/21.jpg)
Screener
...finds and “masks” microsatellite repeats, known repeated regions and ribosomal DNA, etc.
– “masked” regions not used to make contigs,
– “marks” the rest for overlapping.
read:
atgacttacttactgcatatttatttatttatttatttatttatttatttatttatttatttatttatttatttatttgacgtgtacgtgtacgtgtagctgtacgtgtacgtgacgggccgcattatcgtgatgctacgtgtacgttatatctgatcgtgcatgtga
masked:
atgacttacttactgcatatttatttatttatttatttatttatttatttatttatttatttatttatttatttatttgacgtgtacgtgtacgtgtagctgtacgtgtacgtgacgggccgcattatcgtgatgctacgtgtacgttatatctgatcgtgcatgtga
marked:
atgacttacttactgcatatttatttatttatttatttatttatttatttatttatttatttatttatttatttatttgacgtgtacgtgtacgtgtagctgtacgtgtacgtgacgggccgcattatcgtgatgctacgtgtacgttatatctgatcgtgcatgtga
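A minimal sketch of what the masking step might look like for the microsatellite case only. The definition of a microsatellite used here (a 2-6 bp unit repeated at least five times in tandem) is an assumption, the function names are hypothetical, and `READ` is a shortened stand-in for the read on the slide. Screening against known repeat libraries and ribosomal DNA is not shown.

```python
import re

# Assumption: a "microsatellite" is any 2-6 bp unit repeated at least five
# times in tandem. The threshold is illustrative, not taken from the slides.
TANDEM_REPEAT = re.compile(r"([acgt]{2,6}?)\1{4,}")

# Shortened stand-in for the slide's read: flanking sequence around (attt)n.
READ = "atgacttacttactgcat" + "attt" * 8 + "gacgtgtacgtgtagc"


def mask_microsatellites(read: str, mask_char: str = "N") -> str:
    """Replace each tandem-repeat run with mask_char, keeping the read length."""
    return TANDEM_REPEAT.sub(lambda m: mask_char * len(m.group(0)), read.lower())


def unmasked_intervals(masked: str, mask_char: str = "N") -> list:
    """Return (start, end) spans that remain "marked" for overlapping."""
    pattern = f"[^{re.escape(mask_char)}]+"
    return [(m.start(), m.end()) for m in re.finditer(pattern, masked)]


if __name__ == "__main__":
    masked = mask_microsatellites(READ)
    print(masked)
    print(unmasked_intervals(masked))
```

The masked run can start one base out of phase with the "attt" unit, because the regular expression finds the leftmost repeating unit ("tatt" repeats just as well). The masked length is what matters for the later steps.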
![Page 22: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/22.jpg)
Overlapper
...looks for end-to-end overlaps of at least 40 bp with no more than 6% differences in the match,
What’s the significance? ...a one in 10^17 event, given perfect randomness.
<--tactgtacgtagctgtgatgttcctcggatatagcgggcatatttattacgctattgtacgtgt-3’
5’- gttcctcggatatagcgggcatatttattacgctattgtacgtgtaaagtatcgt-->
≥ 40 bp, ≤ 6% mismatch
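The overlap criterion (at least 40 bp, no more than 6% differences) can be sketched as a suffix-prefix comparison. This is a simplified illustration under stated assumptions: both reads are treated as the same strand, only substitutions are counted (no insertions or deletions), and the function name is hypothetical. The two reads are the ones drawn on the slide.

```python
# Sketch of an end-to-end overlap test. Assumptions: both reads are on the
# same strand, and differences are substitutions only (no indels).

LEFT_READ = "tactgtacgtagctgtgatgttcctcggatatagcgggcatatttattacgctattgtacgtgt"
RIGHT_READ = "gttcctcggatatagcgggcatatttattacgctattgtacgtgtaaagtatcgt"


def end_overlap(left: str, right: str, min_len: int = 40, max_diff: float = 0.06) -> int:
    """Length of the longest suffix of `left` matching a prefix of `right`.

    Returns 0 if no overlap of at least `min_len` bases has a mismatch
    fraction of `max_diff` or less.
    """
    # Try the longest possible overlap first, then shrink.
    for length in range(min(len(left), len(right)), min_len - 1, -1):
        suffix, prefix = left[-length:], right[:length]
        mismatches = sum(x != y for x, y in zip(suffix, prefix))
        if mismatches <= max_diff * length:
            return length
    return 0


if __name__ == "__main__":
    print(end_overlap(LEFT_READ, RIGHT_READ))
```

Comparing every read against every other read this way is quadratic in the number of reads, so a production overlapper would need some indexing scheme to find candidate pairs first; that part is not shown.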
![Page 23: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/23.jpg)
Good News
... uniquely assembled contigs (unitigs) are readily identifiable,
– all of the assembled sequences match over all of the known sequence,
- and -
...are consistent with an 8x sequence coverage.
![Page 24: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/24.jpg)
Whole Genome Assembly
What does Shredder do? Why?
1. Screener
2. Overlapper
3. Unitigger/Discriminator,
4. Scaffolder,
5. Repeat Resolver.
![Page 25: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/25.jpg)
Unitigs
...contig cluster is consistent with expected size (+8),
...no dissimilar sequences between any members.
But(t):
...the Screener doesn’t include all of the “low frequency” level repeats,
...so, a majority of the Overlapper outputs turned out to be bogus.
![Page 26: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/26.jpg)
What Now?
– “over-collapsed” assemblies are identified and broken down into unitigs when possible...
– …these “too-large” contig sets are sent to the Unitigger/Discriminator.
![Page 27: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/27.jpg)
...over-collapsed.
...in a world where real data matches expected data, each locus would have 8X coverage,
...if there are genomic repeats, then sequences would be “over-represented”, on average, 8 more per repeat, per contig.
Unitigger ...differentiates between a true overlap and an overlap that includes more than one locus.
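One way to turn the coverage argument above into a number is a Poisson log-odds score: compare how likely the observed read count is if the contig is a single locus versus two collapsed copies. This is a generic textbook-style sketch offered as an assumption about how such a test could look, not a statement of the assembler's actual statistic. The function name and the example numbers are hypothetical.

```python
import math

# Sketch: log-odds that a contig is unique rather than two collapsed copies,
# assuming read start points follow a Poisson process along the genome.


def unique_vs_repeat_log_odds(contig_bp: float, reads_in_contig: int,
                              total_reads: float, genome_bp: float) -> float:
    """ln[ P(k reads | one locus) / P(k reads | two collapsed loci) ].

    With expected count lam for a unique contig, the Poisson ratio
    simplifies to lam - k * ln(2).
    """
    expected = total_reads * contig_bp / genome_bp  # lam for a unique contig
    return expected - reads_in_contig * math.log(2)


if __name__ == "__main__":
    # Hypothetical numbers: 1 Mbp genome, 16,000 reads, 10 kbp contig,
    # so 160 reads are expected if the contig is a single locus.
    print(unique_vs_repeat_log_odds(10_000, 160, 16_000, 1_000_000))  # positive
    print(unique_vs_repeat_log_odds(10_000, 320, 16_000, 1_000_000))  # negative
```

A clearly positive score supports a unique contig; a clearly negative one suggests the contig is over-collapsed and should be broken up or set aside.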
![Page 28: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/28.jpg)
Discriminator
...parses the “over-collapsed” contig by using sequence outside of the overlap region
![Page 29: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/29.jpg)
Discriminator
...may yield u-unitigs.
Unitigger/Discriminator Output: correctly assembled contigs covering 73.6% of the genome.
![Page 30: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/30.jpg)
Scaffolder
...contigs the contigs,
– uses mate-pair information: two or more consistent mate-pair matches yield 1 in 10^10 odds of being chance.
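The "two or more mate pairs" rule can be sketched as a counting step: tally how many mate pairs connect each pair of contigs and keep only the pairs with at least two links. The data layout (read names, a read-to-contig dictionary) and the function name are assumptions for illustration. Checking that the links agree on orientation and distance, which is what "consistent" implies, is not shown.

```python
from collections import Counter

# Sketch: keep contig pairs joined by at least `min_links` mate pairs.
# Orientation and insert-size consistency checks are deliberately omitted.


def confident_links(mate_pairs, read_to_contig, min_links=2):
    """Return {(contig_a, contig_b): n_links} for well-supported contig pairs."""
    counts = Counter()
    for read_a, read_b in mate_pairs:
        contig_a = read_to_contig.get(read_a)
        contig_b = read_to_contig.get(read_b)
        if contig_a is None or contig_b is None or contig_a == contig_b:
            continue  # unplaced read, or both mates inside one contig
        counts[tuple(sorted((contig_a, contig_b)))] += 1
    return {pair: n for pair, n in counts.items() if n >= min_links}


if __name__ == "__main__":
    # Hypothetical placements: c1 and c2 share two mate pairs, c2 and c3 one.
    pairs = [("r1", "r2"), ("r3", "r4"), ("r5", "r6"), ("r7", "r8")]
    placement = {"r1": "c1", "r2": "c2", "r3": "c1", "r4": "c2",
                 "r5": "c2", "r6": "c3", "r7": "c1", "r8": "c1"}
    print(confident_links(pairs, placement))
```

The surviving links would then be used to order and orient contigs into scaffolds, with the known insert sizes giving an estimate of each gap.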
![Page 31: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/31.jpg)
Repeat Resolver
...most of the remaining gaps were due to repeats.
“Rocks”
Use “low Discriminator Value” contig sets to fill gaps,
- find two or more mate pairs with unambiguous matches in the scaffold near the gap (2 kb, 10 kb or 50 kb), (1 in 10^7),
“Stones”
- find mate pair matches 2 kb, 10 kb, and 50 kb from gap, place the mate in the gap, check to see if it’s consistent with other “placed” sequences.
confirm matches
![Page 32: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/32.jpg)
![Page 33: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/33.jpg)
If that Doesn’t Work
...find a mate-pair that spans the gap, and sequence it,
Sequence Walking
...make sequencing primer from BES...
![Page 34: Today](https://reader033.vdocument.in/reader033/viewer/2022051820/56813c7f550346895da620c4/html5/thumbnails/34.jpg)
Wednesday
• Questions about WGA,
• CSA,
• Comparisons,
• Quality Control, etc.