High Throughput Genomic DNA Sequencing and Bioinformatics


Post on 19-Dec-2015






  • Slide 1
  • High Throughput Genomic DNA Sequencing and Bioinformatics
  • Slide 2
  • The Human Genome Project: The human genome is now officially sequenced. That was a big job. How did they do it? Is there anything that a knowledge of bioinformatics tells us we should watch out for in the human genome sequence?
  • Slide 3
  • What is DNA Sequencing? A DNA sequence is the order of the bases on one strand. By convention, we write the DNA sequence in the 5' to 3' direction, from left to right. Often only one strand of the DNA sequence is written, but usually both strands have been sequenced as a check.
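The 5' to 3' convention means the second strand is the reverse complement of the first, not just its complement. A minimal Python sketch of that relationship follows; the function name and the example sequence are illustrative and not from the slides.

```python
# Watson-Crick pairing: A with T, G with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    """Return the opposite strand, also written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

top = "GATTACA"                   # one strand, 5' to 3'
bottom = reverse_complement(top)  # the other strand, 5' to 3'
print(bottom)                     # TGTAATC

# Sequencing both strands "as a check": the reverse complement of the
# bottom-strand read should reproduce the top-strand read.
print(reverse_complement(bottom) == top)  # True
```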
  • Slide 4
  • DNA Sequencing was Awarded the Nobel Prize: Walter Gilbert and Fred Sanger were awarded the Nobel Prize in Chemistry for the development of two different methods of DNA sequencing. http://www.nobel.se/chemistry/laureates/1980/ (Oh yes, and Paul Berg for recombinant DNA. A big year!)
  • Slide 5
  • Two Methods of DNA Sequencing: In the Maxam-Gilbert method, a DNA fragment is end-labeled with [P-32] phosphate and chemically cleaved to leave a signature pattern of bands. In the Sanger method, the DNA is annealed to an oligonucleotide primer, which is then extended by DNA polymerase using a mixture of dNTP and ddNTP (chain-terminating) substrates. This is the main method used now.
  • Slide 6
  • Sanger Method is a Form of DNA Synthesis: The DNA to be sequenced acts as a template for the enzymatic synthesis of a new DNA strand starting at a defined primer. The polymerases used are Pol I type polymerases. Incorporation of a dideoxynucleotide blocks further synthesis of the new DNA strand.
  • Slide 7
  • Remember the Rules of In Vivo DNA Replication
  • Slide 8
  • Slide 9
  • How the Reaction Works: If the DNA is double stranded, the reaction is started by heating until the two strands of DNA separate. When the temperature is lowered, the primer sticks to its intended location by hydrogen bonds. DNA polymerase then starts elongating the primer. If allowed to go to completion, a new strand of DNA would be the result.
  • Slide 10
  • How the Reaction Works: If we start with a billion identical pieces of template DNA, we'll get a billion new copies of one of its strands. We run the reactions, however, in the presence of a dideoxyribonucleotide. This is just like a regular nucleotide, except it has no 3' hydroxyl group: once it's added to the end of a DNA strand, there's no way to continue elongating it.
  • Slide 11
  • Slide 12
  • Slide 13
  • Original Sanger Sequencing: A mixture of dNTPs and a single ddNTP is used in each reaction tube. We can start with 4 different reaction tubes, each with all four dNTPs (dATP, dGTP, dTTP, dCTP) and ONLY one of ddATP, ddCTP, ddGTP or ddTTP (at only 1%). The key is that MOST of the nucleotides are regular ones, and just a fraction of them are dideoxynucleotides.
  • Slide 14
  • An Example of a T tube: MOST of the time when a 'T' is required to make the new strand, the enzyme will get a good one and continue to elongate. MOST of the time after adding a T, the enzyme will go ahead and add more nucleotides. However, about 1% of the time, the enzyme will get a dideoxy-T, and that strand can never again be elongated. It eventually breaks away from the enzyme, leaving a dead-end DNA that can't be further extended.
  • Slide 15
  • Original Sanger Sequencing: Sooner or later ALL of the copies will get terminated by a T. But each time the enzyme makes a new strand, the place it gets stopped will be random. In millions of starts, there will be strands stopping at every possible T along the way.
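The "1% of the time" figure can be turned into a rough estimate of how many strands stop at each T. The Python sketch below assumes every T position is an independent 1% chance of picking up a dideoxy-T, which is a simplification of the real enzyme kinetics; the function names are illustrative.

```python
def fraction_stopping_at(k: int, p_dd: float = 0.01) -> float:
    """Fraction of new strands that terminate at the k-th T (k = 1, 2, ...).

    The strand must take a regular T at the first k-1 positions
    and a dideoxy-T at the k-th.
    """
    return (1 - p_dd) ** (k - 1) * p_dd

def fraction_still_growing(k: int, p_dd: float = 0.01) -> float:
    """Fraction of strands that got past the first k T positions."""
    return (1 - p_dd) ** k

templates = 1_000_000_000  # "a billion identical pieces of template DNA"
for k in (1, 10, 100):
    stopped = round(templates * fraction_stopping_at(k))
    print(f"T number {k}: about {stopped:,} strands stop here")
```

Under this assumption about ten million of the billion strands stop at the first T, and the number falls off slowly at later T positions, so every T is still represented by millions of molecules.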
  • Slide 16
  • Specific Primers Start the Sequence: ALL of the strands we make start at one exact position. ALL of them end with a T. There are billions of them... many millions at each possible T position. To find out where all the T's are in our newly synthesized strand, all we have to do is find out the sizes of all the terminated products!
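Reading the sequence from fragment sizes can be sketched in a few lines of Python. This is an idealized illustration, not how gel-reading software works: sizes are counted from the first base added after the primer, every position is assumed to produce a detectable band, and the function names and example strand are made up.

```python
def termination_sizes(new_strand: str, base: str) -> list[int]:
    """Sizes of the products made in the tube containing dideoxy-`base`.

    A product of size n exists whenever position n of the new strand is `base`.
    """
    return [i + 1 for i, b in enumerate(new_strand) if b == base]

def read_ladder(tubes: dict[str, list[int]]) -> str:
    """Rebuild the sequence by reading bands from shortest to longest."""
    by_size = {size: base for base, sizes in tubes.items() for size in sizes}
    return "".join(by_size[size] for size in sorted(by_size))

new_strand = "GATTACAT"  # the strand being synthesized, 5' to 3'
tubes = {base: termination_sizes(new_strand, base) for base in "ACGT"}

print(tubes["T"])          # [3, 4, 8]  (band sizes in the T tube)
print(read_ladder(tubes))  # GATTACAT
```

Each tube only reveals where one base occurs; putting the four size lists side by side and reading upward in size gives the full sequence.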
  • Slide 17
  • Slide 18
  • Non-Radioactive DNA Labels: Add a chemical tag to each ddNTP that emits a fluorescent color when excited by a laser. We can add a different dye to each ddNTP, and each is excited by a different laser wavelength. Run the reactions in only one tube, not 4 tubes! This is easier and faster. A big contribution to high throughput sequencing.
  • Slide 19
  • Automated DNA Sequencing: We don't even have to 'read' the sequence from the gel - the computer does that for us! This is a plot of the colors detected in one 'lane' of a gel (one sample), scanned from smallest fragments to largest. The computer even interprets the colors by printing the nucleotide sequence across the top of the plot. This is just a fragment of the entire file, which would span around 700 or so nucleotides of accurate sequence.
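How a computer might "interpret the colors" can be sketched as picking the strongest dye signal at each peak. The Python example below is a toy illustration only: the intensity values, the `min_ratio` threshold and the function names are assumptions, and real base-calling software is considerably more involved.

```python
def call_base(peak: dict[str, float], min_ratio: float = 2.0) -> str:
    """Call the base whose dye channel is strongest at this peak.

    Returns 'N' (unknown) when the strongest channel is not at least
    `min_ratio` times the runner-up, i.e. the peak is ambiguous.
    """
    ranked = sorted(peak.items(), key=lambda item: item[1], reverse=True)
    (best_base, best_val), (_, second_val) = ranked[0], ranked[1]
    return best_base if best_val >= min_ratio * second_val else "N"

def call_sequence(peaks: list[dict[str, float]]) -> str:
    """Call every peak in order, from smallest fragment to largest."""
    return "".join(call_base(peak) for peak in peaks)

# Hypothetical four-channel intensities at three successive peaks.
peaks = [
    {"A": 900, "C": 40, "G": 30, "T": 20},   # clean A
    {"A": 50, "C": 60, "G": 800, "T": 10},   # clean G
    {"A": 400, "C": 380, "G": 20, "T": 15},  # A and C overlap
]
print(call_sequence(peaks))  # AGN
```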
  • Slide 20
  • Automated DNA Sequence Readouts
  • Slide 21
  • Slide 22
  • The Biology of DNA Sequencing: Virtually all DNA sequencing (both automated and manual) relies on the Sanger method: DNA replication with dideoxy chain termination, followed by separation of the resulting molecules by polyacrylamide gel electrophoresis. The DNA fragment to be sequenced must first be cloned into a vector (plasmid or lambda). Then the cloned DNA must be copied in a test tube (in vitro) by a DNA polymerase enzyme to obtain a sufficient quantity to be sequenced.
  • Slide 23
  • Slide 24
  • Sample DNA Sequence from ABI sequencer
  • Slide 25
  • Automated sequencing machines, particularly those made by PE Applied Biosystems, use 4 colors of dye, so they can read all 4 bases at once.
  • Slide 26
  • Challenges of DNA Sequencing: One technician with an automated DNA sequencer can produce over 20 kb (kilobases) of raw sequence data per day. The real challenge of DNA sequencing is in the analysis of the data.
  • Slide 27
  • J. Craig Venter: Proposed a whole-genome shotgun sequencing method to NIH in 1991; the proposal was rejected. Set up The Institute for Genomic Research (TIGR) in 1992 (private and non-profit). TIGR published the first complete genome sequence in 1995 (Haemophilus influenzae). Formed Celera Genomics in 1998 to sequence the human genome in three years (private, for-profit). "The Sequence of the Human Genome" was published in Science in February 2001. Venter departed Celera in 2002.
  • Slide 28
  • Human Genome Project Sequencing Strategy (clone-based physical mapping): 1) Digest the genome and make Bacterial Artificial Chromosomes (BACs, 150,000 bp each). 2) Digest BACs to create fingerprints. 3) Organize BACs to form contigs. 4) Select BAC clones for sequencing. 5) Shear BACs and shotgun clone. 6) Sequence clones and assemble overlaps.
  • Slide 29
  • Slide 30
  • Slide 31
  • Celera Sequencing Strategy: 1) Whole-genome shotgun sequencing of five individuals with 5-fold coverage. 2) Computer assembles overlapping sequences to form contigs. 3) Contigs are assembled into scaffolds. 4) Scaffolds are mapped to the genome by two or more Sequence Tagged Site (STS) markers.
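What "5-fold coverage" implies can be estimated with a short Python calculation. The slides do not give a formula; the sketch below assumes reads land uniformly at random (a Lander-Waterman style approximation), a genome of about 3 billion bp, and 500 bp reads, all of which are illustrative assumptions.

```python
import math

def coverage(n_reads: int, read_len: int, genome_size: int) -> float:
    """Average number of reads covering each base (fold coverage)."""
    return n_reads * read_len / genome_size

def fraction_uncovered(fold: float) -> float:
    """Expected fraction of bases hit by no read, assuming random placement."""
    return math.exp(-fold)

def reads_needed(fold: float, read_len: int, genome_size: int) -> int:
    """Number of reads required to reach the requested fold coverage."""
    return math.ceil(fold * genome_size / read_len)

GENOME = 3_000_000_000  # assumed genome size in bp
READ = 500              # assumed read length in bp

n = reads_needed(5, READ, GENOME)
print(n)                                    # 30000000
print(coverage(n, READ, GENOME))            # 5.0
print(round(fraction_uncovered(5) * 100, 2), "% of bases expected uncovered")
```

Even at 5-fold coverage, random sampling is expected to leave a small fraction of bases unread, which is one reason shotgun assemblies contain gaps.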
  • Slide 32
  • Slide 33
  • Technology Breakthroughs: Development of the Expressed Sequence Tag (EST) method to discover and map human genes. Development of Bacterial Artificial Chromosomes (BACs) to clone large DNA fragments. Development of an automated high-throughput capillary DNA sequencer in 1998 (Applied Biosystems ABI PRISM 3700 DNA Analyzer). Development of powerful computers and software to analyze sequence data.
  • Slide 34
  • Genome Questions: Has every base in our genome been sequenced? What is the total number of genes and where are they located? How many genes have an unknown function? What percent of our DNA encodes genes and what is the remainder? Do we share DNA sequences with other organisms? How much sequence variation is there between individuals?
  • Slide 35
  • Genome Sequencing: HTG, GSS, (WGS). [Figure; labels: whole BAC insert (or genome); shredding; cloning and isolating; sequencing (GSS division or trace archive); assembly; draft sequence (HTG division)]
  • Slide 36
  • GSS Division: Genome Survey Sequences. Genomic equivalent of ESTs. BAC and other first-pass surveys; BAC end sequences; Whole Genome Shotgun (some); RAPIDS and other anonymous loci. [Figure; labels: genomic clone (BAC), T7 end, SP6 end]
  • Slide 37
  • Working Draft Sequence gaps
  • Slide 38
  • Technology Limitations: Sequences can only be determined in sections of approximately 400-800 base pairs, known as reads. This is due to both the biochemistry of the DNA polymerase enzyme and the resolution of polyacrylamide gel electrophoresis. Most genes contain many thousands of bp, and many modern sequencing projects are intended to produce complete sequences of large genomic regions (millions of bp).
  • Slide 39
  • Assembly of Contigs: As a result, all sequencing projects must involve the division of the target DNA into a set of overlapping ~500 bp fragments. These fragments are then assembled into complete sequences (contigs). Contig = contiguous sequenced region. Assembly of overlapping fragments is a computational problem.
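The computational problem can be illustrated with a toy greedy assembler in Python. This is a sketch under strong assumptions, not the method used by any real assembler: reads are error-free, all come from the same strand, none is contained inside another, and the reads here are a few bases long rather than ~500 bp. Function names and data are illustrative.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]

reads = ["TACCTTA", "ATGGCGT", "GCGTACC"]
print(greedy_assemble(reads))  # ['ATGGCGTACCTTA']
```

Reads that share no sufficient overlap stay as separate contigs, which is how gaps appear in an assembly.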
  • Slide 40
  • Slide 41
  • Contig Assembly Problems 1) The 500 bp reads of
