Ms. Shivani Bhagwat Lecturer, School of Biotechnology DAVV Next Generation sequencing and Gene Annotation

Post on 01-Jan-2016

Ms. Shivani Bhagwat, Lecturer, School of Biotechnology, DAVV

Next Generation sequencing and Gene Annotation

DNA SEQUENCING

DNA sequencing includes several methods and technologies used to determine the order of the nucleotide bases (adenine, guanine, cytosine, and thymine) in a molecule of DNA. The first DNA sequences were obtained in the early 1970s by academic researchers using laborious methods based on two-dimensional chromatography.

Maxam–Gilbert sequencing

A DNA sequencing method based on chemical modification of DNA and subsequent cleavage at specific bases.

The method requires radioactive labeling at one 5' end of the DNA (typically by a kinase reaction using gamma-32P ATP) and purification of the DNA fragment to be sequenced. Chemical treatment generates breaks at a small proportion of one or two of the four nucleotide bases in each of four reactions (G, A+G, C, C+T). The purines (A+G) are depurinated using formic acid, the guanines (and to some extent the adenines) are methylated by dimethyl sulfate, and the pyrimidines (C+T) are hydrolysed using hydrazine. The addition of salt (sodium chloride) to the hydrazine reaction inhibits the reaction of thymine, giving the C-only reaction. The modified DNAs are then cleaved by hot piperidine at the position of the modified base. Thus a series of labeled fragments is generated, from the radiolabeled end to the first "cut" site in each molecule. The fragments in the four reactions are electrophoresed side by side in denaturing acrylamide gels for size separation. To visualize the fragments, the gel is exposed to X-ray film for autoradiography, yielding a series of dark bands, each corresponding to a radiolabeled DNA fragment, from which the sequence may be inferred.
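
The way a sequence is inferred from the four lanes (G, A+G, C, C+T) can be sketched in Python. This is only an illustration under simplifying assumptions: the function names are hypothetical, and a cut at base i is recorded as a band at position i, ignoring the real fragment-length offset and all gel artefacts.

```python
def simulate_lanes(sequence):
    """Return the band positions expected in each of the four reaction lanes.

    Simplification: a cleavage at base i (counted from the labeled 5' end,
    starting at 1) is recorded as a band at position i.
    """
    lanes = {"G": set(), "A+G": set(), "C": set(), "C+T": set()}
    for position, base in enumerate(sequence, start=1):
        if base == "G":
            lanes["G"].add(position)
            lanes["A+G"].add(position)
        elif base == "A":
            lanes["A+G"].add(position)
        elif base == "C":
            lanes["C"].add(position)
            lanes["C+T"].add(position)
        elif base == "T":
            lanes["C+T"].add(position)
        else:
            raise ValueError(f"unexpected base: {base!r}")
    return lanes


def read_maxam_gilbert_gel(lanes):
    """Infer the sequence from the bands, shortest fragment first."""
    all_positions = set().union(*lanes.values())
    calls = []
    for position in sorted(all_positions):
        if position in lanes["G"]:
            calls.append("G")   # band in both the G and A+G lanes
        elif position in lanes["A+G"]:
            calls.append("A")   # band only in the A+G lane
        elif position in lanes["C"]:
            calls.append("C")   # band in both the C and C+T lanes
        else:
            calls.append("T")   # band only in the C+T lane
    return "".join(calls)


lanes = simulate_lanes("GATTACA")
print(read_maxam_gilbert_gel(lanes))
```

A band that appears only in a combined lane (A+G or C+T) identifies the base that the single-base lane does not report.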

NOTE: Maxam-Gilbert sequencing has fallen out of favour due to its technical complexity prohibiting its use in standard molecular biology kits, extensive use of hazardous chemicals, and difficulties with scale-up.

Chain-termination methods

The key principle of the Sanger method is the use of dideoxynucleotide triphosphates (ddNTPs) as DNA chain terminators. The classical chain-termination method requires a single-stranded DNA template, a labelled DNA primer, a DNA polymerase, normal deoxynucleotide triphosphates (dNTPs), and modified nucleotides (dideoxyNTPs) that terminate DNA strand elongation.

The DNA sample is divided into four separate sequencing reactions, each containing all four of the standard deoxynucleotides (dATP, dGTP, dCTP and dTTP) and the DNA polymerase.

To each reaction is added only one of the four dideoxynucleotides (ddATP, ddGTP, ddCTP, or ddTTP) which are the chain-terminating nucleotides, lacking a 3'-OH group required for the formation of a phosphodiester bond between two nucleotides, thus terminating DNA strand extension and resulting in DNA fragments of varying length.

The newly synthesized and labelled DNA fragments are heat denatured and separated by size (with a resolution of just one nucleotide) by gel electrophoresis on a denaturing polyacrylamide-urea gel, with each of the four reactions run in one of four individual lanes (lanes A, T, G, C). The DNA bands are then visualized by autoradiography or UV light, and the DNA sequence can be read directly off the X-ray film or gel image.
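
As a rough sketch of why four lanes are enough to read the sequence, the Python below models each ddNTP reaction as the set of fragment lengths ending in that base and then reads the bands from shortest to longest. The function names are hypothetical, and the primer length and every experimental artefact are ignored.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(template):
    """Return the strand the polymerase synthesizes, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(template))


def chain_termination_lanes(template):
    """Fragment lengths expected in each of the four ddNTP reactions.

    A fragment of length n appears in lane X when the n-th base of the
    newly synthesized strand is X, because ddX terminated extension there.
    """
    new_strand = reverse_complement(template)
    lanes = {base: [] for base in "ATGC"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_sanger_gel(lanes):
    """Read the gel from the shortest band to the longest."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = chain_termination_lanes("ATGC")
print(lanes)
print(read_sanger_gel(lanes))
```

The sequence read from the gel is that of the newly synthesized strand, so it is the reverse complement of the template.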

NOTE: Limitations include non-specific binding of the primer to the DNA, affecting accurate read-out of the DNA sequence, and DNA secondary structures affecting the fidelity of the sequence.

Dye-terminator sequencing

Dye-terminator sequencing utilizes labelling of the chain terminator ddNTPs, which permits sequencing in a single reaction, rather than four reactions as in the labelled-primer method. In dye-terminator sequencing, each of the four dideoxynucleotide chain terminators is labelled with a fluorescent dye, each of which emits light at a different wavelength.

Automated DNA-sequencing instruments (DNA sequencers) can sequence up to 384 DNA samples in a single batch (run) in up to 24 runs a day. DNA sequencers carry out capillary electrophoresis for size separation, detection and recording of dye fluorescence, and data output as fluorescent peak trace chromatograms.

Base calling software typically gives an estimate of quality to aid in quality trimming.
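
A minimal sketch of quality trimming follows. It assumes the quality estimate is reported on the Phred scale (Q = -10 log10 of the error probability), which the slide does not state explicitly. The threshold of 20 and the function names are illustrative choices only.

```python
import math


def phred_quality(error_probability):
    """Convert a base-call error probability to a Phred-scaled quality."""
    return -10 * math.log10(error_probability)


def trim_low_quality_ends(bases, qualities, min_quality=20):
    """Remove leading and trailing bases whose quality is below min_quality."""
    start = 0
    end = len(bases)
    while start < end and qualities[start] < min_quality:
        start += 1
    while end > start and qualities[end - 1] < min_quality:
        end -= 1
    return bases[start:end], qualities[start:end]


# A 1% error probability corresponds to Q20 on this scale.
print(phred_quality(0.01))
print(trim_low_quality_ends("ACGTAC", [5, 10, 30, 40, 25, 8]))
```

Only the ends are trimmed here; a low-quality base in the middle of the read is kept, which is one of several possible trimming policies.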

Massively parallel signature sequencing (MPSS)

MPSS was developed in the 1990s and is a relatively complicated procedure.

It is a sequence-based approach that can be used to identify and quantify the mRNA transcripts present in a sample. It is similar to serial analysis of gene expression (SAGE), but the biochemical manipulation and sequencing approach differ substantially.

mRNA transcripts are identified through the generation of a 17-20 bp (base pair) signature sequence adjacent to the 3' end.

Each signature sequence is cloned onto one of a million microbeads. The technique ensures that only one type of DNA sequence is on each microbead.

The microbeads are then arrayed in a flow cell for sequencing and quantification. Fluorescently labeled encoders are used to decode the sequence.

Pyrosequencing Technology

The platform was developed by 454 Life Sciences, which has since been acquired by Roche Diagnostics. It is based on emulsion PCR and on detection of pyrophosphate release on nucleotide incorporation.

A single-stranded DNA (ssDNA) template is hybridized to a sequencing primer and incubated with the enzymes DNA polymerase, ATP sulfurylase, luciferase and apyrase, and with the substrates adenosine 5´ phosphosulfate (APS) and luciferin.

Next, one of the four deoxynucleotide triphosphates (dNTPs) is added. If it is complementary to the next template base, DNA polymerase incorporates it onto the growing strand, and each incorporation releases pyrophosphate (PPi).

ATP sulfurylase quantitatively converts PPi to ATP in the presence of adenosine 5´ phosphosulfate. This ATP drives the luciferase-mediated conversion of luciferin to oxyluciferin, which generates visible light in an amount proportional to the amount of ATP.

Unincorporated nucleotides and ATP are degraded by the apyrase, and the reaction can restart with another nucleotide.
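
The cycle described above can be sketched as an idealized "flowgram" simulation in Python. It assumes a perfectly proportional light signal, so a run of n identical bases gives a signal of n in a single flow. The flow order "TACG" and the function names are assumptions made for illustration.

```python
def simulate_flowgram(new_strand, flow_order="TACG"):
    """Return (nucleotide, signal) pairs for an idealized pyrosequencing run.

    new_strand is the sequence being synthesized (complementary to the
    template). One nucleotide is dispensed per flow; the signal is the
    number of bases incorporated, and 0 means no light was produced.
    """
    unknown = set(new_strand) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not covered by the flow order: {sorted(unknown)}")
    flowgram = []
    position = 0
    while position < len(new_strand):
        for nucleotide in flow_order:
            run_length = 0
            # A homopolymer run is incorporated within a single flow.
            while position < len(new_strand) and new_strand[position] == nucleotide:
                run_length += 1
                position += 1
            flowgram.append((nucleotide, run_length))
            if position == len(new_strand):
                break
    return flowgram


def decode_flowgram(flowgram):
    """Rebuild the synthesized sequence from the signal intensities."""
    return "".join(nucleotide * signal for nucleotide, signal in flowgram)


flows = simulate_flowgram("TTAGGC")
print(flows)
print(decode_flowgram(flows))
```

In practice the signal for long homopolymer runs is not perfectly linear, which this sketch does not model.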

Emulsion PCR (ePCR)

PCR amplification

Sequential nucleotide addition

Light reaction

Sequencing by Synthesis technology (SBS)

Developed by Solexa, this sequencing technology is based on reversible dye-terminators and bridge PCR.

The combination of short inserts and longer reads increases the ability to fully characterize any genome.

DNA molecules are first attached to primers on a slide and amplified so that local clonal colonies are formed (bridge amplification). Four types of reversible terminator bases (RT-bases) are added, and non-incorporated nucleotides are washed away. Unlike pyrosequencing, the DNA can only be extended one nucleotide at a time. A camera takes images of the fluorescently labelled nucleotides, then the dye along with the terminal 3' blocker is chemically removed from the DNA, allowing the next cycle.

Reversible dye terminators: the 3' end carries a protecting group that can be converted back to a hydroxyl group once the nucleotide has been incorporated into the growing DNA chain.
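
The one-base-per-cycle behaviour can be sketched as follows. This is a deliberately simplified model with hypothetical names: the template is given in the order the polymerase encounters it, each cycle incorporates exactly one blocked nucleotide, and the imaging and cleavage steps are reduced to comments.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sequencing_by_synthesis(template, n_cycles):
    """Return the bases imaged over n_cycles of reversible-terminator SBS.

    template lists the template bases in the order the polymerase reads them.
    """
    read = []
    for cycle in range(min(n_cycles, len(template))):
        # All four reversible terminators are offered; only the complementary
        # one is incorporated, and its 3' block prevents further extension.
        incorporated = COMPLEMENT[template[cycle]]
        # Imaging identifies the dye; cleavage then removes dye and blocker.
        read.append(incorporated)
    return "".join(read)


# A homopolymer run still takes one cycle per base.
print(sequencing_by_synthesis("AAAC", 3))
print(sequencing_by_synthesis("AAAC", 10))
```

Because extension stops after every base, the read length is bounded by the number of cycles rather than by signal intensity.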

Sequencing by ligation technology

Developed by Applied Biosystems (SOLiD platform).

Sequencing by ligation relies on the sensitivity of DNA ligase to base-pairing mismatches.

The target molecule to be sequenced is a single strand of unknown DNA sequence, flanked on at least one end by a known sequence. A short "anchor" strand is brought in to bind the known sequence.

A mixed pool of probe oligonucleotides is then brought in (8 or 9 bases long), labeled (typically with fluorescent dyes) according to the position that will be sequenced. These molecules hybridize to the target DNA sequence, next to the anchor sequence, and DNA ligase preferentially joins the molecule to the anchor when its bases match the unknown DNA sequence. Based on the fluorescence produced by the molecule, one can infer the identity of the nucleotide at this position in the unknown sequence.

VisiGen Biotechnologies approach

VisiGen Biotechnologies introduced a specially engineered DNA polymerase for use in its sequencing approach. This polymerase acts as a sensor: it carries a donor fluorescent dye near its active centre. The donor dye acts by FRET (fluorescence resonance energy transfer), inducing fluorescence of the differently labeled nucleotides.

This approach allows reads to be performed at the speed at which the polymerase incorporates nucleotides into the sequence (several hundred per second).

The nucleotide fluorochrome is released after incorporation into the DNA strand.

The expected read lengths in this approach should reach 1000 nucleotides; however, this has yet to be confirmed.

Nanopore sequencing technology

Developed by Oxford Nanopore Technologies.

This method is based on reading out the electrical signal produced as nucleotides pass through alpha-hemolysin pores covalently bound with cyclodextrin.

DNA passing through the nanopore changes the ion current through the pore. This change depends on the shape, size and length of the DNA sequence. Each type of nucleotide blocks the ion flow through the pore for a different period of time.

The method has potential for further development because it does not require modified nucleotides; however, single-nucleotide resolution is not yet available.

Emulsion PCR

The single-stranded DNA fragments or templates are attached to the surface of beads using adaptors or linkers, and one bead is attached to a single DNA fragment from the DNA library. The DNA library is generated through random fragmentation of the genomic DNA. The surface of the beads contains oligonucleotide probes with sequences that are complementary to the adaptors binding the DNA fragments. After that, the beads will be compartmentalized into separate water-oil emulsion droplets. In the aqueous water-oil emulsion, each of the droplets capturing one bead will serve as a PCR microreactor for amplification steps to take place and produce clonally amplified copies of the DNA fragment.

Bridge amplification on solid surface

High-density forward and reverse primers are covalently attached to the slide in a flow cell. The ratio of the primers to the template on the support defines the surface density of the amplified clusters. The flow cell is exposed to reagents for polymerase-based extension, and priming occurs as the free/distal end of a ligated fragment "bridges" to a complementary oligo on the surface. Repeated denaturation and extension results in localized amplification of DNA fragments in millions of unique locations across the flow cell surface. Solid-phase amplification can produce 100–200 million spatially separated template clusters (Illumina/Solexa), providing free ends to which a universal sequencing primer can be hybridized to initiate the NGS reaction.

Single-molecule templates

Some clonal amplification protocols are cumbersome to implement and require a large amount of genomic DNA (3–20 μg). The preparation of single-molecule templates is more straightforward and requires less starting material (<1 μg). More importantly, these methods do not require PCR, which can introduce mutations into clonally amplified templates that masquerade as sequence variants. AT-rich and GC-rich target sequences may also show amplification bias in product yield, which results in their underrepresentation in genome alignments and assemblies.
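
To make the point about AT-rich and GC-rich sequences concrete, the sketch below computes the GC fraction of each fragment and flags those at the extremes. The cut-offs of 0.3 and 0.7 are arbitrary illustrative values, not thresholds taken from the slides, and the function names are hypothetical.

```python
def gc_fraction(sequence):
    """Fraction of bases in the sequence that are G or C."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def flag_bias_prone(fragments, low=0.3, high=0.7):
    """Return fragments whose GC fraction falls outside [low, high]."""
    return [frag for frag in fragments if not low <= gc_fraction(frag) <= high]


fragments = ["ATATATAT", "GCGCGGCC", "ATGCATGC"]
for frag in fragments:
    print(frag, gc_fraction(frag))
print(flag_bias_prone(fragments))
```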

Single molecule templates are usually immobilized on solid supports using one of at least 3 different approaches:

1. Spatially distributed individual primer molecules are covalently attached to the solid support. The template, which is prepared by randomly fragmenting the starting material into small sizes (for example, ~200–250 bp) and adding common adaptors to the fragment ends, is then hybridized to the immobilized primer.

2. Spatially distributed single-molecule templates are covalently attached to the solid support by priming and extending single-stranded, single-molecule templates from immobilized primers. A common primer is then hybridized to the template. In either approach, DNA polymerase can bind to the immobilized primed template configuration to initiate the NGS reaction.

Both of the above approaches are used by Helicos BioSciences.

3. Spatially distributed single polymerase molecules are attached to the solid support, to which a primed template molecule is bound. Larger DNA molecules (up to 10,000 bp) can be used with this technique.

This approach is used by Pacific Biosciences.

GENE ANNOTATION

What is Annotation?

Extraction, definition, and interpretation of features on the genome sequence derived by integrating computational tools and biological knowledge.

DNA Analysis

Find the genes
– Heuristic signals
– Inherent features
– Intelligent methods

Characterize each gene
– Compare with other genes
– Find functional components
– Predict features

Heuristic Signals

DNA contains various recognition sites for the cell's internal machinery, such as:
• Promoter signals
• Transcription start signals
• Start codon
• Exon, intron boundaries
• Transcription termination signals
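
As a minimal sketch of a signal-based search, the Python below scans the forward strand for open reading frames that begin with a start codon (ATG) and end at the first in-frame stop codon. It ignores introns, the reverse strand, and alternative start codons. The function name and the `min_codons` parameter are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=2):
    """Return (start, end) index pairs of forward-strand ORFs.

    start is the index of the A in ATG; end is one past the stop codon.
    min_codons counts the codons before the stop codon.
    """
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(sequence):
            if sequence[i:i + 3] == "ATG":
                # Walk codon by codon until the first in-frame stop.
                for j in range(i + 3, len(sequence) - 2, 3):
                    if sequence[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3))
                        i = j  # resume scanning after this stop codon
                        break
            i += 3
    return orfs


dna = "CCATGAAATTTTAGGG"
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])
```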

Inherent Features

DNA exhibits certain biases that can be exploited to locate coding regions:
• Uneven distribution of bases
• Codon bias
• CpG islands
• Encoded amino acid sequence
• Imperfect periodicity
• Other global patterns
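
Two of these features are easy to compute directly. The sketch below, with hypothetical function names, counts codon usage in a coding sequence and calculates the observed/expected CpG ratio that is commonly used when screening for CpG islands: (number of CG dinucleotides × length) / (number of C × number of G).

```python
from collections import Counter


def codon_usage(coding_sequence):
    """Count how often each codon occurs, reading in frame from position 0."""
    usable = len(coding_sequence) - len(coding_sequence) % 3
    codons = [coding_sequence[i:i + 3] for i in range(0, usable, 3)]
    return Counter(codons)


def cpg_observed_expected(sequence):
    """Observed/expected CpG ratio for a sequence window."""
    sequence = sequence.upper()
    c_count = sequence.count("C")
    g_count = sequence.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    cpg_count = sequence.count("CG")
    return cpg_count * len(sequence) / (c_count * g_count)


print(codon_usage("ATGAAAAAATAG"))
print(cpg_observed_expected("CGCGCG"))
print(cpg_observed_expected("ATATAT"))
```

In a real analysis the ratio would be computed over sliding windows and combined with GC content and a minimum length before a region is called a CpG island.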

Intelligent Methods

Pattern recognition methods weigh inputs and predict gene location
– Content-based methods
– Site-based methods
– Comparative methods
• Neural Networks
• Hidden Markov Models

The term neural network was traditionally used to refer to a network or circuit of biological neurons. The modern usage of the term often refers to artificial neural networks, which are composed of artificial neurons or nodes.

A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. An HMM can be considered as the simplest dynamic Bayesian network.
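
To show how hidden states are recovered from an observed sequence, here is a minimal two-state HMM (coding vs. noncoding) decoded with the Viterbi algorithm. All probabilities are invented for illustration (coding is assumed to be GC-rich); real gene finders use far richer models with codon position, splice sites, and length distributions.

```python
import math

STATES = ("noncoding", "coding")
# All probabilities below are hypothetical, chosen only for the example.
START = {"noncoding": 0.5, "coding": 0.5}
TRANSITION = {
    "noncoding": {"noncoding": 0.9, "coding": 0.1},
    "coding": {"noncoding": 0.1, "coding": 0.9},
}
EMISSION = {
    "noncoding": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    "coding": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
}


def viterbi(sequence):
    """Return the most probable hidden-state path for the sequence."""
    if not sequence:
        return []
    # Log probabilities avoid numerical underflow on long sequences.
    scores = [{
        s: math.log(START[s]) + math.log(EMISSION[s][sequence[0]])
        for s in STATES
    }]
    backpointers = []
    for base in sequence[1:]:
        previous = scores[-1]
        column, pointers = {}, {}
        for s in STATES:
            best_prev = max(
                STATES, key=lambda p: previous[p] + math.log(TRANSITION[p][s])
            )
            column[s] = (previous[best_prev]
                         + math.log(TRANSITION[best_prev][s])
                         + math.log(EMISSION[s][base]))
            pointers[s] = best_prev
        scores.append(column)
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


print(viterbi("ATATATATGCGCGCGCGC"))
```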

Looks at several structural features

– Splice donor/acceptor sites
– Putative coding regions
– Intronic regions
– Linear discriminant analysis to split exon / non-exon classes
– Dynamic programming to assemble best gene structure
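
The dynamic programming step can be illustrated with a simplified version of the problem: given candidate exons with scores (for example from a discriminant function), choose the non-overlapping set with the highest total score. This sketch is not the algorithm of any particular program. It ignores reading-frame compatibility and splice-site pairing, and the function name and scores are hypothetical.

```python
from bisect import bisect_right


def best_exon_chain(candidates):
    """Pick non-overlapping exons maximizing the total score.

    candidates is a list of (start, end, score) with end exclusive.
    Returns (total_score, chosen_exons).
    """
    exons = sorted(candidates, key=lambda exon: exon[1])
    ends = [exon[1] for exon in exons]
    # best[i] holds the optimal result using only the first i exons.
    best = [(0.0, [])]
    for i, (start, end, score) in enumerate(exons):
        # Number of earlier exons that end at or before this one starts.
        k = bisect_right(ends, start, 0, i)
        take_score = best[k][0] + score
        if take_score > best[i][0]:
            best.append((take_score, best[k][1] + [(start, end, score)]))
        else:
            best.append(best[i])
    return best[-1]


candidates = [(0, 100, 5.0), (50, 150, 8.0), (150, 250, 4.0), (100, 140, 2.0)]
print(best_exon_chain(candidates))
```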

Quadratic discriminant analysis

– Exon length
– Exon-intron transitions
– Splice sites
– Branch sites
– Exon, strand, frame scores
– Detects internal exons

Strategies

• Select by correlation coefficient
• Select by review paper
• Select by recommendation
• Use them all
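
For the first strategy, a sketch of how a correlation coefficient might be computed at the nucleotide level is given below. It assumes the coefficient meant here is the usual one built from true/false positives and negatives (coding vs. noncoding bases). The label strings and function names are hypothetical.

```python
import math


def confusion_counts(true_labels, predicted_labels):
    """Count TP, TN, FP, FN where 'C' marks a coding base and 'N' a noncoding one."""
    tp = tn = fp = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == "C" and predicted == "C":
            tp += 1
        elif truth == "N" and predicted == "N":
            tn += 1
        elif truth == "N" and predicted == "C":
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def correlation_coefficient(tp, tn, fp, fn):
    """Correlation between predicted and true coding status (-1 to 1)."""
    denominator = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


truth = "NNCCCCNN"
prediction = "NNCCCNNN"
counts = confusion_counts(truth, prediction)
print(counts, correlation_coefficient(*counts))
```

A value of 1 means the prediction matches the annotation exactly, 0 means no better than chance, and -1 means every base is called wrongly.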

Internet Resources

Banbury Cross http://igs-server.cnrs-mrs.fr/igs/banbury
FGENEH http://genomic.sanger.ac.uk/gf/gf.shtml
GeneID http://www1.imim.es/geneid.html
GeneMachine http://genome.nhgri.nih.gov/genemachine
GENSCAN http://genes.mit.edu/GENSCAN.html
Genotator http://www.fruitfly.org/_nomi/genotator/
GRAIL http://compbio.ornl.gov/tools/index.shtml
GRAIL-EXP http://compbio.ornl.gov/grailexp
MZEF http://www.cshl.org/genefinder
PROCRUSTES http://www-hto.usc.edu/software/procrustes
RepeatMasker http://ftp.genome.washington.edu/RM/RepeatMasker.html
HMMgene http://www.cbs.dtu.dk/services/HMMgene

http://www.wiley.com/legacy/products/subject/life/bioinformatics/chapterlinks.html

Characterize a Gene

Collect clues for potential function

• Comparison with other known genes, proteins
• Predict secondary structure
• Fold classification
• Gene Expression
• Gene Regulatory Networks
• Phylogenetic comparisons
• Metabolic pathways