1. Genomic DNA Libraries for Shotgun Sequencing Projects

Upload: govind-kumar-rai

Post on 10-Apr-2018


TRANSCRIPT

  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    1/40

    TIGR: The Institute for Genomic Research

    Genomic DNA Libraries for Shotgun Sequencing Projects

    William C. Nierman

  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    2/40

    Whole Genome Shotgun Sequencing

    Library construction
      a. isolate DNA
      b. fragment DNA
      c. clone DNA

    Random Sequencing Phase
      a. sequence DNA (15,000 sequences/Mb)

    Closure Phase
      a. assemble sequences
      b. close gaps
      c. edit
      d. annotation

    Complete genome sequence
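The sequencing density quoted above can be turned into an approximate fold coverage. The sketch below is only an illustration: the 15,000 sequences/Mb figure comes from the slide, while the 500 bp average read length is an assumed value that the slides do not state.

```python
def sequence_coverage(reads_per_mb: float, read_length_bp: float) -> float:
    """Average fold coverage: sequenced bases divided by genome bases."""
    return reads_per_mb * read_length_bp / 1_000_000


# 15,000 sequences/Mb is from the slide; 500 bp per read is an assumption.
coverage = sequence_coverage(15_000, 500)
print(f"Approximate coverage: {coverage:.1f}X")
```

Under that assumed read length the result falls inside the 6-8X range quoted elsewhere in the deck; a different read length scales the answer proportionally.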

  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    3/40

    Genomic Sequencing Overview

    Genomic DNA (Marker 1, Marker 2)
    Large Insert Library (20-500 Kb)
    Physical Map
    Shotgun Library (2-3 Kb)
    Sequencing (6-8X)
    Assembly
    Gap Closure
    Analysis

  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    4/40

    Genomic Sequencing Overview

    Genomic DNA (Marker 1, Marker 2)
    Shotgun Library (2, 10, 50 Kb)
    Sequencing (6-8X)
    Assembly
    Gap Closure
    Analysis
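To see why a 6-8X target leaves a gap closure step, one can use the idealized Lander-Waterman model, which is not part of the slides and assumes reads land uniformly at random. In this sketch the genome size, read length, and the `min_overlap_fraction` parameter are all illustrative assumptions.

```python
import math


def fraction_uncovered(coverage: float) -> float:
    """Expected fraction of bases hit by no read (Poisson approximation)."""
    return math.exp(-coverage)


def expected_contigs(n_reads: int, coverage: float,
                     min_overlap_fraction: float = 0.0) -> float:
    """Lander-Waterman expected number of contigs (islands).

    min_overlap_fraction is the fraction of a read that must overlap a
    neighbor before the two are joined (0 means any overlap counts).
    """
    sigma = 1.0 - min_overlap_fraction
    return n_reads * math.exp(-coverage * sigma)


GENOME_BP = 2_000_000  # assumed genome size, for illustration only
READ_BP = 500          # assumed average read length

for c in (6, 8):
    n_reads = c * GENOME_BP // READ_BP
    print(f"{c}X: uncovered fraction {fraction_uncovered(c):.5f}, "
          f"expected contigs {expected_contigs(n_reads, c):.1f}")
```

Real libraries are not perfectly random (cloning bias, repeats), so actual contig counts are usually higher than this model predicts.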

  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    5/40

    Shotgun Sequencing Phase

    Library Construction
    Clone Picking
    Template Preparation
    Sequencing Reactions
    Electrophoresis and Base Calling
    Sequence Files
    Genome Assembly

    Sample Tracking

  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    6/40

    Graphical representation of phred quality values

    Consensus quality values
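A phred quality value Q is related to the estimated base-call error probability P by Q = -10 log10(P). The helper names below are illustrative and are not taken from phred or any TIGR tool.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability for a phred quality value: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred quality value for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


for q in (10, 20, 30, 40):
    print(f"Q{q}: about 1 error in {round(1 / phred_to_error_prob(q))} bases")
```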

  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    7/40

    3 Tier Whole Genome Shotgun Library Strategy

    1. Moderate copy number plasmids containing ~2-kb inserts

    2. Moderate copy number plasmids containing ~10-kb inserts

    3. Fosmid or other clones containing 40-200-kb inserts

  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    8/40

    TIGR Assembly Viewer. Green arrows represent F and R sequences from the same clone. Red arrows represent sequences with a sequence mate in a different contig. The 5' end of the assembly points to a telomeric repeat and is linked to a clone containing telomeric sequence.

    Repetitive sequences
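The green/red distinction in the viewer caption amounts to asking whether both reads of a clone were placed in the same contig. A minimal sketch of that classification follows; the read and contig names are hypothetical and the data layout is an assumption, not the TIGR Assembly Viewer's format.

```python
from typing import Dict, List, Tuple


def classify_mates(
    placements: Dict[str, str],
    mates: List[Tuple[str, str]],
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Split mate pairs into same-contig pairs and contig-linking pairs.

    placements maps a read name to the contig it was assembled into.
    Pairs with an unplaced read are skipped.
    """
    same_contig, linking = [], []
    for fwd, rev in mates:
        contig_f = placements.get(fwd)
        contig_r = placements.get(rev)
        if contig_f is None or contig_r is None:
            continue  # one mate failed or was not assembled
        if contig_f == contig_r:
            same_contig.append((fwd, rev))
        else:
            linking.append((fwd, rev))
    return same_contig, linking


# Hypothetical example data.
placements = {"c1.F": "contig237", "c1.R": "contig237",
              "c2.F": "contig237", "c2.R": "contig238"}
mates = [("c1.F", "c1.R"), ("c2.F", "c2.R"), ("c3.F", "c3.R")]
same, links = classify_mates(placements, mates)
print("same contig:", same)
print("linking:", links)
```

Linking pairs are the ones that order and orient contigs into scaffolds, which is why libraries of several insert sizes are useful.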

  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    9/40

    Repetitive Regions

    Output from the TIGR software tool repeat Display showing a section of an assembly. The black boxes represent a 700 bp repeat (7V, 24 copies/genome) and a 3100 bp repeat (9D, 9 copies/genome). Both repeats are spanned by clone DMGRG22. To confirm the sequence of these repeats, this clone was transposed.

    Large-insert spanning clone, DMGRG22
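Whether a clone can span a repeat depends on its insert size relative to the repeat length. The check below is a rough sketch: the repeat lengths are from the slide, the insert sizes are the nominal library tiers mentioned in the deck, and `anchor_bp` (unique sequence required on each side so that both end reads can be placed) is an assumed parameter.

```python
def can_span(insert_bp: int, repeat_bp: int, anchor_bp: int = 500) -> bool:
    """True if an insert can cover the repeat plus unique flanks on both sides."""
    return insert_bp >= repeat_bp + 2 * anchor_bp


REPEATS_BP = {"7V": 700, "9D": 3100}                     # from the slide
INSERTS_BP = {"2 kb": 2_000, "10 kb": 10_000, "40 kb": 40_000}

for repeat, repeat_bp in REPEATS_BP.items():
    spanning = [name for name, size in INSERTS_BP.items()
                if can_span(size, repeat_bp)]
    print(f"repeat {repeat} ({repeat_bp} bp) spanned by: {spanning}")
```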

  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    10/40


  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    11/40


  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    12/40

    Library Requirements

    1. Free vector should be at low or undetectable level.

    2. None of the clones should contain chimeras derived by insertion of two or more random fragments from separate parts of the genome.

    3. The inserts should be of relatively uniform size.

    4. Libraries of different insert sizes for linking should be used.

    5. Libraries should be representative of genome.

  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    13/40

    BstXI adaptor cloning system

    [Figure labels: Vector; DNA insert; BstXI adaptor; Complementary to BstXI adaptor; Ligate (shown twice); Vector plus insert. Sequences shown:]

    CTTTCCAGCACA

    GTGTGACCTTTC

    GAAAGGTC

    CTGGAAAG
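The adaptor design can be checked with a few lines of code. This sketch assumes that CTTTCCAGCACA and CTGGAAAG are the two adaptor oligos (both read 5' to 3') and that they anneal to leave a single-stranded overhang; the pairing and the vector overhang are inferred from complementarity, not stated on the slide.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


LONG_OLIGO = "CTTTCCAGCACA"   # from the slide
SHORT_OLIGO = "CTGGAAAG"      # from the slide


def overhang(long_oligo: str, short_oligo: str) -> str:
    """Single-stranded tail left after the short oligo anneals to the long one."""
    duplex = reverse_complement(short_oligo)
    if not long_oligo.startswith(duplex):
        raise ValueError("oligos do not anneal at the start of the long oligo")
    return long_oligo[len(duplex):]


adaptor_overhang = overhang(LONG_OLIGO, SHORT_OLIGO)
# An overhang can pair with a copy of itself only if it equals its own
# reverse complement.
self_compatible = reverse_complement(adaptor_overhang) == adaptor_overhang
# Assumed: the BstXI-cut vector carries the complementary overhang.
vector_overhang = reverse_complement(adaptor_overhang)

print("adaptor overhang:", adaptor_overhang)
print("self-compatible:", self_compatible)
print("matching vector overhang:", vector_overhang)
```

Because the overhang is not self-complementary, adaptored inserts should not ligate to each other and the cut vector should not close on itself, which fits the requirements for low free vector and no chimeras.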

  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    14/40

    Library Requirements

    1. Free vector should be at low or undetectable level.

    2. None of the clones should contain chimeras derived by insertion of two or more random fragments from separate parts of the genome.

    3. The inserts should be of relatively uniform size.

    4. Libraries of different insert sizes for linking should be used.

    5. Libraries should be representative of genome.

  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    15/40

  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    16/40


  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    17/40


  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    18/40


  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    19/40

    What is Unclonable DNA?

    Difficult cloning targets include several different types of sequences, such as:
      Toxic coding sequences
      Promoters
      A/T Rich DNA
      Modified bases
      Repetitive regions
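Of the categories above, A/T-rich DNA is the easiest to screen for computationally. The sketch below flags windows by A+T fraction; the window size and the 0.7 threshold are arbitrary assumptions, since the slide gives no cutoff.

```python
from typing import List


def at_fraction(seq: str) -> float:
    """Fraction of A and T bases in a non-empty DNA sequence."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)


def at_rich_windows(seq: str, window: int = 100,
                    threshold: float = 0.7, step: int = 0) -> List[int]:
    """Start positions of windows whose A+T fraction meets the threshold.

    step defaults to the window size (non-overlapping windows).
    """
    step = step or window
    return [start
            for start in range(0, len(seq) - window + 1, step)
            if at_fraction(seq[start:start + window]) >= threshold]


example = "AAAATTTTGGGGCCCC"
print(at_rich_windows(example, window=8, threshold=0.9))
```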

  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    20/40

    Library Coverage and Randomness

    Tolerance of cloned DNA by E. coli host

    Vector copy number

    Insert size

  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    21/40

    Vector Design Issues

    Vector-driven transcription and translation into the insert induce expression of the cloned sequence.

    Fortuitous transcription out of the insert can interfere with vector maintenance.

    False positives and false negatives arise from inappropriate transcription.

    High copy number can cause plasmid instability.

    [Figure labels: lacP; Cloned fragment]

  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    22/40

    Sequencing Project Vector Features

    1. The sequencing primer sites immediately flank the cloning site to avoid excessive re-sequencing of vector DNA.

    2. PCR primer sites are located immediately outside of the sequencing primer sites to allow PCR amplification for template preparation.

    3. The entire cloning region including the primer sites is isolated from RNA transcription.

  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    23/40

    Design Features of a BstXI Adaptor Cloning System

    [Figure: map of pHOS vector plus insert. Labels: AmpR; Ori, copy number; ter1; ter2; Pr; two BstXI sites; forward and reverse sequencing primers; forward and reverse PCR primers; rrnBT1; rrnBT2.]

  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    24/40

    Construction of Linking Library in pHOS2Kan

    genomic DNA ~50 kb with BstXI adaptors

    CTTTCCAGCACA

    GAAAGGTC

    CTGGAAAG

    ACACGACCTTTC

    [Figure labels: pHOS2 (Amp); Restriction Digest; Phosphatase; Ligate Kan Cassette; Kan; Double Amp/Kan Selection; Ligation]

  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    25/40


  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    26/40


    Fosmid Library Construction

  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    27/40


    Copy Number Induced (+) vs. Uninduced (-) Fosmid DNA preps

  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    28/40


  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    29/40


  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    30/40

    Library Mix

    Wolbachia (endosymbiont of B. malayi), sequenced to 20X

    True genome size: 1,080,471 bases.

    At 7.6X coverage:
      0% small, 100% large gave 1 scaffold of 1,076,660 bp and 10 contigs
      5% small, 95% large gave 1 scaffold of 1,077,210 bp and 12 contigs
      60% small, 40% large gave 14 scaffolds (largest = 160 kb), 79 contigs
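One way to compare the three mixes is the share of the true genome captured by the largest scaffold. The numbers below are the ones on the slide; the dictionary layout and the choice of metric are only for illustration.

```python
TRUE_GENOME_BP = 1_080_471

# (percent small-insert, percent large-insert) -> assembly results at 7.6X
MIXES = {
    (0, 100): {"scaffolds": 1, "largest_bp": 1_076_660, "contigs": 10},
    (5, 95): {"scaffolds": 1, "largest_bp": 1_077_210, "contigs": 12},
    (60, 40): {"scaffolds": 14, "largest_bp": 160_000, "contigs": 79},
}


def percent_in_largest_scaffold(largest_bp: int, genome_bp: int) -> float:
    """Largest scaffold length as a percentage of the genome length."""
    return 100 * largest_bp / genome_bp


for (small, large), result in MIXES.items():
    pct = percent_in_largest_scaffold(result["largest_bp"], TRUE_GENOME_BP)
    print(f"{small}% small / {large}% large: {result['scaffolds']} scaffold(s), "
          f"{result['contigs']} contigs, largest covers {pct:.2f}% of genome")
```

In this data set the mixes dominated by large inserts give a single scaffold covering nearly the whole genome, while the 60% small mix fragments into many scaffolds.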

  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    31/40

    Redundancy Analysis from Completed Projects

    Genome & Insert Size   Est. Coverage   Actual Coverage   Contigs, Scaffolds (run together)
    GBS Small              9.0X            8.9X              9575
    GBS Large              2.1X            4.7X              27994
    GMX Small              2.2X            3.4X              1321996
    GMX Large              3.2X            4.1X              123838
    GFS Small              10.8X           10.4X             210123
    GFS Large              2.5X            5.1X              422104
    TPG Small              6.5X            6.5X              32191
    TPG Large              3.2X            4.2X              1620128

    gbs = Streptococcus agalactiae
    gmx = Myxococcus xanthus
    gfs = Fibrobacter succinogenes
    tpg = Theileria parva

  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    32/40

    [Pie chart segments: 5 RNA'S 8%; REPEATS 15%; FAILED MATES 1%; EDITING 8%; COVERAGE 3%; MATT'S HELP 3%; SEQ GAPS 12%; PHYS GAPS 23%; 2 DIFFICULT REPEATS 27%]

    Comparison of Library Strategies

    Genome        BSP             GBS             GSA          GSE
    Species       S. pneumoniae   S. agalactiae   S. aureus    S. epidermidis
    Size (Mb)     2.1             2.1             2.8          2.7
    Groups        160             58              134          12
    Seq Gaps      290             46              198          24
    Start Date    Nov 95          Dec 00          Mar 99       Feb 01
    In Closure    49 months       10 months       26 months    7 months

  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    33/40

    Myxococcus xanthus Sequencing Statistics

    Total shotgun sequences: 130,436
    TIGR library insert sizes: 2-3 kb, 10-12 kb
    Sequence coverage of 9X
    Assembled into single scaffold of 103 contigs
    Two rounds of autoprimer sequencing reduced contig number to 36
    9,131,959 bases, 3500 Ns in gaps
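These statistics imply an average usable read length, since coverage is total read bases divided by genome size. The calculation below is a back-of-the-envelope sketch that treats the 9,131,959 assembled bases as the genome size and 9X as exact, both of which are simplifying assumptions.

```python
def implied_read_length(coverage: float, genome_bp: int, n_reads: int) -> float:
    """Average read length needed for n_reads to reach the given coverage."""
    return coverage * genome_bp / n_reads


read_len = implied_read_length(coverage=9, genome_bp=9_131_959, n_reads=130_436)
print(f"Implied average read length: about {read_len:.0f} bp")
```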

  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    34/40


    Aspergillus fumigatus karyotype

    1,789 Kb

    3,779 Kb

    2,021 Kb

    3,992 Kb

    4,018 Kb

    4,834 Kb

    4,891 Kb

    3,933* Kb

  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    35/40

    Optical Analysis

    Molecule maps generated from images of single DNA molecules digested with NheI

    Resolution (avg fragment size): 8.28 kb
    Total coverage: 8,987 Mbase, or 300x
    Total of 8 chromosomes
    Total size: 29.189 Megabases
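The optical mapping figures can be cross-checked with simple arithmetic. The sketch uses only the numbers on the slide; the fragment count is an estimate that assumes the average fragment size applies across the whole genome.

```python
TOTAL_MAPPED_MB = 8_987        # total coverage in Mbase, from the slide
GENOME_MB = 29.189             # total size, from the slide
AVG_FRAGMENT_KB = 8.28         # average NheI fragment size, from the slide

fold_coverage = TOTAL_MAPPED_MB / GENOME_MB
estimated_fragments = GENOME_MB * 1_000 / AVG_FRAGMENT_KB

print(f"Optical map depth: about {fold_coverage:.0f}x")
print(f"Estimated NheI fragments in the genome: about {estimated_fragments:.0f}")
```

The computed depth is close to the rounded 300x quoted on the slide.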

  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    36/40


    A. fumigatus chr5-7 contig placement

  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    37/40

    Aspergillus fumigatus Chromosomes

    [Figure: diagram of chromosomes 1-8 and the mitochondrion (32 Kb), marking telomeres, presumed centromeric areas, and rRNA. Chromosome sizes shown: 4.9, 4.8, 4.0, 3.9, 3.9, 3.6, 2.0, and 1.8 Mb.]

  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    38/40


  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    39/40


  • 8/8/2019 1. Genomic DNA Libraries for Shotgun Sequencing Projects

    40/40
