WHOLE GENOME SEQUENCING OF Arabidopsis thaliana
By BHAVYASREE R K
“GENOME SEQUENCING”
• Idea discussed in the scientific community from 1984 onwards
• 1990: Human Genome Project officially began
• Genome sequencing approaches:
– Clone-by-clone sequencing
– Shotgun sequencing
“GENOMICS”
Determination of genetic information and the mechanism by which this information is used by the organism
“GENOME”
The complete set of genetic information of an organism
Genome sequencing projects
Model organisms: mostly used in genetic and scientific studies
Yeast, E. coli, Caenorhabditis elegans
Drosophila
Arabidopsis thaliana
Genome size:
– Nuclear: 125 Mb
– Plastid: 154 kb
– Mitochondria: 367 kb
Arabidopsis: A model plant
• Small plant belonging to the family Cruciferae (Brassicaceae)
• Large number of offspring and short generation time
• Convenience and abundance
• Low amount of repetitive DNA
• Basic similarities to crop plants
• Susceptible to T-DNA insertions
• Relatively small and simple genome
ARABIDOPSIS GENOME ANALYSIS: Initiation and progress
• 1983 - first genetic map published
• 1988-89 - publication of RFLP maps
• 1990 - Multinational Coordinated Arabidopsis thaliana Genome project initiated
• 1991 - first YAC libraries
• 1995-96 - standard BAC and P1 libraries constructed
• 1996 - Arabidopsis Genome Initiative organised and started sequencing
• 1998 - physical maps of all chromosomes completed
• 1999 - sequence and analysis of chromosomes 2 and 4
• 2000 - sequence and analysis of chromosomes 1, 3 and 5
• 2000 - completion of whole genome sequencing
This report includes:
– Completed Arabidopsis genome sequences
– Annotation of predicted genes
– Assignment of functional categories
– Chromosomal dynamics and architecture
– Distribution of transposable elements and other repeats
– Extent of lateral gene transfer from organelles
– Comparison of the genome sequence and structure to that of other Arabidopsis accessions and plant species
Sequencing strategy
• “Clone-by-clone sequencing” = “hierarchical shotgun sequencing” = “map-based shotgun sequencing”
• It includes:
– Map construction
– Clone selection
– Subclone library construction
– Random shotgun phase
– Directed finishing phase
– Sequence authentication
• Primary substrates – large-insert BAC, P1 and TAC libraries
• Physical maps of the genome of accession Columbia were assembled by restriction fragment ‘fingerprint’ analysis of BAC clones, by hybridization or PCR of sequence-tagged sites (STS), and by hybridization and Southern blotting
• 47,788 BAC clones were end-sequenced to assemble the contigs
Steps
• 1,569 minimally overlapping BAC, TAC, cosmid and P1 clones (average insert size: 100 kb) were used to assemble 10 contigs; together they represent the minimum tiling path
• These clones are selected for shot gun sequencing
• To link regions not covered by cloned DNA, or to optimize the minimum tiling path, 22 PCR products were amplified directly from genomic DNA
• DNA insert of selected clones is purified and subjected to random fragmentation by physical shearing
• Enzymatic repair of the broken ends
• Size fractionation and elution of 2-5 kb fragments
• The fragments are subcloned into plasmid or M13 vectors
• Sequence reads of plasmids are derived from universal priming sites
• Sufficiently redundant data are generated and the sequence reads are computationally assembled (>99.99% accuracy at 8-10 fold sequence coverage)
• All available sequenced genetic markers were integrated into the sequence assemblies to verify the sequenced contigs
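Two of the steps above lend themselves to a small sketch: picking a minimally overlapping set of clones (a minimum tiling path) and estimating how many reads give 8-10 fold coverage of one clone. The Python below is only an illustration, not the procedure the sequencing centres used. The clone coordinates, the 500 bp read length and the function names are all assumptions made for the example, and the uncovered-fraction estimate uses the usual Poisson approximation.

```python
from math import ceil, exp

# Hypothetical clones as (name, start_kb, end_kb) on a physical map.
EXAMPLE_CLONES = [
    ("A", 0, 100),
    ("B", 40, 120),
    ("C", 90, 210),
    ("D", 95, 180),
    ("E", 200, 300),
]


def minimum_tiling_path(clones, start, end):
    """Greedily pick few clones that together cover [start, end).

    At each step, among clones that begin at or before the point covered
    so far, take the one that reaches furthest to the right.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path = []
    covered_to = start
    i = 0
    while covered_to < end:
        best = None
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            # No clone spans this point; on the slides such gaps were
            # bridged with PCR products from genomic DNA.
            raise ValueError(f"gap in clone coverage at {covered_to}")
        path.append(best)
        covered_to = best[2]
    return path


def reads_for_coverage(insert_bp, read_bp, fold):
    """Number of reads so that reads * read_bp / insert_bp >= fold."""
    return ceil(fold * insert_bp / read_bp)


def expected_uncovered_fraction(fold):
    """Poisson estimate of the fraction of bases hit by no read."""
    return exp(-fold)


if __name__ == "__main__":
    path = minimum_tiling_path(EXAMPLE_CLONES, 0, 300)
    print("tiling path:", [name for name, _, _ in path])
    for fold in (8, 10):
        # 100 kb insert (from the slides), 500 bp reads (assumed).
        print(fold, "x:", reads_for_coverage(100_000, 500, fold), "reads,",
              f"{expected_uncovered_fraction(fold):.5f} expected uncovered")
```

With these made-up coordinates, clones B and D are skipped because A, C and E already span the region with less overlap. The same greedy idea applies to a real map once clone positions are known from fingerprinting and end sequencing.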
Outcomes of sequencing project
• 115,409,949 bp (~115.4 Mb) were sequenced
• The unsequenced centromeric and ribosomal DNA repeat regions measure roughly 10 Mb
• 25,498 genes are predicted
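The figures above can be cross-checked against the 125 Mb nuclear genome size quoted earlier. The short Python sketch below adds the sequenced and unsequenced portions and derives the average spacing of predicted genes. The 10 Mb value is the slide's rough estimate, so the results are approximate, and the function name is only illustrative.

```python
SEQUENCED_BP = 115_409_949    # sequenced bases, from the slide
UNSEQUENCED_BP = 10_000_000   # "roughly 10 Mb", approximate
PREDICTED_GENES = 25_498      # predicted genes, from the slide


def genome_summary(sequenced_bp, unsequenced_bp, genes):
    """Derive total size, sequenced fraction and mean bases per gene."""
    total_bp = sequenced_bp + unsequenced_bp
    return {
        "total_mb": total_bp / 1e6,
        "sequenced_fraction": sequenced_bp / total_bp,
        "bp_per_gene": sequenced_bp / genes,
    }


if __name__ == "__main__":
    summary = genome_summary(SEQUENCED_BP, UNSEQUENCED_BP, PREDICTED_GENES)
    print(f"total size:         {summary['total_mb']:.1f} Mb")
    print(f"sequenced fraction: {summary['sequenced_fraction']:.1%}")
    print(f"bases per gene:     {summary['bp_per_gene']:.0f}")
```

The total comes to about 125.4 Mb, in line with the 125 Mb nuclear genome size. About 92% of that is sequenced, and the sequenced portion has on average one predicted gene per roughly 4.5 kb.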
Outcomes of sequencing project
• Characterization of the coding regions
• Genome organization and duplication
• Comparative analysis of Arabidopsis accessions
• Comparison of Arabidopsis and other plant genera
• Integration of the three genomes in the plant cell
• Transposable elements
• rDNA, telomeres and centromeres
• Membrane transport
• DNA repair and recombination
• Gene regulation
• Cellular organization
• Development
• Signal transduction
• Recognizing and responding to pathogens
• Photomorphogenesis and photosynthesis
• Metabolism
1001 Genomes Project
• Launched at the beginning of 2008 to discover whole-genome sequence variation in 1001 strains
• Each accession is an inbred line with seeds that are freely available from the stock centre
• Hierarchical approach to selecting the 1001 genomes
– Sampling 10 individuals from each of 10 populations in each of 10 geographical regions throughout Eurasia, plus at least one North African accession (10 x 10 x 10 + 1)
• Sequence information can be used directly in association studies at the biochemical, metabolic, physiological, morphological and whole-plant fitness levels
• The complete genome sequences of over 80 accessions were released in early 2010 by the Max Planck Institute
• Many more have been added by the Salk Institute, the Gregor Mendel Institute and Monsanto
• As of September 2014, over 1,100 lines had been sequenced
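The hierarchical selection scheme (10 x 10 x 10 + 1) can be spelled out as a nested enumeration. The Python sketch below only illustrates how the count of 1001 arises. The region, population and individual indices are placeholders, not real accession identifiers, and the function name is hypothetical.

```python
from itertools import product


def sampling_design(regions=10, populations=10, individuals=10):
    """List (region, population, individual) slots plus one extra accession."""
    slots = [
        (f"region-{r}", f"population-{p}", f"individual-{i}")
        for r, p, i in product(
            range(1, regions + 1),
            range(1, populations + 1),
            range(1, individuals + 1),
        )
    ]
    # The design adds at least one North African accession on top
    # of the Eurasian grid; one is included here.
    slots.append(("north-africa", "population-1", "individual-1"))
    return slots


if __name__ == "__main__":
    design = sampling_design()
    print(len(design), "accessions in the design")
    print("first slot:", design[0])
    print("last slot: ", design[-1])
```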
References
• Multinational Arabidopsis Steering Committee (2010) The Multinational Coordinated Arabidopsis thaliana Functional Genomics Project: Annual Report 2010.
• Weigel D & Mott R (2009) The 1001 Genomes Project for Arabidopsis thaliana. Genome Biology 10(5): 107.
• http://1001genomes.org/
• Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796-815.
• Green E D (2001) Strategies for the systematic sequencing of complex genomes. Nature Reviews Genetics 2(8): 573-583.
• Singh B D (2009) Biotechnology: Expanding Horizons. Kalyani, India.