genomics and biotechnology - oregon state...


Post on 27-May-2020

Genomics and Biotechnology

• DNA structure
• Replication
• Gene expression
• Genome size
• Genome structure:
  - Coding sequences
  - Transposable elements
• Genome technologies:
  - Sequencing
  - PCR
  - DNA fingerprinting
  - EST databases
  - Microarrays
  - Comparative and functional genomics
• Model systems for plant genomics
• Regulation of gene expression by methylation and RNA interference (Time?)

Nucleotides

DNA structure

Linear and circular DNA molecules

DNA packaging in the cell: structure of the nucleosome

DNA packaging in the cell: chromatin organization

DNA replication: semi-conservative model

Replicative complex

The structure of a eukaryotic gene. mRNA synthesis.

What is a genome?

The genome of an organism is the complete DNA sequence of one set of chromosomes.

Genome of plants:

Nuclear

Mitochondrial

Chloroplast

Nuclear genome size

Genome complexity in plants

• Arabidopsis thaliana *: 120 Mbp (120,000,000 bp)
• Poplar *: 550 Mbp
• Rice *: 450 Mbp
• Maize: 2,500 Mbp
• Barley: 5,000 Mbp
• Hexaploid wheat: 16,000 Mbp
• Fritillaria assyriaca (Liliaceae): >87,000 Mbp

* sequenced genomes
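The spread in the list above is easier to see as fold-differences. The following is a minimal sketch that divides each genome size by the Arabidopsis value; the numbers are copied from the list, the function name is illustrative, and the Fritillaria entry is only a lower bound because the slide gives ">87,000 Mbp".

```python
# Haploid nuclear genome sizes in Mbp, copied from the list above.
# The Fritillaria value is a lower bound (the slide says ">87,000 Mbp").
GENOME_SIZE_MBP = {
    "Arabidopsis thaliana": 120,
    "Rice": 450,
    "Poplar": 550,
    "Maize": 2_500,
    "Barley": 5_000,
    "Hexaploid wheat": 16_000,
    "Fritillaria assyriaca": 87_000,
}


def fold_vs_reference(sizes, reference="Arabidopsis thaliana"):
    """Return each genome size divided by the size of the reference genome."""
    ref = sizes[reference]
    return {name: size / ref for name, size in sizes.items()}


if __name__ == "__main__":
    for name, fold in fold_vs_reference(GENOME_SIZE_MBP).items():
        print(f"{name:24s} {fold:8.1f}x")
```

Even among flowering plants the range spans more than two orders of magnitude, which is the observation behind the C-value paradox.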

Genome size/number of genes

Gene density in genomes

C-value paradox

The total amount of DNA in the haploid genome is called its C value.

Psilotum nudum ("whisk fern") is a far simpler plant than Arabidopsis (it has no true leaves, flowers, or fruit). Nevertheless, it has 3000 times as much DNA as Arabidopsis. 80% or more of Psilotum DNA is repetitive DNA containing no genetic information.

Some amphibians contain 30 times as much DNA as humans.

The lack of a consistent relationship between the C value and the complexity of an organism is called the C-value paradox.

Eukaryotic genome organization

Genome instability: transposable elements

Transposons are segments of DNA that can move to different positions in the genome of a single cell.

Class II transposons consist of DNA that moves directly from place to place. Key components required for DNA transposition: flanking inverted repeats and the enzyme transposase, which is encoded by the transposon.

DNA mutations caused by Class II transposon movement:
- insertions
- deletions
- translocations

Maize transposable elements Ac/Ds:

Class III Transposons

• Class III transposons, or MITEs (Miniature Inverted-repeat Transposable Elements).
• Structure of C. elegans and rice MITEs:

  5' GGCCAGTCACAATGG..~ 400 nt..CCATTGTGACTGGCC 3'
  3' CCGGTCAGTGTTACC..~ 400 nt..GGTAACACTGACCGG 5'

• Too small to encode any protein.
• The mechanism of transposition is not known (it possibly depends on proteins of Class II transposons that recognize the MITE inverted repeats).
• 100,000 MITEs represent 6% of the total rice genome.
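"Inverted repeat" means that the right end of the element is the reverse complement of the left end. A minimal sketch that checks this for the two 15-nt ends shown in the MITE structure above; the function names are illustrative, not taken from any library.

```python
# Translation table for the Watson-Crick complement of each DNA base.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_terminal_inverted_repeats(left_end, right_end):
    """True if right_end is the reverse complement of left_end."""
    return reverse_complement(left_end) == right_end.upper()


# The two ends of the top strand, copied from the MITE structure above.
LEFT_END = "GGCCAGTCACAATGG"
RIGHT_END = "CCATTGTGACTGGCC"

if __name__ == "__main__":
    print(reverse_complement(LEFT_END))
    print(has_terminal_inverted_repeats(LEFT_END, RIGHT_END))
```

The same check would fail for a direct repeat, where the right end simply repeats the left end in the same orientation.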

Retrotransposons

• Class I Retrotransposons transcribe the DNA into RNA and then use reverse transcriptase to make a DNA copy of the RNA template to insert in a new location.

• At least 50% of the nuclear DNA of maize consists of retrotransposons

• Retrotransposons represent about 40% of the entire human genome

• Retrotransposons in corn and other plants appear to be retrovirus-like parasites, an unexpected finding because other organisms with such matter in their genomes, such as humans, are susceptible to retroviral diseases. Active retroviruses have never been seen in plants.

A 280 kb region containing the maize Adh1-F and u22 genes is composed primarily of retrotransposons inserted within each other. Ten retroelement families were discovered, with reiteration frequencies ranging from 10 copies to 30,000 copies per haploid genome.

Structure of the Adh1-F region of maize. Retrotransposons accounted for over 60% of the Adh1-F region.

Are transposons just genomic “parasites” or active participants in genome evolution?

• Transposons have been called "junk" DNA and "selfish" DNA: "selfish" because their only function seems to be to make more copies of themselves, and "junk" because there is no obvious benefit to their host.

• Retrotransposons cannot be so selfish that they reduce the survival of their host. Perhaps they even confer some benefit.

• Transposons can destroy or alter a gene's activity by disrupting its functional sequence.

• Retrotransposons often carry some additional sequences at their 3' end as they insert into a new location. Perhaps these occasionally create new combinations of exons, promoters, and enhancers that benefit the host. Example:
  - Thousands of our Alu elements occur in the introns of structural genes.
  - Some of these contain sequences that, when transcribed into the primary transcript, are recognized by the spliceosome.
  - These can then be spliced into the mature mRNA, creating a new exon, which will be translated into a new protein product.
  - Alternative splicing can provide not only the new mRNA (and thus protein) but also the old.
  - In this way, nature can try out new proteins without the risk of abandoning the tried-and-true old one.

• Finally, transposons can cause duplications as a result of unequal crossover during recombination.

Transposons in genome studies and biotechnology

Class II transposons are frequently used for:

- mutagenesis to obtain loss-of-function insertions into the gene of interest

- gene discovery systems based on the Ds transposon (gene trap and enhancer trap)

Genome technologies:

• Sequencing
• PCR
• DNA fingerprinting
• Gene expression profiling: microarrays
• EST databases
• Comparative and functional genomics

Dideoxy sequencing

Sequenced genomes

The genomes of more than 150 organisms have been sequenced since 1995.

• Arabidopsis thaliana
• Oryza sativa (rice)
• Populus trichocarpa (poplar)
• Saccharomyces cerevisiae (yeast)
• Caenorhabditis elegans
• Anopheles gambiae (mosquito)
• Apis mellifera (bee)
• Mus musculus (mouse)
• Rattus norvegicus (rat)
• Pan troglodytes (chimpanzee)
• Homo sapiens (human)

http://www.genomenewsnetwork.org/resources/sequenced_genomes/genome_guide_p1.shtml

Polymerase Chain Reaction

PCR-based DNA fingerprinting. SSR markers

[Figure: a microsatellite locus, 5'-[ATTT]x-3', flanked by two PCR primers]

Microsatellites (short tandem repeats) contain 2-5 bp repeats.
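Candidate SSR loci can be located in a sequence before primers are designed around them. A minimal sketch using a regular expression with a backreference: it reports runs of a 2-5 bp motif repeated in tandem. The minimum of 4 copies and the test sequence are assumptions chosen for illustration; they do not come from the slides.

```python
import re


def find_ssrs(sequence, min_repeats=4):
    """Find tandem repeats of a 2-5 bp motif.

    Returns a list of (start, motif, copies) tuples, where start is the
    0-based position of the repeat run in the sequence.
    min_repeats is an illustrative cutoff, not a value from the source.
    """
    # ([ACGT]{2,5}?) captures the shortest candidate motif;
    # \1{n,} requires at least n further copies right after it.
    pattern = re.compile(r"([ACGT]{2,5}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip single-base runs such as AAAAAAAA
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


if __name__ == "__main__":
    # Hypothetical test sequence: six copies of ATTT between unique flanks.
    example = "GCGTAC" + "ATTT" * 6 + "CGATCG"
    print(find_ssrs(example))
```

In a fingerprinting experiment the flanking sequence stays the same between individuals while the copy number varies, so the PCR product length differs between alleles.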

Microarray technology

DNA hybridization

Microarray chip technology

Three array formats:

• Membrane, 12 cm x 8 cm: 2,400 clones per membrane; radioactive labelling; 1 experimental condition per membrane
• Slide, 5.4 cm x 0.9 cm: 10,000 clones per slide; fluorescent labelling; 2 experimental conditions per slide
• Slide, 1.28 cm x 1.28 cm: 300,000 oligonucleotides per slide; fluorescent labelling; 1 experimental condition per slide

Microarray image analysis

Microarray technique applications

• Gene expression profiling

• Differential expression of genes at the whole genome scale.

• Tissue-specific gene expression
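Differential expression is usually summarized as a log2 ratio of the signal in two conditions. A minimal sketch under stated assumptions: the gene names and intensities are invented, the pseudocount and the two-fold cutoff are illustrative choices, and real analyses also involve background correction, normalization, and statistical testing that are not shown here.

```python
import math


def log2_ratios(control, treated, pseudocount=1.0):
    """Return log2(treated / control) for every gene present in both inputs.

    The pseudocount avoids division by zero for spots with no signal;
    its value is an assumption made for this sketch.
    """
    return {
        gene: math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in control
        if gene in treated
    }


def differentially_expressed(ratios, threshold=1.0):
    """Keep genes whose absolute log2 ratio meets the threshold (1.0 = two-fold)."""
    return {gene: r for gene, r in ratios.items() if abs(r) >= threshold}


if __name__ == "__main__":
    # Hypothetical spot intensities for three genes in two conditions.
    control = {"gene_A": 99, "gene_B": 399, "gene_C": 199}
    treated = {"gene_A": 399, "gene_B": 99, "gene_C": 199}
    ratios = log2_ratios(control, treated)
    print(differentially_expressed(ratios))
```

Positive values indicate higher signal in the treated sample, negative values lower signal, and values near zero no change.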

Functional and comparative plant genomics

What makes Arabidopsis a model plant system for functional genomics?

• Small genome size (1.2 x 10^8 bp)

• High gene density throughout most of the Arabidopsis genome (approximately one gene per 4 to 5 kb)

• Repetitive DNAs are relatively rare, comprising ~10% of the Arabidopsis genome.

• Short life cycle (~6 weeks)

• NSF 2010 Project: to establish the function of all 25,498 Arabidopsis genes.

• Populus is a model system for tree genomics. The poplar genome is sequenced. 95,000 ESTs from 20 different cDNA libraries, covering a range of tissues and developmental stages, are available.
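The Arabidopsis figures quoted in this section are consistent with each other: dividing the genome size by the gene count gives the average spacing of genes. A minimal worked check, using only the two numbers stated on the slide (1.2 x 10^8 bp and 25,498 genes); the function name is illustrative.

```python
GENOME_SIZE_BP = 1.2e8  # Arabidopsis genome size from the slide
GENE_COUNT = 25_498     # Arabidopsis gene count from the slide


def average_bp_per_gene(genome_size_bp, gene_count):
    """Average number of base pairs of genome per annotated gene."""
    return genome_size_bp / gene_count


if __name__ == "__main__":
    spacing = average_bp_per_gene(GENOME_SIZE_BP, GENE_COUNT)
    print(f"about one gene per {spacing / 1000:.1f} kb")
```

The result, roughly 4.7 kb per gene, falls inside the "one gene per 4 to 5 kb" range given for Arabidopsis.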

What is functional genomics?

Functional genomics is the study of the function of genes and other parts of the genome.

What is comparative genomics?

Comparative genomics is the analysis and comparison of genomes from different species.

Plants exhibit extensive conservation of both gene content and gene order

Loblolly pine and Arabidopsis thaliana differ greatly in form, ecological niche, evolutionary history, and genome size. Nevertheless, for contigs 1,100 bp or longer, 90% have an apparent Arabidopsis gene homolog. Kirst et al. 2003, PNAS 100 (12), 7383-7388.

Regulation of gene expression by methylation

• The vast majority of methylation is related to the sequence 5'-CpG-3'

• % 5'-mC: animals 2-7%; plants >25%

• CpG islands exist that are often associated with genes

• The methylation pattern is heritable from generation to generation.

• Low 5'-mC -> high levels of gene expression
• High 5'-mC -> low levels of gene expression
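CpG-rich regions can be recognized from sequence alone by comparing how many CpG dinucleotides are observed with how many would be expected from the C and G content. A minimal sketch of that observed/expected ratio; the formula (CpG count x length) / (C count x G count) is a widely used convention rather than something stated in the slides, and any cutoff for calling a CpG island is left out because the source gives none.

```python
def cpg_stats(sequence):
    """Return (gc_fraction, cpg_observed_over_expected) for a DNA sequence.

    observed/expected = (number of CpG * length) / (number of C * number of G).
    Returns 0.0 for the ratio when the sequence has no C or no G.
    """
    seq = sequence.upper()
    length = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # a C immediately followed by a G
    gc_fraction = (c_count + g_count) / length if length else 0.0
    if c_count == 0 or g_count == 0:
        return gc_fraction, 0.0
    return gc_fraction, (cpg_count * length) / (c_count * g_count)


if __name__ == "__main__":
    # Hypothetical sequences: one CpG-rich, one with C and G but no CpG.
    for seq in ("CGCGCGCG", "GCATGCAT"):
        print(seq, cpg_stats(seq))
```

A ratio well below 1 means CpG is depleted relative to base composition, while a CpG island shows a ratio close to or above 1.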

Regulation of gene expression by RNA interference and micro RNA

• RNAi is a highly potent and specific process which is actively carried out by special mechanisms in the cell, known as the RNA interference machinery.

• The presence of small fragments of double-stranded RNA (dsRNA) whose sequence matches a target gene interferes with the expression of that gene.

• miRNAs are short RNA molecules that fold back on themselves in a hairpin shape to create a double strand.

• RNAi has been applied as a gene "knockout" tool.

Summary

• DNA is built from deoxyribonucleotides; RNA contains ribonucleotides. In RNA, thymine is replaced by uracil.
• The strands of the DNA double helix are anti-parallel. DNA molecules can be linear or circular.
• The nucleosome is the structural unit of DNA packaging in the chromosomes of eukaryotic cells.
• DNA replicates in a semi-conservative manner. The replicative complex replicates DNA with high accuracy and processivity.
• Eukaryotic genes consist of exons (coding regions) interrupted by non-coding introns. During RNA splicing in the nucleus, introns are removed from the pre-mRNA.
• The main regulatory elements of a eukaryotic gene: promoter, terminator, polyadenylation signal, and translation initiation and stop codons.
• The genome of an organism is the complete DNA sequence of one set of chromosomes. Plant genomes: nuclear, mitochondrial, and chloroplast.
• Eukaryotic genome organization: unique coding DNA (unique coding genes), repetitive DNA (functional dispersed and tandemly repeated gene families; transposons), and spacer DNA.
• Gene density is higher in prokaryotes than in eukaryotes.
• The total amount of DNA in the haploid genome is called its C value. The lack of a consistent relationship between the C value and the complexity of an organism is called the C-value paradox.
• Transposons are segments of DNA that can move to different positions in the genome of a single cell.
• Class II transposons consist of DNA that moves directly from place to place. Key components required for DNA transposition: flanking inverted repeats and the enzyme transposase, encoded by the transposon.
• Class III transposons, or MITEs, are similar to Class II but too small to encode any protein; the mechanism of MITE transposition is not known.
• Class I retrotransposons transcribe the DNA into RNA and then use reverse transcriptase to make a DNA copy of the RNA template to insert in a new location.
• Large portions of the genome consist of transposons and other repetitive DNA.
• Transposons can cause DNA mutations, including insertions, deletions, and chromosomal translocations, that can be beneficial or detrimental in evolution.
• Transposon mutagenesis is a valuable tool in molecular genetics and biotechnology.

Summary (continuation)

• Automated high-throughput dideoxy sequencing makes it possible to obtain the sequence of an entire genome. More than 150 genomes have been sequenced to date.

• Microsatellites, or short tandem repeats containing 2-5 bp repeats, are molecular markers used for genetic mapping and PCR-based DNA fingerprinting.

• PCR (polymerase chain reaction) amplifies a defined fragment from genomic DNA. PCR is so sensitive that it can amplify DNA from a single cell.

• Microarray technology makes it possible to monitor differential expression of genes at the whole-genome scale.

• Functional genomics is the study of the function of genes. Comparative genomics is the analysis and comparison of genomes from different species.

• Plants exhibit extensive conservation of both gene content and gene order among species.

• Gene expression regulation in plants:
  - Methylation: a high content of methylated cytosine correlates with low levels of gene expression. The methylation pattern is heritable from generation to generation.
  - RNA interference is triggered by the presence of small fragments of double-stranded RNA (dsRNA) whose sequence matches a target gene. RNAi is a highly potent and specific process that developed to control viral replication and retrotransposon movement and to recognize and destroy aberrant dsRNAs. RNAi is applied in biotechnology as a gene "knockout" tool.

Questions

• What are the differences in the structure of DNA and RNA?

• What is the C-value paradox?

• Are transposons just genomic “parasites” or active participants in genome evolution?

• What are the subjects of studies in functional and comparative genomics?

• Are gene content and gene order conserved among plants?

• Is it possible to amplify DNA from a single cell using polymerase chain reaction?

• What is the major application of the microarray technique in functional genomics?

• How does DNA methylation affect gene expression?

• Can RNAi be used as a gene "knockout" tool?