Genomics: A. Overview; B. Sequencing; C. Finding Genes – structural genomics and ‘annotation’


Upload: barry-greer

Post on 20-Jan-2016


Genomics

A. Overview:
B. Sequencing:
C. Finding Genes – structural genomics and ‘annotation’:

- Once you have the sequence data, you have really just started.
- The goals are then:
  - identify where genes are (Open Reading Frames)
  - find promoters and regulatory elements to confirm this is a gene (and not a pseudogene)
  - in eukaryotes, find splice sites, introns and exons
  - identify structural sequences like telomeres and centromeres
  - convert the DNA sequence into the predicted AA sequence of the protein
  - predict protein structure and function by identifying ‘domains’ and ‘motifs’
- These goals are attained by computer analyses of gene/AA sequence data, and comparison with known described genes. This is: BIOINFORMATICS
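One of the goals above, converting a DNA sequence into the predicted AA sequence, can be sketched in a few lines. This is only an illustrative sketch: it assumes the standard genetic code, reads a single frame from the first base, and the names `CODON_TABLE` and `translate` are made up for this example.

```python
# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons to its one-letter amino acid ("*" marks a stop).
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate a coding-strand DNA string, codon by codon, up to the first stop."""
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[pos:pos + 3], "X")  # "X" for unrecognized codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# A short stretch starting at an ATG, used here only as sample input.
print(translate("ATGTTCAGCTTTGTGGACCTC"))
```

A real annotation pipeline would also have to pick the correct frame and strand and remove introns before translating.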


1. NCBI – BLAST search compares sequence to other sequences in the database

        1 ggggcacccc tacccactgg ttagcccacg ccatcctgag gacccagctg cacccctacc
       61 acagcacctc gggcctaggc tgggcggggg gctggggagg cagagctgcg aagaggggag
      121 atgtggggtg gactcccttc cctcctcctc cccctctcca ttccaactcc caaattgggg
      181 gccgggccag gcagctctga ttggctgggg cacgggcggc cggctccccc tctccgaggg
      241 gcagggttcc tccctgctct ccatcaggac agtataaaag gggcccgggc cagtcgtcgg
      301 agcagacggg agtttctcct cggggtcgga gcaggaggca cgcggagtgt gaggccacgc
      361 atgagcggac gctaaccccc tccccagcca caaagagtct acatgtctag ggtctagaca
      421 tgttcagctt tgtggacctc cggctcctgc tcctcttagc ggccaccgcc ctcctgacgc
      481 acggccaaga ggaaggccaa gtcgagggcc aagacgaaga cagtaagtcc caaacttttg
      541 ggagtgcaag gatactctat atcgcgcctt gcgcttggtc ccgggggccg cggcttaaaa
      601 cgagacgtgg atgatccgga gactcgggaa tggaagggag atgatgaggg ctcttcctcg
      661 gcgccctgag acaggaggga gctcaccctg gggcgaggtt ggggttgaac gcgccccggg
      721 agcgggaggt gagggtggag cgccccgtga gttggtgcaa gagagaatcc cgagagcgca
      781 accggggaag tggggatcag ggtgcagagt gaggaaagta cgtcgaagat gggatggggg
      841 cgccgagcgg ggcatttgaa gcccaagatg tagaagcaat caggaaggcc gtgggatgat
      901 tcataaggaa agattgccct ctctgcgggc tagagtgttg ctgggccgtg ggggtgctgg
      961 gcagccgcgg gaagggggtg cggagcgtgg gcgggtggag gatgagaaac tttggcgcgg
     1021 actcggcggg gcggggtcct tgcgccccct gctgaccgat gctgagcact gcgtctcccg


2. Open Reading Frames: base sequences that would code for a long stretch of amino acids before a stop codon is reached. Typically, these are found by looking for [5’-ATG…-3’] sequences that follow a promoter (TATA, CAAT, GGGCGG). The complementary template strand reads [3’-TAC…-5’], which is transcribed into the start codon in RNA [5’-AUG…-3’].


3. Regulatory regions and splicing sites (GT-AG):
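The ORF search in item 2 can be sketched as a scan of each reading frame for an ATG followed, in frame, by a stop codon. This is a minimal sketch, not how any particular annotation tool works: the function names and the `min_codons` cutoff are assumptions made for illustration, and promoters and GT-AG splice sites are not considered.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(dna, min_codons=30):
    """Return (start, end, frame) for each ATG...stop stretch on the given strand.

    Positions are 0-based, end is exclusive and includes the stop codon.
    """
    # Strip position numbers and spaces, as found in GenBank-style listings.
    dna = re.sub(r"[^ACGTacgt]", "", dna).upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # look for the next ATG
    return orfs


seq = "ccATGAAATTTGGGTAAcc"
print(find_orfs(seq, min_codons=3))                      # forward strand
print(find_orfs(reverse_complement(seq), min_codons=3))  # reverse strand
```

Because genes can lie on either strand, both the sequence and its reverse complement are scanned; in eukaryotes an ORF found this way may still be interrupted by introns.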

D. Identifying Gene Function – functional genomics:


- Sequence Homology: search libraries for similar sequences already described in proteins of known function, whether in other species or in the same species.

- Domain / Motif Analysis: Certain AA sequences are known to have a certain structure (‘motif’ like “helix-turn-helix”) or function (‘domain’ like an ion channel sequence, DNA binding region).


- Mutant Analysis: Mutate the gene (insert a non-functional sequence) in vitro, then insert in cells and observe effects of “knocking out” function in different tissues or the whole organism.

Capecchi, Evans, and Smithies were awarded the 2007 Nobel Prize for their technique for inserting a gene into embryonic cells…this gene can be a mutant, non-functional gene (“knock-out”) or a functional gene (“knock-in”).

Typically, you would then screen mice for those in which, by luck, transformed cells ended up in the gonads. These mice pass the mutation on in their gametes, so by mating carriers you can produce offspring that are homozygous for the mutation in every cell… and you can see its effects.
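The domain/motif analysis described above amounts to scanning an AA sequence for a known pattern. As a hedged illustration, the sketch below looks for the C-x-x-C-H heme-binding motif of cytochrome c in the first part of the Hippo sequence listed in section F; the function name `find_motif` and the use of a plain regular expression are choices made for this example, not how motif databases are actually searched.

```python
import re


def find_motif(protein, pattern):
    """Return (1-based position, matched text) for every match, overlaps included."""
    hits = []
    # The lookahead lets matches overlap; group(1) holds the matched residues.
    for m in re.finditer(f"(?=({pattern}))", protein):
        hits.append((m.start() + 1, m.group(1)))
    return hits


HEME_BINDING = "C..CH"  # Cys, any two residues, Cys, His
hippo_fragment = "GDVEKGKKIFVQKCAQCHTVEKGGKHKTGPNLHGLFGRK"
print(find_motif(hippo_fragment, HEME_BINDING))
```

A hit suggests, but does not prove, that the protein shares the function associated with the motif.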

E. Comparing Protein Expression - Construction of a microarray – ‘gene chip’

Can create a chip with unique sequence DNA from every gene in a genome (‘probe’).

1. Take a tissue sample

2. Isolate mRNA

3. Make labeled cDNA

4. Expose to the chip and allow complementary base pairing (hybridization)

5. Wash

6. Analyze fluorescence at each point; binding denotes that this tissue has this gene on at this point in development
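The final readout step can be sketched as a comparison of each spot's fluorescence against background. Everything here is an assumption made for illustration: the gene names, the intensity values, the two-fold cutoff, and the pseudocount are invented, and real microarray analysis involves normalization steps not shown.

```python
import math


def call_expressed(intensities, background, fold=2.0):
    """Mark a gene as 'on' when its spot is brighter than fold x background."""
    return {gene: signal > fold * background for gene, signal in intensities.items()}


def log2_ratio(sample_a, sample_b, pseudocount=1.0):
    """Compare two tissues spot by spot; positive values mean higher in sample_a."""
    return {
        gene: math.log2((sample_a[gene] + pseudocount) / (sample_b[gene] + pseudocount))
        for gene in sample_a
        if gene in sample_b
    }


# Hypothetical fluorescence readings for two tissues.
liver = {"geneA": 500.0, "geneB": 40.0, "geneC": 130.0}
brain = {"geneA": 60.0, "geneB": 45.0, "geneC": 900.0}

print(call_expressed(liver, background=50.0))
print(log2_ratio(liver, brain))
```

The log ratio is the usual way to ask which genes differ between two tissues or two developmental stages, rather than simply which are on.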

F. Phylogenetic Analyses: Comparative Genomics

- DNA or AA sequences can be compared across species

For example… download the sequence for cytochrome-c from different organisms:

>Arabidopsis
MASFDEAPPGNPKAGEKIFRTKCAQCHTVEKGAGHKQGPNLNGLFGRQSGTTPGYSYSAANKSMAVNWEEKTLYDYLLNPKKYIPGTKMVFPGLKKPQDRADLIAYLKEGTA
>Euglena
GDAERGKKLFESRAGQCHSSQKGVNSTGPALYGVYGRTSGTVPGYAYSNANKNAAIVWEDESLNKFLENPKKYVPGTKMAFAGIKAKKDRLDIIAYMKTLKD
>Hippo
GDVEKGKKIFVQKCAQCHTVEKGGKHKTGPNLHGLFGRKTGQSPGFSYTDANKNKGITWGEETLMEYLENPKKYIPGTKMIFAGIKKKGERADLIAYLKQATNE
>Mosquito
MGVPAGDVEKGKKLFVQRCAQCHTVEAGGKHKVGPNLHGLFGRKTGQAAGFSYTDANKAKGITWNEDTLFEYLENPKKYIPGTKMVFAGLKKPQERGDLIAYLKSATK
>Rice
MASFSEAPPGNPKAGEKIFKTKCAQCHTVDKGAGHKQGPNLNGLFGRQSGTTPGYSYSTANKNMAVIWEENTLYDYLLNPKKYIPGTKMVFPGLKKPQERADLISYLKEATS

Use ClustalX to align the sequences and resolve a phylogeny.

Use NJplot to view the tree.

[Figure: tree of Arabidopsis, Rice, Hippo, Mosquito, and Euglena; scale bar 0.02]
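To give a feel for what happens inside an alignment program, here is a minimal sketch of pairwise global alignment followed by a simple distance (the fraction of aligned positions that differ). It is not ClustalX's algorithm: the scoring values and function names are assumptions chosen for illustration, and real tools use substitution matrices and multiple alignment.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch style global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def p_distance(a, b):
    """Fraction of aligned (ungapped) positions at which the two sequences differ."""
    x, y = global_align(a, b)
    pairs = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    if not pairs:
        return 1.0
    return sum(p != q for p, q in pairs) / len(pairs)


# First residues of the Arabidopsis and Rice sequences listed above.
print(global_align("MASFDEAPPGNPKAGEKIF", "MASFSEAPPGNPKAGEKIF"))
print(p_distance("MASFDEAPPGNPKAGEKIF", "MASFSEAPPGNPKAGEKIF"))
```

Computing this distance for every pair of species gives the matrix from which a tree-building method such as neighbor-joining can draw a phylogeny.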

G. Conclusions from Genomic Studies:

- There is remarkable homology in protein/gene sequence between species.
- Physiological/developmental complexity is not correlated with genome size.
- Only 2-5% of the human genome codes for proteins.
- Although there are 100,000 proteins, there are only 20,000 genes… suggesting that most genes encode multiple proteins, produced through transcript processing and post-translational processing.
- Most of the genome does NOT encode protein. However, large fractions of DNA do encode ncRNAs… “non-coding RNAs”, which are transcribed but not translated and then exert a regulatory function (miRNAs and others).
- So, organisms with similarities in coding genes can be remarkably different… as a consequence of how the production of those proteins is regulated in different cell types and at different developmental periods.

PHEW!!!!

Recombinant DNA Technology combines DNA from different sources – usually different species

Utility: this is done
- to study DNA sequences
- to mass-produce proteins
- to give recipient species new characteristics
- as a therapy/curative for genetic disorders (‘gene therapy’)

Corn damaged by corn borer and fungi

“bt-corn”, with a bacterial gene

Human insulin, created in bacteria

Genetic Engineering

A. To mass-produce proteins


Making human insulin


Eukaryotic genes may not be read properly by bacterial hosts because of introns and regulatory elements. In addition, the protein may not be processed or folded correctly. Using a eukaryotic host solves these problems… but then getting expression in the right tissue is the problem.

Alpha-1-antitrypsin was the first; antithrombin is the first transgenic protein produced in animals to be approved by the FDA for human use.


Vaccines (HPV vaccine – ‘Gardasil’) are being synthesized that consist of only a few proteins that initiate the immune response, rather than the entire virus (or bacterium). The genes for these proteins could be put in food, to initiate an immune response.

B. To give species new characteristics

The EPSP synthase gene in E. coli confers resistance to glyphosate – the active ingredient in herbicides like Roundup.


Agrobacterium is a plant pathogen that transfers DNA from its Ti plasmid into host cells. These plasmids have been used as vectors for introducing the gene into plant tissues, which grow into new plants.


Bacillus thuringiensis is a bacterium that produces a protein that crystallizes in insect guts, killing the insect.

Since the 1930’s, the bacteria were sprayed on crops to reduce insect damage. The treatment was very short term, as the bacteria died quickly.


Same process – splice to an Agrobacterium plasmid, with tissue-specific promoters.


Issues:
- genetic homogeneity of crop plants
- 2011 study – toxin present in 93% of pregnant women in a town in Canada, and increases in immunological responses
- used as feed for animal stock
- patterns of use and the evolution of resistance


Place gene for growth hormone from chinook salmon into Atlantic salmon, next to a constitutive promoter (gene always on, right?)

Grow 10x faster, to same mature size

Models suggest it would outcompete native species if released into the wild

C. Gene Therapy

- Create a viral vector with a functional human allele – adenosine deaminase
- Infect target tissue
- Probably need to repeat unless you can transform stem cells

1990-first trial of gene therapy – Ashanti DeSilva.

40 treated since then with 100% efficacy.

Jesse Gelsinger died in 1999 at age 18 as a consequence of a gene therapy trial involving an adenovirus vector; he had a severe immunological reaction to the virus.

OTC - ornithine transcarbamylase deficiency syndrome.

An X-linked disorder resulting in the inability to bind and convert ammonia to urea. Total loss of this protein is usually fatal shortly after birth.


“First, although Gelsinger and his family were under the impression that the pre-clinical animal studies had affirmed the trial's safety, two monkeys had actually died. This information appeared on the consent form submitted to the National Institutes of Health review board, but did not appear on the form signed by Jesse.

Moreover, the Penn researchers did not disclose to either the Gelsingers or federal regulators that human volunteers in the same study had suffered adverse reactions - side effects serious enough to have halted the trials had they been reported. Not reporting adverse events in gene therapy clinical trials is clearly wrong, but it seems to have been par for the course in the 1990s: evidence collected shortly after Gelsinger's death showed that fewer than six percent of adverse events associated with gene therapy were properly reported at this time.

Lastly, the lead researcher in the Penn study - James Wilson - did not disclose to the Gelsingers that he was conducting the clinical trial with a private company in which he had a stake. Wilson had a direct financial interest - not merely an academic one - in the trial's successful outcome.” From Center for Genetics and Society - http://www.geneticsandsociety.org/article.php?id=4955

Bioethics

A. GMO’s – Genetically Modified Organisms

Should the consumer know?

If content is < 5%, should it be labeled GMO-free?

Required in Europe and Asia… why not in U.S., which produces 65% of GM food worldwide?

B. Genetic Testing

- 2008 – Genetic Information Nondiscrimination Act

“prohibits the improper use of genetic information in health insurance and employment”

“GINA does not cover an individual's manifested disease or condition--a condition from which an individual is experiencing symptoms, being treated for, or that has been diagnosed.”

Sex discrimination in the workplace is not covered by GINA… so the Lilly Ledbetter Fair Pay Act (“Lily”) was needed as a separate “equal pay for equal work” act.

- Genetic screening and embryo selection: “Preimplantation Genetic Diagnosis” (PGD) is used in in vitro fertilization to screen early embryos for genetic abnormalities… or other traits?

Suppose a child needs a bone marrow transplant… should parents be allowed to select among embryos to make a sibling who can serve as a matching donor?

- Germline engineering

- Enhancement Gene Therapy

Why not insert “better” genes? For Youth? Strength? Health?
