Modeling Functional Genomics Datasets, CVM8890-101, Lesson 1, 13 June 2007, Bindu Nanduri


Page 2

Lesson 1: Data to biological sense. What we are trying to achieve. Introduction to functional genomics modeling strategies.

Page 3

Transcriptomics and Proteomics

Page 4

Why study gene expression changes?

Transcriptional control is the predominant form of gene regulation

Page 5

Northern Blots

Mol Vis. 1996 Nov 4;2:11

Page 6

Microarrays

Basic concept: a reverse Northern blot on a large scale

High throughput: hybridize control and experimental samples simultaneously using distinct fluorescent dyes

Many assays can be carried out in parallel
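Because the two samples carry different dyes and hybridize to the same spot, the per-spot readout is usually a ratio of the two channel intensities. A minimal sketch, assuming background-subtracted intensities and a simple global median normalization (both assumptions; many pipelines use intensity-dependent methods such as loess instead); the function name is hypothetical:

```python
import math
from statistics import median


def log_ratios(control, experimental):
    """Return median-centred log2(experimental / control) for each spot.

    `control` and `experimental` are hypothetical background-subtracted
    intensities from the two dye channels, listed in the same spot order.
    """
    if len(control) != len(experimental):
        raise ValueError("both channels must cover the same spots")
    raw = [math.log2(e / c) for c, e in zip(control, experimental)]
    # Global median centring: assumes most genes do not change, so the
    # typical log ratio should be 0 once a uniform dye bias is removed.
    centre = median(raw)
    return [r - centre for r in raw]
```

For example, `log_ratios([100, 200, 400, 800, 50], [200, 400, 800, 6400, 25])` gives `[0.0, 0.0, 0.0, 2.0, -2.0]`: after removing the shared two-fold dye offset, the fourth spot is four-fold up and the fifth four-fold down.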

Page 7

Affymetrix oligo arrays design

[Figure: 11 to 16 25-mer probes per transcript, usually from the most 3′ region, often the UTR, just upstream of the poly(A) tail (AAAA..).]

http://www.affymetrix.com
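To make the 3′ bias of this design concrete, the sketch below takes up to 11 non-overlapping 25-mers, working back from the 3′ end of a transcript after trimming a trailing run of A's (assumed to be the poly(A) tail). This is a hypothetical simplification: actual Affymetrix probe selection also weighs specificity, base composition and cross-hybridization, and the function name and parameters are illustrative only.

```python
def three_prime_probes(transcript, n_probes=11, probe_len=25):
    """Pick up to n_probes non-overlapping probes, starting at the 3' end.

    Hypothetical simplification of 3'-biased probe design; no specificity
    or composition filters are applied.
    """
    # Drop a trailing run of A's, treated here as the poly(A) tail.
    seq = transcript.upper().rstrip("A")
    probes = []
    end = len(seq)
    while end - probe_len >= 0 and len(probes) < n_probes:
        probes.append(seq[end - probe_len:end])
        end -= probe_len
    probes.reverse()  # report probes in 5' -> 3' order
    return probes
```

For a 120-nt transcript body followed by a poly(A) tail, this returns four probes covering the last 100 nt, because there is room for only four 25-mers before the 5′ end is reached.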

Page 8

Genomic Tiling Array Design

[Figure: multiple probes tiled along the genome sequence, 5′ to 3′; center-to-center resolution 38 bp.]
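A center-to-center resolution of 38 bp means that consecutive probe centres sit 38 bp apart along the genome. A minimal sketch that lays out probe coordinates on one strand, assuming fixed-length 25-mer probes (the probe length is an assumption, not stated on the slide); the function name is hypothetical:

```python
def tiling_probes(genome_length, probe_len=25, spacing=38):
    """Return (start, end) coordinates of probes tiled along one strand.

    Consecutive probe centres are `spacing` bp apart; with equal-length
    probes that is the same as the distance between start positions.
    Coordinates are 0-based and half-open.
    """
    probes = []
    start = 0
    while start + probe_len <= genome_length:
        probes.append((start, start + probe_len))
        start += spacing
    return probes
```

With 25-mers at 38 bp spacing, each pair of neighbouring probes leaves a 13 bp gap, and a 5 Mb genome would need on the order of 130,000 probes per strand.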

Page 9

ISB Systems Biology Course 2006

Page 10

Is mRNA level = protein level? Is there a correlation?

Comparison of protein levels (MS, 2D gels) and RNA levels (SAGE) for 156 genes in yeast:

For genes with similar mRNA levels, protein levels varied by up to 20-fold

For genes with similar protein levels, mRNA levels varied by up to 30-fold

Highly expressed mRNAs correlate well with protein levels

Gygi et al. (1999) Mol. Cell. Biol.
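One way to ask this slide's question of your own paired data is to correlate log-transformed mRNA and protein measurements, both over all genes and over a highly expressed subset. A minimal sketch: Pearson correlation on log10 values, the function names and the `min_mrna` threshold are assumptions for illustration, not the method used by Gygi et al.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def mrna_protein_correlation(mrna, protein, min_mrna=None):
    """Correlate log10 mRNA and protein levels for paired genes.

    If min_mrna is given, only genes with mRNA level >= min_mrna are kept,
    e.g. to examine highly expressed genes separately (hypothetical option).
    """
    pairs = [(m, p) for m, p in zip(mrna, protein)
             if min_mrna is None or m >= min_mrna]
    logs_m = [math.log10(m) for m, _ in pairs]
    logs_p = [math.log10(p) for _, p in pairs]
    return pearson(logs_m, logs_p)
```

Comparing the result with and without `min_mrna` shows whether agreement between the two levels is concentrated among abundant transcripts, as the slide reports.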

Page 11

ISB Systems Biology Course 2006

Page 12

ISB Systems Biology Course 2006

Page 13

ISB Systems Biology Course 2006

Page 14

ISB Systems Biology Course 2006

Page 15

ISB Systems Biology Course 2006

Page 16

ISB Systems Biology Course 2006

Page 17

Expressed Sequence Tags

ESTs are pieces of DNA sequence (usually 200 to 500 nt) generated by sequencing either one or both ends of an expressed gene

They are bits of DNA that represent genes expressed in certain cells, tissues, or organs from different organisms, and they can be useful "tags" to fish a gene out of a portion of chromosomal DNA by matching base pairs

http://www.ncbi.nlm.nih.gov/About/primer/est.html

Page 19

http://www.ncbi.nlm.nih.gov/About/primer/est.html

EST Sequence Clustering

A gene can be expressed as mRNA many, many times, so the ESTs derived from this mRNA may be redundant: there may be many identical, or similar, copies of the same EST

This redundancy and overlap mean that when someone searches dbEST for a particular EST, they may retrieve a long list of tags, many of which may represent the same gene

The UniGene database automatically partitions GenBank sequences into a non-redundant set of gene-oriented clusters
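UniGene's own clustering procedure is not described here. As a rough illustration of gene-oriented clustering, the sketch below links any two ESTs that share at least one exact k-mer and merges linked ESTs with union-find (single linkage), so overlapping reads from the same transcript chain into one cluster. The k-mer criterion, the default k = 12 and the function names are assumptions; the sketch also ignores reverse-complement reads and sequencing errors.

```python
from itertools import combinations


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_ests(ests, k=12, min_shared=1):
    """Group EST indices into clusters of ESTs linked by shared k-mers."""
    parent = list(range(len(ests)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    profiles = [kmers(s.upper(), k) for s in ests]
    # Compare every pair (quadratic, fine for a sketch) and merge linked ESTs.
    for i, j in combinations(range(len(ests)), 2):
        if len(profiles[i] & profiles[j]) >= min_shared:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(ests)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Single linkage is what lets two ESTs from opposite ends of a transcript land in one cluster through a third EST that overlaps both; the trade-off is that one spurious overlap can also merge two genes.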

Page 20

ESTs: EST mapping to the genome, annotation, differential expression

Transcriptome: clustering, differential expression analysis

Proteome: differential expression analysis
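Differential expression analysis asks, gene by gene, whether expression differs between conditions by more than replicate variation would suggest. A minimal sketch using Welch's t statistic on log-scale replicate values; the data layout and function names are hypothetical, and a real analysis would convert the statistics to p-values and correct for testing thousands of genes (or use moderated statistics).

```python
import math
from statistics import mean, variance


def welch_t(control, treated):
    """Welch's t statistic for one gene (treated minus control)."""
    se = math.sqrt(variance(treated) / len(treated)
                   + variance(control) / len(control))
    return (mean(treated) - mean(control)) / se


def rank_genes(expression):
    """Rank genes by |t|, largest first.

    expression: {gene: (control_values, treated_values)} with at least two
    log-scale replicates per condition (hypothetical layout).
    """
    scores = {g: welch_t(c, t) for g, (c, t) in expression.items()}
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
```

A gene with a modest but very consistent change can outrank one with a larger but noisy change, which is the point of scaling the difference by its standard error.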

Page 21

Multiple data analysis platforms

[Figure: proteomics, transcriptomics and EST analysis each produce a LIST of elements.]

Page 27

Modeling Function

Modeling function requires:

knowing the components of the system (structural annotation)

knowing what these components do and how they interact (functional annotation)

Page 29

http://www.protonet.cs.huji.ac.il/ProToGO/Introduction.html

Page 30

Where do you begin? Specifics

Page 31

Transcriptome Analysis

Page 32

Clustering

Similar expression patterns = similar regulation?

Clustering algorithms help us identify patterns in complex data

Key goal: identify co-regulated groups of genes

Methods: hierarchical clustering, k-means clustering, self-organizing feature maps, principal component analysis
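Of the methods listed, k-means is the shortest to sketch. The version below clusters gene expression profiles (for example, log ratios across time points) using Euclidean distance. Taking the first k profiles as starting centroids keeps it deterministic, which is a simplification (random restarts or k-means++ initialization are more usual); all names are hypothetical.

```python
import math


def kmeans(profiles, k, iterations=100):
    """Cluster equal-length expression profiles into k groups.

    Returns one integer cluster label per profile. Initial centroids are
    simply the first k profiles (a deterministic simplification).
    """
    centroids = [list(p) for p in profiles[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in profiles]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Genes that share a cluster are candidates for co-regulation, in line with the slide's key goal; the clustering alone does not establish a shared regulator.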

Page 33

Proteomics

Qualitative: total number of identified proteins; data intersections

Quantitative: changes in protein expression
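The qualitative comparison on this slide (how many proteins were identified, and where the identifications overlap) reduces to set operations on protein accessions. A minimal sketch; the sample names, accessions, dictionary keys and function name are hypothetical:

```python
def identification_summary(samples):
    """Summarize protein identifications across samples.

    samples: {sample_name: set of identified protein accessions}.
    Returns the total number of distinct proteins, the proteins found in
    every sample, and the proteins unique to each sample.
    """
    all_ids = set().union(*samples.values())
    shared = set.intersection(*samples.values())
    unique = {}
    for name, ids in samples.items():
        others = set().union(*(o for n, o in samples.items() if n != name))
        unique[name] = ids - others
    return {"total": len(all_ids), "shared": shared, "unique": unique}
```

For instance, with `{"control": {"P1", "P2", "P3"}, "infected": {"P2", "P3", "P4"}, "treated": {"P3", "P5"}}`, five distinct proteins are identified, only P3 is common to all three samples, and P1, P4 and P5 are each unique to one sample.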

Page 35

Proteomic data analysis tools

Page 38

Use GO for:

Grouping gene products by biological function

Determining which classes of gene products are over-represented or under-represented

Focusing on particular biological pathways and functions (hypothesis-driven data interrogation)

Relating a protein’s location to its function
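A common way to decide whether a GO class is over-represented is a one-sided hypergeometric (Fisher's exact) test: given a background of N annotated genes, K of which carry the term, how surprising is it to see k or more term-annotated genes in a study list of n? A minimal sketch; the function and variable names are assumptions, and when many GO terms are tested the p-values still need multiple-testing correction.

```python
from math import comb


def go_enrichment_p(N, K, n, k):
    """One-sided hypergeometric P(X >= k) for over-representation.

    N: annotated genes in the background (e.g. all genes on the array)
    K: background genes annotated with the GO term
    n: genes in the study list (e.g. differentially expressed genes)
    k: study-list genes annotated with the GO term
    """
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probabilities of observing exactly i annotated genes, i = k..upper.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total
```

Testing under-representation uses the lower tail (i from 0 to k) instead.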

Page 39

Course Overview

Introduction to functional annotation. Orthologs and homologs; Clusters of Orthologous Groups (COGs) and the Gene Ontology (GO); and how to find what functional annotation is available.

Tools for functional annotation. Accessing functional data; computational strategies to obtain more complete functional annotation; the AgBase GO annotation pipeline.

Introduction to pathway analysis. Theory and strategies for pathway analysis modeling in different species, and tools for pathway analysis.

Functional genomics modeling: prokaryotic and eukaryotic examples.

Page 40

Some Useful Links

http://www.genomesonline.org/ (comprehensive access to information regarding complete and ongoing genome projects around the world)
http://www.geneontology.org/ (provides a controlled vocabulary to describe gene and gene product attributes in any organism)
http://pir.georgetown.edu/ (integrated protein informatics resource for genomics and proteomics)
http://www.pir.uniprot.org/ (protein database)
http://mips.gsf.de/ (maintains a set of generic databases as well as the systematic comparative analysis of microbial, fungal, and plant genomes)
http://www.ncbi.nlm.nih.gov/ (comprehensive resource for public databases, literature and tools)
http://www.ebi.ac.uk/ensembl/ (system that maintains automatic annotation of large eukaryotic genomes)
http://expasy.org/ (expert protein analysis system)
http://www.biocyc.org/ (BioCyc is a collection of 260 Pathway/Genome Databases: metabolic pathways)
http://www.genome.jp/kegg/ ("biological systems" database integrating both molecular building block information and higher-level systemic information)

Page 41

Some Useful Links

http://pfgrc.tigr.org/index.shtml (functional genomics studies on a variety of pathogens for which genomic sequence information is currently, or will soon be, available)
http://www.tigr.org/ (comprehensive resource for microbial genomics)
http://www.cs.ualberta.ca/~bioinfo/PA/ (high-throughput proteome annotations)
http://garnet.arabidopsis.org.uk/systems_biology_tools.htm (Arabidopsis resources)
http://www.systems-biology.org/002/ (systems biology portal)
http://www.ebi.ac.uk/biomodels/ (mathematical models of biological interest)
http://www.genmapp.org/current_databases.html (species-specific collections of genes and annotation)
http://bioinfo.bgu.ac.il/bsu/microarrays/links/ (microarray analysis resources)
http://david.abcc.ncifcrf.gov/ (Database for Annotation, Visualization and Integrated Discovery)
http://www.animalgenome.org/pigs/community/links.html (swine genetics community)

Page 42

Some Useful Links

http://www.biocarta.com/FeaturedProducts/index.asp (pathways and tools for analysis)
http://www.genecards.org/index.shtml (database of human genes that includes automatically mined genomic, proteomic and transcriptomic information, as well as orthologies, disease relationships, SNPs, gene expression, gene function, and service links for ordering assays and antibodies)
http://www.proteomecommons.org/ (proteomics tools)
http://harvester.embl.de/
http://bioinformatics.org/ (open access institute)
http://www.ihop-net.org/UniPub/iHOP/ (a network of genes and proteins that extends through the scientific literature)
http://www1.jcsg.org/psat/help/document.html (comparative analysis of protein sequences)
http://orthomcl.cbil.upenn.edu/cgi-bin/OrthoMclWeb.cgi (genome-scale algorithm for grouping orthologous protein sequences)
http://www.pathogenomics.ca/ortholuge/ (ortholog prediction program)
http://www.gene-regulation.com/pub/databases.html (transcription factor database)

Page 43

Some Useful Links

http://www.reactome.org/ (curated knowledgebase of biological pathways)
http://www.biochemweb.org/systems.shtml (The Virtual Library of Biochemistry, Molecular Biology and Cell Biology)
http://genome-www.stanford.edu/ (Stanford genomic resources)
http://www.softberry.com/berry.phtml (collection of tools for annotation and analysis of sequences)
http://sosui.proteome.bio.tuat.ac.jp/sosuiframe0E.html (prediction of transmembrane domains in proteins)
http://www.psort.org/psortb/ (subcellular localization predictions)
http://www.ch.embnet.org/software/TMPRED_form.html (prediction of membrane-spanning regions and their orientation)
http://www.agbase.msstate.edu/ (functional analysis of agricultural plant and animal gene products)