comparative genomics gene regulatory networks (grns) anil jegga biomedical informatics contact...

Post on 05-Jan-2016

214 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Comparative GenomicsGene Regulatory Networks (GRNs)

Anil JeggaBiomedical Informatics

Contact Information:Anil JeggaBiomedical InformaticsRoom # 232, S Building 10th FloorCCHMCHomepage: http://anil.cchmc.orgTel: 513-636-0261E-mail: anil.jegga@cchmc.org

Session 2: February 24, 2012

2/24/2012 1Jegga Biomedical Informatics

Additional exercise available at: http://anil.cchmc.org/grn.html

Session 1: Overview of GRNs (Feb 23)a. Computational Approaches b. Cis-Element Identificationc. Comparative Genomicsd. Regulatory region variationse. p53 case study

Session 2: Database Session (Feb 24)a. Genome Browsersb. Promoter Analysis, TFBS Searchc. Co-regulated gene analysis

2/24/2012 2Jegga Biomedical Informatics

Session 2(Databases/Servers)

Feb 24, 2012

a. Genome Browsersb. Promoter Analysis, TFBS

Searchc. Co-regulated gene

analysis

Genome Browser (http://genome.ucsc.edu)

2/24/2012 4Jegga Biomedical Informatics

1 4 5

6

32

Genome Browser (http://genome.ucsc.edu)

Genome Browser Gateway choices:1. Select Clade2. Select genome/species: You can search only one species at a time3. Assembly: the official backbone DNA sequence4. Position: location in the genome to examine or search term (gene symbol, accession

number, etc.)5. Image width: how many pixels in display window; 5000 max6. Configure: make fonts bigger + other options

2/24/2012 5Jegga Biomedical Informatics

Genome Browser (http://genome.ucsc.edu)

2/24/2012 6Jegga Biomedical Informatics

Genome Browser (http://genome.ucsc.edu)Ex

plor

e th

e tr

acks

2/24/2012 7Jegga Biomedical Informatics

2/24/2012 8Jegga Biomedical Informatics

2/24/2012 Jegga Biomedical Informatics 9

What if I want to download promoter sequences for several

genes at a time?

Genome Browser (http://genome.ucsc.edu)

2/24/2012 10Jegga Biomedical Informatics

Genome Browser (http://genome.ucsc.edu)1 2

3

4

5

6

2/24/2012 11Jegga Biomedical Informatics

2/24/2012 12Jegga Biomedical Informatics

2/24/2012 Jegga Biomedical Informatics 13

Other Genome Browsers: ENSEMBLhttp://www.ensembl.org

2/24/2012 Jegga Biomedical Informatics 14

I have a promoter sequence and how do I scan it for known

TFBSs?

2/24/2012 Jegga Biomedical Informatics 15

JASPAR: http://jaspar.genereg.net

2/24/2012 Jegga Biomedical Informatics 16

JASPAR: http://jaspar.genereg.net

2/24/2012 Jegga Biomedical Informatics 17

JASPAR: http://jaspar.genereg.net

2/24/2012 Jegga Biomedical Informatics 18

Gene-Regulation: http://www.gene-regulation.com

Need to have an account (free for academic use)

2/24/2012 Jegga Biomedical Informatics 19

How can I identify putative regulatory regions for a gene or

microRNA?

I have found a miRNA enriched in my gene list or I am interested in a specific gene and I want to identify

putative regulatory regions for miRNA/gene

GenomeTrafac: http://genometrafac.cchmc.org

2/24/2012 20Jegga Biomedical Informatics

GenomeTrafac: http://genometrafac.cchmc.org

2/24/2012 21Jegga Biomedical Informatics

GenomeTrafac: http://genometrafac.cchmc.org

2/24/2012 22Jegga Biomedical Informatics

GenomeTrafac: http://genometrafac.cchmc.org

2/24/2012 23Jegga Biomedical Informatics

GenomeTrafac: http://genometrafac.cchmc.org

2/24/2012 24Jegga Biomedical Informatics

GenomeTrafac: http://genometrafac.cchmc.org

2/24/2012 25Jegga Biomedical Informatics

2/24/2012 Jegga Biomedical Informatics 26

DCODE: http://www.dcode.org/

2/24/2012 Jegga Biomedical Informatics 27

ECR Browser: http://ecrbrowser.dcode.org/

Multispecies (not limited to pairwise comparisons)

2/24/2012 Jegga Biomedical Informatics 28

ECR Browser: http://ecrbrowser.dcode.org/

2/24/2012 Jegga Biomedical Informatics 29

ECR Browser: http://ecrbrowser.dcode.org/

2/24/2012 Jegga Biomedical Informatics 30

ECR Browser: http://ecrbrowser.dcode.org/

2/24/2012 Jegga Biomedical Informatics 31

ECR Browser: http://ecrbrowser.dcode.org/

RESOURCES - URLs: Summary

Application/Resource URLGenome Browser http://genome.ucsc.eduJASPAR http://jaspar.genereg.net/Gene Regulation http://www.gene-regulation.comGenomeTrafac http://genometrafac.cchmc.orgDCODE http://www.dcode.org/

2/24/2012 32Jegga Biomedical Informatics

I have a list of co-expressed mRNAs (Transcriptome)….I want to find the shared cis-elements – Known and Novel

Known transcription factor binding sites (TFBS) Conserved

• oPOSSUM• DiRE

Non-conserved• Pscan• MatInspector (*Licensed)

Unknown TFBS or Novel motifs Conserved

• oPOSSUM• Weeder-H

Non-conserved• MEME• Weeder

1. Each of these applications support different forms of input. Very few support probeset IDs.

2. Red Font: Input sequence required; Do not support gene symbols, gene IDs, or accession numbers. The advantage is you can use them for scanning sequences from any species.

3. *Licensed software: We have access to the licensed version.

1. Each of these applications support different forms of input. Very few support probeset IDs.

2. Red Font: Input sequence required; Do not support gene symbols, gene IDs, or accession numbers. The advantage is you can use them for scanning sequences from any species.

3. *Licensed software: We have access to the licensed version.

2/24/2012 33Jegga Biomedical Informatics

I have a list of co-expressed mRNAs (Transcriptome)….I want to find the shared cis-elements – Known and Novel

Known transcription factor binding sites (TFBS) Conserved

• oPOSSUM• DiRE

Non-conserved• Pscan• MatInspector (*Licensed)

Unknown TFBS or Novel motifs Conserved

• oPOSSUM• Weeder-H

Non-conserved• MEME• Weeder2/24/2012 34Jegga Biomedical Informatics

oPOSSUM (http://burgundy.cmmt.ubc.ca/oPOSSUM/)

Supports human and mouse

2/24/2012 35Jegga Biomedical Informatics

oPOSSUM (http://www.cisreg.ca/oPOSSUM)

Disadvantage: Supports either human or mouse only

2/24/2012 36Jegga Biomedical Informatics

oPOSSUM (http://www.cisreg.ca/oPOSSUM)

The JASPAR PHYLOFACTS database consists of 174 profiles that were extracted from phylogenetically conserved gene upstream elements. They are a mix of known and as of yet undefined motifs. When should it be used?They are useful when one expects that other factors might determine promoter characteristics and/or tissue specificity.

2/24/2012 37Jegga Biomedical Informatics

oPOSSUM (http://www.cisreg.ca/oPOSSUM)

The Z-score statistic reflects the occurrence of the TFBS in the promoters of the co-expressed set compared to background.

The Fisher statistic reflects the proportion of genes that contain the TFBS compared to background.

2/24/2012 38Jegga Biomedical Informatics

oPOSSUM (http://www.cisreg.ca/oPOSSUM)

2/24/2012 39Jegga Biomedical Informatics

2/24/2012 40Jegga Biomedical Informatics

oPOSSUM (http://www.cisreg.ca/oPOSSUM)

2/24/2012 41Jegga Biomedical Informatics

DiRE (http://dire.dcode.org/)

2/24/2012 42Jegga Biomedical Informatics

DiRE (http://dire.dcode.org/)

ECR-Browser (http://ecrbrowser.dcode.org/)

2/24/2012 43Jegga Biomedical Informatics

Pscan (http://159.149.109.9/pscan)

2/24/2012 44Jegga Biomedical Informatics

Pscan (http://159.149.109.9/pscan)

2/24/2012 45Jegga Biomedical Informatics

Pscan (http://159.149.109.9/pscan)

2/24/2012 46Jegga Biomedical Informatics

Pscan (http://159.149.109.9/pscan)

2/24/2012 47Jegga Biomedical Informatics

Pscan (http://159.149.109.9/pscan)

2/24/2012 48Jegga Biomedical Informatics

Pscan (http://159.149.109.9/pscan) Comparing different input gene sets:1.In the detailed output for a given matrix, you can compare the results obtained with the matrix on the gene set just submitted with the results the matrix had produced on another gene set. The latter could be a "negative" gene set (or vice versa ).2.To perform the comparison, you have to fill in the "Compare with..." box fields with mean, standard deviation and sample size values of the other analysis - for the current one you can find them in the "Sample Data Statistics" box or in the overall text output that can be downloaded from the main output page.3.Warning: Make sure that the values you input are correct, and especially that they were obtained by using the same matrix. Once you have clicked the "Go!" button, an output window will pop up and report if either of the two means is significantly higher than the other, together with a confidence p-value computed with a Welch t-test.

2/24/2012 49Jegga Biomedical Informatics

I have a list of co-expressed mRNAs (Transcriptome)….I want to find the shared cis-elements – Known and Novel

Known transcription factor binding sites (TFBS) Conserved

• oPOSSUM• DiRE

Non-conserved• Pscan• MatInspector (*Licensed)

Unknown TFBS or Novel motifs Conserved

• oPOSSUM• Weeder-H

Non-conserved• MEME• Weeder2/24/2012 50Jegga Biomedical Informatics

oPOSSUM (http://www.cisreg.ca/oPOSSUM)

2/24/2012 51Jegga Biomedical Informatics

oPOSSUM (http://www.cisreg.ca/oPOSSUM)

The JASPAR PHYLOFACTS database consists of 174 profiles that were extracted from phylogenetically conserved gene upstream elements. They are a mix of known and as of yet undefined motifs. When should it be used?They are useful when one expects that other factors might determine promoter characteristics and/or tissue specificity.

2/24/2012 52Jegga Biomedical Informatics

oPOSSUM (http://www.cisreg.ca/oPOSSUM)

2/24/2012 53Jegga Biomedical Informatics

oPOSSUM (http://www.cisreg.ca/oPOSSUM)

2/24/2012 54Jegga Biomedical Informatics

I have a list of co-expressed mRNAs (Transcriptome)….I want to find the shared cis-elements – Known and Novel

Known transcription factor binding sites (TFBS) Conserved

• oPOSSUM• DiRE

Non-conserved• Pscan• MatInspector (*Licensed)

Unknown TFBS or Novel motifs Conserved

• oPOSSUM• Weeder-H

Non-conserved• MEME• Weeder

1. Each of these applications support different forms of input. Very few support probeset IDs.

2. Red Font: Input sequence required; Do not support gene symbols, gene IDs, or accession numbers. The advantage is you can use them for scanning sequences from any species.

3. *Licensed software: We have access to the licensed version.

1. Each of these applications support different forms of input. Very few support probeset IDs.

2. Red Font: Input sequence required; Do not support gene symbols, gene IDs, or accession numbers. The advantage is you can use them for scanning sequences from any species.

3. *Licensed software: We have access to the licensed version.

How to fetch promoter/upstream sequence – single/multiple?

2/24/2012 55Jegga Biomedical Informatics

I have a list of co-expressed mRNAs (Transcriptome)….I want to find the shared cis-elements – Known and Novel

Known transcription factor binding sites (TFBS) Conserved

• oPOSSUM• DiRE

Non-conserved• Pscan• MatInspector (*Licensed)

Unknown TFBS or Novel motifs Conserved

• oPOSSUM• Weeder-H

Non-conserved• MEME• Weeder

1. Each of these applications support different forms of input. Very few support probeset IDs.

2. Red Font: Input sequence required; Do not support gene symbols, gene IDs, or accession numbers. The advantage is you can use them for scanning sequences from any species.

3. *Licensed software: We have access to the licensed version.

1. Each of these applications support different forms of input. Very few support probeset IDs.

2. Red Font: Input sequence required; Do not support gene symbols, gene IDs, or accession numbers. The advantage is you can use them for scanning sequences from any species.

3. *Licensed software: We have access to the licensed version.

Use the fetched promoter/upstream sequences for the following analyses

2/24/2012 56Jegga Biomedical Informatics

WeederH (http://159.149.109.9/pscan)

1. Supports large number of species.

2. Does not support multiple sequences (multifasta) input. You have to enter each sequence separately.

3. Good for small number of sequences where you expect a potential novel (or not included in the TFBS libraries) conserved motif.

2/24/2012 57Jegga Biomedical Informatics

Weeder (http://159.149.109.9/modtools/)

Do not use Groupwise mail when submitting large number of sequences because the results are sent “in the mail” and not as an attachment. And Groupwise mail truncates messages if they are very long. Use Gmail instead. A link to the results page used to be sent earlier.

2/24/2012 58Jegga Biomedical Informatics

Weeder (http://159.149.109.9/modtools/)

2/24/2012 59Jegga Biomedical Informatics

MEME (http://meme.sdsc.edu) MEME takes as input a group of DNA or protein sequences and outputs as many motifs as requested. MEME uses statistical modeling techniques to automatically choose the best width, number of occurrences, and description for each motif.

Your MEME results consist of:• your MEME results in HTML format• your MEME results in XML format• your MEME results in TEXT format• and the MAST results of searching your input sequences for the motifs found by MEME using MAST.

2/24/2012 60Jegga Biomedical Informatics

MEME (http://meme.sdsc.edu)

2/24/2012 61Jegga Biomedical Informatics

TOMTOM can be used to find out if an overrepresented motif in your sequences matches or is similar to a known TFBS

MEME (http://meme.sdsc.edu)

2/24/2012 62Jegga Biomedical Informatics

SummaryCis-Element Finding Matrix

CONSERVED NON-CONSERVEDKNOWN TFBS oPOSSUM

DiREPscanMatInspector*

NOVEL/UNKNOWN TFBS OR MOTIFS

oPOSSUMWEEDER-H

MEMEWEEDER

2/24/2012 63Jegga Biomedical Informatics

RESOURCES - URLs: Summary

Application/Resource URLoPOSSUM http://burgundy.cmmt.ubc.ca/oPOSSUM/DiRE http://dire.dcode.org/Weeder-H http://159.149.109.9/modtools/Weeder http://159.149.109.9/modtools/Pscan http://159.149.109.9/pscanMEME http://meme.sdsc.edu/MatInspector http://www.genomatix.de/Genome Browser http://genome.ucsc.eduECR Browser http://ecrbrowser.dcode.org

2/24/2012 64Jegga Biomedical Informatics

Additional exercise available at: http://anil.cchmc.org/grn.html

top related