biological modeling promoter and gene set analysis

Upload: thamizh555

Post on 03-Apr-2018


  • 7/29/2019 Biological Modeling Promoter and Gene Set Analysis


    BIOLOGICAL MODELING PROMOTER AND GENE SET ANALYSIS

    A. Whole Genome Gene Expression Data

    B. Gene set enrichment analysis (GSEA)

    AIM: To identify and retrieve gene expression data (GEO database) for the genes of

    interest, and to use GSEA to identify how gene regulation differs between the experimental and

    control conditions.

    Objective:

    Activation of viral pattern sensors, such as RIG-I and MDA5, in dendritic cells triggers a

    transcriptional program that includes the production of Type I interferons (IFNs). These

    antiviral cytokines contribute to both induction and regulation of innate and adaptive antiviral

    mechanisms. IRF family transcription factors play a central role in the extensive genetic

    reprogramming that follows viral infection, and in the production of IFN. Pathogenic viruses

    attempt to subvert normal immune function through the expression of interferon antagonists.

    For example, IRF3 activation and IFN-B expression are blocked by the NS1 protein of

    influenza. In this lab we will investigate the genetic program that operates during the immune

    response to Newcastle disease virus (NDV). NDV is an avian virus that is able to stimulate

    innate immunity and DC maturation, but lacks the ability to evade the human interferon

    response. This response thus models the operation of an uninhibited antiviral regulatory

    network. Genome-scale gene expression studies will be accessed. We will see how gene set

    enrichment analysis (GSEA) can be used to infer the important regulatory motifs and gene

    pathways that are unique to a functioning antiviral response. The TRANSFAC database will

    also be accessed to extract position weight matrices of relevant TF binding sites on IFNB1,

    and then the promoter sequence will be scanned using the UCSC genome browser.

    Procedure 2:


    Premise: We are interested in finding the transcription factors driving the gene expression

    changes observed post-infection compared to other physical stress (in this instance the

    stress of injection without infective particles).

    Go to PubMed

    Search Keywords: antiviral response cascade

    Look at the paper by Zaslavsky et al., "Antiviral response dictated by choreographed

    cascade of transcription factors." To the right of the abstract you will find a tab labeled All

    links from this paper.

    Follow the link GEO DataSets.

    Alternatively, you can search the GEO website (Gene Expression Omnibus at NCBI) for

    the first authors (Zaslavsky or Hershberg) in the Query Datasets field.

    From this page, information about gene expression can be explored. We would like to

    choose a time point by which we expect some expression changes following infection.

    We also require that both our experiment and our control have samples at this time point.

    Click the more link that follows the dataset summary, then the more link on the list of Samples.

    We can now see which time points are available in this data set. Notice that the time point

    6 hours post infection includes 4 NDV-infected samples and 4 controls (AF). We will

    download a text file (.txt extension) of the entire dataset by clicking Series Matrix File(s)

    at the bottom of the page. We will then unzip the file and open it in Excel to remove the

    unneeded data from other time points and to format it for use in the next steps.
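
    The column filtering described above can also be done in a script rather than in Excel. The following is a minimal Python sketch, assuming the series matrix file has already been unzipped and read into a string. The function names and the toy sample titles (such as "NDV 6h rep1") are hypothetical; the real titles must be read from the !Sample_title line of the downloaded file.

```python
import csv
import io


def read_series_matrix(text):
    """Return (sample_titles, header, rows) from GEO series-matrix text."""
    titles, header, rows = [], None, []
    in_table = False
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields:
            continue  # skip blank lines
        tag = fields[0]
        if tag == "!Sample_title":
            titles = fields[1:]
        elif tag == "!series_matrix_table_begin":
            in_table = True
        elif tag == "!series_matrix_table_end":
            in_table = False
        elif in_table:
            if header is None:
                header = fields  # first table row: ID_REF plus sample IDs
            else:
                rows.append(fields)
    return titles, header, rows


def keep_samples(titles, header, rows, wanted):
    """Keep the probe-ID column plus the samples whose title satisfies wanted()."""
    keep = [0] + [i + 1 for i, title in enumerate(titles) if wanted(title)]
    return [header[i] for i in keep], [[row[i] for i in keep] for row in rows]


# Toy input with made-up titles and values (not taken from the real dataset).
demo = (
    '!Series_title\t"toy example"\n'
    '!Sample_title\t"AF 6h rep1"\t"NDV 6h rep1"\t"NDV 2h rep1"\n'
    '!series_matrix_table_begin\n'
    '"ID_REF"\t"GSM1"\t"GSM2"\t"GSM3"\n'
    '"probe_a"\t7.1\t9.4\t7.3\n'
    '"probe_b"\t5.0\t5.2\t5.1\n'
    '!series_matrix_table_end\n'
)
titles, header, rows = read_series_matrix(demo)
header6, rows6 = keep_samples(titles, header, rows, lambda t: "6h" in t)
print(header6)
print(rows6)
```

    The test passed as wanted (here, a substring match on "6h") is only a placeholder; it should be adapted to however the 6-hour samples are actually labeled on the GEO page.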

    Procedure:

    Go to the GSEA website. Hit the Downloads link. Log in with the email provided by the

    instructor (or create a user), then press Launch.

    GSEA works with specific file formats. You must create expression dataset and

    phenotype label files:

    1. To make the expression dataset file, open the file you downloaded from GEO. Erase

    all header lines other than !Sample_title. Erase the last row as well. Use the

    information from !Sample_title to delete all the columns that do not contain time

    point 6 data, then erase this row as well. Change the title of the column with probe


    set ids to NAME. Insert a column to the right of NAME and title it

    DESCRIPTION. Write na in the first row of DESCRIPTION, copy it, select all

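    rows by clicking on the first and last rows while holding down the Shift key,

    and paste na into all rows.

    2. To make the phenotype file, open an Excel file and enter the following 3 lines:

    8 2 1

    # AF NDV

    0 1 0 1 0 1 0 1

    The first line gives the number of samples (8), the number of classes (2), and the constant 1.

    The second line names the classes, and the third assigns each sample column, in order, to

    class 0 (AF) or class 1 (NDV), so it must match the column order of the expression file.

    Save as a tab-delimited text file with a .cls extension.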
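
    The two files described in steps 1 and 2 can also be produced by a short script. The sketch below is one possible way to do it in Python, not part of the original protocol; the function names, sample names, and values are hypothetical. It writes the .cls fields separated by spaces, on the assumption that GSEA accepts either spaces or tabs in that file; the Excel route above produces tabs instead.

```python
def expression_txt(sample_names, rows):
    """Build the tab-delimited expression table: NAME, DESCRIPTION, then samples.

    rows is a list of (probe_id, values) pairs; DESCRIPTION is filled with "na".
    """
    lines = ["\t".join(["NAME", "DESCRIPTION"] + list(sample_names))]
    for probe_id, values in rows:
        lines.append("\t".join([probe_id, "na"] + [str(v) for v in values]))
    return "\n".join(lines) + "\n"


def cls_text(sample_classes, class_order, sep=" "):
    """Build the three-line phenotype (.cls) content.

    sample_classes lists the class of each sample in column order;
    class_order fixes which class becomes label 0, 1, ...
    """
    index = {name: i for i, name in enumerate(class_order)}
    labels = [str(index[c]) for c in sample_classes]
    line1 = sep.join([str(len(labels)), str(len(class_order)), "1"])
    line2 = sep.join(["#"] + list(class_order))
    line3 = sep.join(labels)
    return "\n".join([line1, line2, line3]) + "\n"


# Hypothetical column order: controls and infected samples alternate.
classes = ["AF", "NDV"] * 4
phenotype = cls_text(classes, ["AF", "NDV"])
print(phenotype, end="")

# Hypothetical two-sample table, just to show the layout.
table = expression_txt(["AF_1", "NDV_1"], [("probe_a", [7.1, 9.4])])
print(table, end="")
```

    Generating the label line from the actual column order, rather than typing it by hand, guards against the case where the samples in the expression file do not alternate between AF and NDV.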

    Now that we have our datasets in the correct format, we can load them into GSEA. Go to the

    load data tab and drag and drop the two files you created into the grey box. Hit "load these files!"

    to load them. If you get an error, look through the previous steps and make sure you formatted

    your two files correctly. Switch to the Run GSEA tab.

    Choose the Expression Dataset you made from the GEO data.

    Choose gseaftp.broadinstitute.org://pub/gsea/gene_sets/c3.tft.v2.5.symbols.gmt as a Geneset

    database.

    Change number of permutations to 10

    Choose the NDV vs AF phenotype

    Change Permutation type to gene_set


    Look on the GEO webpage to determine which chip platform was used in the

    experiment and select the correct Chip Platform (it should be

    gseaftp.broadinstitute.org://pub/gsea/annotations/HG_U133_Plus_2.chip).

    Expand Basic Fields

    Change Enrichment statistic to Classic

    Hit Run
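
    To give a feel for what the settings above ask GSEA to compute, here is a minimal Python sketch of an unweighted running-sum enrichment score, which is how the Classic statistic is usually described, together with a simplified gene_set permutation test. It assumes the genes have already been ranked by differential expression between NDV and AF. The function names and the toy gene list are hypothetical, and the p-value shown is a simplified two-sided version, not GSEA's exact procedure.

```python
import random


def classic_es(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score.

    Walk down the ranked list: step up by 1/(number of hits) at each gene in
    the set, step down by 1/(number of misses) otherwise. The score is the
    running-sum value with the largest absolute deviation from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def gene_set_permutation_p(ranked_genes, gene_set, n_perm=10, seed=0):
    """Compare the observed score with scores of random gene sets of equal size."""
    rng = random.Random(seed)
    size = sum(gene in gene_set for gene in ranked_genes)
    observed = classic_es(ranked_genes, gene_set)
    null = [classic_es(ranked_genes, set(rng.sample(ranked_genes, size)))
            for _ in range(n_perm)]
    extreme = sum(abs(score) >= abs(observed) for score in null)
    return observed, extreme / n_perm


# Toy ranked list (most up-regulated in NDV first); names are placeholders.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
top_score = classic_es(ranked, {"g1", "g2"})
bottom_score = classic_es(ranked, {"g5", "g6"})
observed, p_value = gene_set_permutation_p(ranked, {"g1", "g2"}, n_perm=10)
print(top_score, bottom_score, p_value)
```

    With only 10 permutations, as set above, the smallest p-value that can be resolved is coarse (steps of 0.1), so the setting is suitable for a quick classroom run rather than for a final analysis.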

    When the running indicator (on the bottom left-hand side of the window) changes to

    Success, click on it. This should open the details of the analysis in your web browser. In

    some operating systems (for instance Linux), clicking Success to see the results produces

    an error. In that case, go to the default output folder (look for it in Options >

    Preferences) and open the index.html file with a browser.

    Look around a bit; try Snapshot and enrichment results in HTML.

    In the latter page, click on the Details link that belongs to the IRF1 matrix (V$IRF1).

    RETRIEVE AND VERIFY THE PROMOTER REGION AND THE SEQUENCE OF THE

    IFNB1 GENE (NCBI DATABASE)

    In the results of the previous analysis, click on enrichment results in HTML.

    Click on the Details link that belongs to the IRF1 matrix (V$IRF1).

    Near the top of the page you will find IFNB1.

    Click the Entrez link, which will take you to the relevant page of the Entrez database.

    Choose the entry for IFNB1 in humans.

    Follow the link See IFNB1 in MapViewer. Choose the link dl.

    In the "Strand" box, select "minus" from the pull-down menu because the IFNB1 gene is

    on the minus strand (note how the arrow next to IFNB1 on MapViewer points up).


    In the "adjust by" box in the "to" line, increase the sequence information displayed by

    entering "+2K". Click Change Region/Strand (note how the coordinates change and the strand

    designation becomes "-"). Click Display.

    BLAST the extended sequence against the human genome and verify that it is indeed the

    sequence of IFNB1 (make sure to include the line that starts with ">").

    Hit the back button in the browser, copy the sequence into a text editor, and save it as a .fasta file.
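
    Selecting the minus strand means the displayed sequence is the reverse complement of the reference (plus-strand) sequence. The following Python sketch illustrates that relationship and the FASTA layout (a ">" header line followed by the sequence). The function names, header text, and toy sequence are hypothetical and are not the real IFNB1 sequence.

```python
# Translation table for complementing DNA bases, keeping case and N as-is.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def to_fasta(header, seq, width=60):
    """Format a sequence as FASTA text with a fixed line width."""
    lines = [">" + header]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


plus_strand = "ATGCC"                      # toy plus-strand fragment
minus_strand = reverse_complement(plus_strand)
record = to_fasta("toy_region minus strand", minus_strand, width=4)
print(record, end="")
```

    Writing the result of to_fasta to a file with a .fasta extension gives the same kind of file as the copy-and-paste step above.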

    TRANSCRIPTION FACTOR TARGET SITE PREDICTION FOR THE PROMOTER

    REGION OF IFNB1

    JASPAR

    Choose the JASPAR CORE Vertebrata

    Press Toggle so that all JASPAR matrix models are selected.

    Paste the Human IFNB1 gene and promoter in the text area.

    Hit Scan

    Find the predicted target sites for the IRF matrices, such as IRF1.

    JASPAR attempts to keep its target-site matrices non-redundant, so we do not see general IRF

    binding sites. We do find potential target sites for both IRF1 and IRF2.
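
    The idea behind scanning a sequence with a matrix model can be sketched in a few lines of Python. This is an illustration of generic position weight matrix scoring under simple assumptions (uniform background, a fixed pseudocount, forward strand only); JASPAR's own scoring and thresholds may differ. The count matrix below is a made-up toy with consensus GAAA, not a matrix taken from JASPAR, and the function names are hypothetical.

```python
import math


def log_odds(counts, pseudocount=0.5, background=0.25):
    """Turn per-position base counts into log2 odds against a uniform background."""
    length = len(counts["A"])
    pwm = []
    for i in range(length):
        total = sum(counts[base][i] for base in "ACGT") + 4 * pseudocount
        pwm.append({
            base: math.log2((counts[base][i] + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm


def scan(seq, pwm):
    """Score every window of the sequence; return (start, window, score) tuples."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if set(window) <= set("ACGT"):  # skip windows containing N or other symbols
            score = sum(column[base] for column, base in zip(pwm, window))
            hits.append((start, window, score))
    return hits


# Toy count matrix (10 hypothetical sites, consensus GAAA).
toy_counts = {
    "A": [0, 10, 10, 10],
    "C": [0, 0, 0, 0],
    "G": [10, 0, 0, 0],
    "T": [0, 0, 0, 0],
}
pwm = log_odds(toy_counts)
hits = scan("TTGAAACC", pwm)
best = max(hits, key=lambda hit: hit[2])
print(best[0], best[1])
```

    To cover both strands, the same scan can be repeated on the reverse complement of the sequence. Start positions are 0-based offsets into the pasted sequence, not genome coordinates.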

    USE THE UCSC GENOME BROWSER TO LEARN MORE ABOUT THE IFNB1 GENE

    AND THE IRF1 TF TARGET SITE

    Open the UCSC genome browser and click the "Genomes" link at the top.

    Make sure that the hg16 build is selected in the assembly box (July 2003

    (NCBI34/hg16)), then type IFNB1 in the search box.

    Add 2000 bp on both sides of the gene.

    Under the regulation section, unhide the annotation TFBS Conserved (TFBS =

    Transcription Factor Binding Site).

    Note the TFs identified, especially the IRF1/IRF2 binding site at chr9:21068021-21068033.
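
    Region strings such as the one above are easy to handle in a script. The Python sketch below parses a position string and pads it by a fixed flank, assuming the browser's position box uses 1-based, inclusive coordinates. The function names are hypothetical, and the padding is shown on the binding-site coordinates only because the gene's own coordinates are not given in this text.

```python
import re

# chromosome name, start, end; commas in the numbers are tolerated
REGION = re.compile(r"^(chr\w+):([\d,]+)-([\d,]+)$")


def parse_region(text):
    """Parse 'chrN:start-end' into (chrom, start, end) with integer coordinates."""
    match = REGION.match(text.strip())
    if not match:
        raise ValueError("not a region string: %r" % text)
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start > end:
        raise ValueError("start is after end in %r" % text)
    return chrom, start, end


def pad_region(region, flank):
    """Extend a region by flank bases on both sides, never going below position 1."""
    chrom, start, end = region
    return chrom, max(1, start - flank), end + flank


def format_region(region):
    return "%s:%d-%d" % region


site = parse_region("chr9:21068021-21068033")
site_length = site[2] - site[1] + 1        # inclusive coordinates
padded = format_region(pad_region(site, 2000))
print(site_length, padded)
```

    Coordinates are specific to a genome build, so a position read in hg16 should not be pasted into a different assembly without conversion.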


    Notice that we do not find the same binding sites in both types of analysis. Far fewer

    sites appear in this view, and only one of the IRF1 sites identified by JASPAR also comes

    up here.
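
    Comparing the two sets of sites by eye works for a handful of hits; for more, the comparison can be scripted. The sketch below first converts a hit offset within a minus-strand sequence into genome coordinates and then checks which intervals overlap. It is a minimal illustration in Python under the assumption of 1-based inclusive genome coordinates and 0-based hit offsets; all function names and numbers are hypothetical and are not results from this analysis.

```python
def to_genomic_minus(region_end, offset, width):
    """Map a hit in a minus-strand sequence to plus-strand genome coordinates.

    The first base of a minus-strand sequence is the last base (region_end) of
    the genomic region, so a hit at 0-based offset `offset` spanning `width`
    bases covers region_end - offset down to region_end - offset - width + 1.
    """
    return region_end - offset - width + 1, region_end - offset


def overlaps(a, b):
    """True if two inclusive intervals (start, end) share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def shared_sites(sites_a, sites_b):
    """Sites from the first list that overlap any site in the second list."""
    return [a for a in sites_a if any(overlaps(a, b) for b in sites_b)]


# Hypothetical example: a region ending at genome position 2000.
scan_hit = to_genomic_minus(region_end=2000, offset=10, width=13)
scan_sites = [scan_hit, (1200, 1212)]      # made-up matrix-scan hits
conserved_sites = [(1975, 1987)]           # made-up conserved-track site
common = shared_sites(scan_sites, conserved_sites)
print(scan_hit, common)
```

    An overlap test of this kind only says that two predictions fall on the same bases; it does not say which prediction, if either, corresponds to a functional site.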

    As an alternative to the above steps, we can find the region we analyzed in step 4 by

    clicking on Blat and copying in the sequence we used in Match and using search. In this

    way we also verify that our identification of IFNB1 in the NCBI database is correct.

    Result:

    The given procedure was followed to identify the IFNB1 promoter region and to perform gene set

    analysis using various computational methods.
