RNAseq Gene Expression Analysis Tzu L. Phang Ph.D. Robert Stearman Ph.D. Michael Edwards Ph.D. University of Colorado Denver 2014 AACR Workshop, Snowmass CO


  • RNA-seq Gene Expression Analysis

    Tzu L. Phang Ph.D. Robert Stearman Ph.D. Michael Edwards Ph.D.

    University of Colorado Denver 2014 AACR Workshop, Snowmass CO

  • What is gene expression?

    “Gene expression is the process by which information from a gene is used in the synthesis of a functional gene product”

  • The Central Dogma

    Transcriptome Proteomic

  • Agenda

    • Import BAM file
    • What is Next Generation Sequencing (NGS)?
    • NGS Usages: RNA-seq and ChIP-seq
    • NGS File Format and Size
    • Mapping Phred Quality Score
    • Demo: Using SeqMonk for RNA-seq and ChIP-seq Analysis

  • Installing SeqMonk

    http://www.bioinformatics.babraham.ac.uk/projects/seqmonk/

  • SeqMonk: First Look

  • List Panel

    Chromosome Panel

    Track Panel

    Quick Assess Panel

  • Sample BAM Files

    • ABC_DHL2.bam
    • ABC_Ly10.bam
    • ABC_Ly3.bam
    • ABC_U2932.bam
    • GCB_DHL10.bam
    • GCB_DHL4.bam
    • GCB_DHL6.bam
    • GCB_Ly7.bam
    • STAT3 ChIPSeq Genes.txt

  • Import BAM files

  • Next Generation Sequencing

    http://www.youtube.com/watch?v=77r5p8IBwJk

  • How RNA-seq works

    Figure from Wang et al., RNA-Seq: a revolutionary tool for transcriptomics, Nat. Rev. Genetics 10, 57-63 (2009).

    Next generation sequencing (NGS)

    Sample preparation

  • How ChIP-seq works

  • File System

  • unknown:5:1:2:836#0/1:CATACAAGTTGTTTGTACTATAGNTGTTTTTGAATT:aabaaaa^abaaba^_]_aaaXPD\^_aaa`Y]_aa
    unknown:5:1:2:717#0/1:TCTGTTCCAGATTCTAAGGGCATNGTCTTTTTGAAT:aa^]]`\_^[Y_`^aZP^VZV[SDLZ^aa__^^\Ya
    unknown:5:1:2:188#0/1:TAAGAAGAAAGATGCATAGGTACNATATTTTTGAAT:a``Z[^Y^`\\\^[\^][WNTWNDS_[^_^^[OWY_
    unknown:5:1:2:1262#0/1:CACTTACAAACAAGGAATGTTGGNCGGTTTTTGAAT:a`ababaabaaaa_``aa``_ULDXZ_^aaa`O_aa
    unknown:5:1:2:1046#0/1:CTAAGATGGCCTAAGAGTAGACTNACTTTTTTGAAT:abb`Xa`Z_aabaaa`]__Z^`\D\`aaaaaa^aab

    @ILLUMINA-545855_0001:4:100:743:1210#0
    TAACATGTGTCATATGTCCCAGGATGTC
    +ILLUMINA-545855_0001:4:100:743:1210#0
    ab^aaaa_a_aaaa`a^abaaaa``a_a

    Data Structure

    FASTQ format
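As a rough illustration of the two record layouts on this slide, the sketch below parses one colon-delimited line of the `unknown:5:1:2:836#0/1:...` kind (the layout usually called SCARF) and one four-line FASTQ record. The names `Read`, `parse_scarf_line`, `to_fastq`, and `parse_fastq` are hypothetical helpers, not part of any workshop tool. It assumes that the quality string never contains a colon and that every FASTQ record occupies exactly four lines.

```python
from typing import Iterable, List, NamedTuple


class Read(NamedTuple):
    name: str
    sequence: str
    quality: str


def parse_scarf_line(line: str) -> Read:
    """Parse 'name:sequence:quality' (assumed SCARF layout)."""
    # The read name itself contains colons, so split the last two fields from the right.
    name, sequence, quality = line.rstrip("\n").rsplit(":", 2)
    if len(sequence) != len(quality):
        raise ValueError(f"sequence/quality length mismatch in {name!r}")
    return Read(name, sequence, quality)


def to_fastq(read: Read) -> str:
    """Render a read as a four-line FASTQ record."""
    return f"@{read.name}\n{read.sequence}\n+\n{read.quality}\n"


def parse_fastq(lines: Iterable[str]) -> List[Read]:
    """Parse FASTQ text, assuming exactly four lines per record."""
    rows = [ln.rstrip("\n") for ln in lines if ln.strip()]
    if len(rows) % 4 != 0:
        raise ValueError("truncated FASTQ: expected 4 lines per record")
    reads = []
    for i in range(0, len(rows), 4):
        header, sequence, plus, quality = rows[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at line {i + 1}")
        reads.append(Read(header[1:], sequence, quality))
    return reads


# First record from the slide (raw string because the quality contains a backslash).
scarf = r"unknown:5:1:2:836#0/1:CATACAAGTTGTTTGTACTATAGNTGTTTTTGAATT:aabaaaa^abaaba^_]_aaaXPD\^_aaa`Y]_aa"
read = parse_scarf_line(scarf)
print(to_fastq(read), end="")
```

Both layouts carry the same three pieces of information (read name, bases, per-base quality), which is why converting between them is mechanical.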

  • Quality Control of sequences

    • The quality scores are the only measure of confidence in each base call

    • Qualities usually fall along the length of the read, so trimming is needed to remove low-quality bases
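The trimming idea can be sketched as below, using the simplest possible rule: drop bases from the end of the read until one meets a quality threshold. The function name `trim_3prime`, the threshold of 20, and the ASCII offset of 64 are assumptions for illustration. Dedicated trimming tools generally use more robust rules (sliding windows, for example).

```python
from typing import Tuple


def trim_3prime(sequence: str, quality: str,
                min_q: int = 20, offset: int = 64) -> Tuple[str, str]:
    """Remove trailing bases whose quality score is below min_q.

    offset is the ASCII offset of the quality encoding (assumed 64 here;
    many FASTQ files use 33 instead).
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    end = len(quality)
    # Walk back from the end of the read while the last kept base is low quality.
    while end > 0 and ord(quality[end - 1]) - offset < min_q:
        end -= 1
    return sequence[:end], quality[:end]


# Hypothetical read: four good bases ('a') followed by two poor ones ('D').
trimmed_seq, trimmed_qual = trim_3prime("ACGTAC", "aaaaDD")
print(trimmed_seq, trimmed_qual)
```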

  • Phred Qualities

    • Developed by Phil Green’s group at the University of Washington in the 1990s

    • Automatically processes sequence chromatogram files
       – Reports sequence and associated qualities
       – Introduced concept of phred quality values
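A phred quality value Q encodes the estimated probability P that a base call is wrong, as Q = -10 log10(P). So Q = 20 means a 1 in 100 chance of error and Q = 30 a 1 in 1000 chance. The sketch below converts between the two and decodes a quality string into scores. The function names are hypothetical. The default offset of 64 is an assumption based on the characters in the example records; many FASTQ files use 33.

```python
import math
from typing import List


def phred_to_error_probability(q: float) -> float:
    """P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Q = -10 * log10(P)."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -10 * math.log10(p)


def decode_quality(quality: str, offset: int = 64) -> List[int]:
    """Turn each quality character into an integer score (assumed offset)."""
    return [ord(ch) - offset for ch in quality]


# 'a' is ASCII 97 and 'D' is ASCII 68, so with offset 64 they decode to 33 and 4.
scores = decode_quality("aD")
print(scores, [phred_to_error_probability(q) for q in scores])
```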

  • James H. Thomas, University of Washington


  • FASTQ QC Visualization: Per base sequence quality

    http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

  • Our Dataset

  • RNA-seq Analysis Workflow

  • From FASTQ to SAM/BAM

  • galaxyproject.org

  • https://usegalaxy.org

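  • File Format

    [Diagram: file formats along the analysis workflow. Formats shown: SCARF, FASTQ, SAM, BAM, PILEUP, VCF, GTF, BED, WIG. Other labels: Single-End, Paired-End, QC Visualization, Mapping, Input for Visualization Tools. File sizes shown: 5.2 GB, 3.3 GB, 4.5 GB, 696 MB.]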

  • NOW A DEMONSTRATION

  • After BAM Import

  • RNA-seq Analysis Workflow

  • Define Probes & Quantitation

  • RNA-seq Analysis Workflow

  • Why normalization?

    • Remove systematic errors introduced in labeling, hybridization and scanning procedures

    • Correct these errors while preserving biological variability / information
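One way to see what normalization does is the sketch below. It uses counts per million (CPM), a common library-size scaling for RNA-seq that removes differences in sequencing depth between samples. This technique is swapped in for illustration; the slide does not say which method the workshop or SeqMonk applies. The gene names and counts are made up.

```python
from typing import Dict


def counts_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


# Hypothetical data: sample_b is the same library sequenced twice as deeply.
sample_a = {"geneA": 100, "geneB": 300, "geneC": 600}
sample_b = {"geneA": 200, "geneB": 600, "geneC": 1200}

cpm_a = counts_per_million(sample_a)
cpm_b = counts_per_million(sample_b)
print(cpm_a)
print(cpm_b)
```

The raw counts differ by a factor of two, a purely technical effect, while the scaled values agree. The relative abundance of the genes within each sample, which is the biological signal, is unchanged.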

  • A different look …

    [Plot: axes labeled "Technical replicate difference" and "Average Intensity Values"]
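The two quantities on this plot, the difference between technical replicates and their average, can be computed as sketched below. The log2 scale, the pseudocount of 1, and the function name `difference_vs_average` are assumptions; the slide does not state how its values were derived.

```python
import math
from typing import List, Sequence, Tuple


def difference_vs_average(rep1: Sequence[float], rep2: Sequence[float],
                          pseudocount: float = 1.0) -> List[Tuple[float, float]]:
    """Return (average, difference) of log2 values for each gene.

    The pseudocount avoids log2(0) for genes with no signal.
    """
    if len(rep1) != len(rep2):
        raise ValueError("replicates must cover the same genes")
    points = []
    for x, y in zip(rep1, rep2):
        log_x = math.log2(x + pseudocount)
        log_y = math.log2(y + pseudocount)
        points.append(((log_x + log_y) / 2, log_x - log_y))
    return points


# Hypothetical values for two genes measured in two technical replicates.
points = difference_vs_average([3, 7], [3, 1])
print(points)
```

For well-behaved technical replicates the differences should scatter around zero at every average level. A trend away from zero suggests a systematic effect that normalization should remove.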

  • To normalize or not to …

  • ChIP-seq Analysis Strategy

  • ChIP-seq Analysis Caveats

  • ChIP-seq Analysis Workflow