wings2014 workshop 1 design, sequence, align, count, visualize

Workshops in next-generation science at UNC Charlotte 2014, Workshop 1: Design, sequence, align, count, visualize

Upload: ann-loraine

Post on 28-Aug-2014


DESCRIPTION

Slides from Workshop 1 of wings 2014

TRANSCRIPT

Page 1: wings2014 Workshop 1 Design, sequence, align, count, visualize

Workshops in next-generation science at UNC Charlotte 2014

Workshop 1 - Design, sequence, align, count, visualize

Page 2: wings2014 Workshop 1 Design, sequence, align, count, visualize

Workshop Locations

• Section 1 - Room 801
  – Ann Loraine, UNC Charlotte
  – Naim Matasci, University of Arizona, iPlant

• Section 2 - Room 802
  – Ivory Clabaugh Blakley, UNC Charlotte
  – Xiangqin Cui, University of Alabama Birmingham

• Please stay in your section
  – Both sections cover the same material, but timing may vary

Page 3: wings2014 Workshop 1 Design, sequence, align, count, visualize

Meet your TAs

• Graduate students from the UNCC Dept. of Bioinformatics and Genomics
  – 801: Roshonda Barner, Ibro Mujacic, Chi-Yu "Jack" Yen, Warren (G.) Cole, Tony Dao, Greg Linchango, Sushma Madamanchi, Anuja Jain
  – 802: Richard Linchangco, Fred Lin, Chris Ball, Lu Tian, Shawn Chaffin, Natascha Moestl, Walter Clemens, Adriano Schneider

• Loraine Lab members
  – 801: Kyle Suttlemyre (IGB support), April Estrada (Research Specialist, expert IGB user)
  – 802: David Norris (IGB developer)

Page 4: wings2014 Workshop 1 Design, sequence, align, count, visualize

Schedule

• Workshop 1 - planning an experiment, data processing, visualization
  – 9:00 to 11:30, then lunch

• Workshop 2 - introduction to R & RStudio for data analysis, differential expression
  – 12:30 to 2:30, then a 30-minute break

• Workshop 3 - biological interpretation using pathway tools, Gene Ontology, the Web
  – 3:00 to 5:00, then done

Page 5: wings2014 Workshop 1 Design, sequence, align, count, visualize

Using RNA-Seq data set for WiNGS2014

pollennetwork.org

• Sponsored by the Pollen Research Coordination Network in Integrative Pollen Biology (annual meeting starts tonight)

• Visit the Web site for more info

Page 6: wings2014 Workshop 1 Design, sequence, align, count, visualize

RNA-Seq data set for the workshop

• Goal: Provide resources for pollen biology
  – Example RNA-Seq data analysis
  – Catalog of genes expressed in pollen
  – Highlight an important area of pollen research

• Problem: Pollen in some plant species is vulnerable to heat stress, which reduces yields
  – Exposure to mild heat stress (acclimation) can protect against more severe stress later; this is called acquired thermotolerance (Firon 2012)

• To learn more, we sequenced RNA extracted from pollen undergoing a mild heat stress
  – Same temperature that can establish thermotolerance

Page 7: wings2014 Workshop 1 Design, sequence, align, count, visualize

Samples from the lab of Nurit Firon, Volcani Institute, Israel

• Firon lab studies effects of heat stress on tomato pollen

• Showed (along with others) that high temperature reduces pollen viability and sugar content

• Studying a heat-tolerant tomato cultivar: Hazera 3042
  – Pollen is sensitive to heat stress, but not as much as in other varieties

Page 8: wings2014 Workshop 1 Design, sequence, align, count, visualize

Nurit's experiment: RNA-Seq of heat-tolerant tomato cultivar Hazera 3042

• Collected pollen from plants growing in temperature-controlled greenhouses
  – Control: 25/18 °C, optimal temperature
  – Treatment: 32/26 °C, mild chronic heat stress

• Collected batches of pollen from ~10 plants during Sep. & Oct. 2013
  – One treatment, one control per collection
  – Made RNA from five collections: 5 treatment and 5 control "batches"
  – Sequenced at UCLA (69 base, PE)

Page 9: wings2014 Workshop 1 Design, sequence, align, count, visualize

Arabidopsis cold stress RNA-Seq

• Simpler data set with one treatment & control
  – Using data from part of chr1 of the treatment sample to illustrate data processing, visualization, and the effects of parameter settings on results (maximum intron size in the tophat spliced alignment program)

• For details, see:
  – experiment record at the Short Read Archive: http://www.ncbi.nlm.nih.gov/sra/SRP029896
  – sample: http://www.ncbi.nlm.nih.gov/sra/SRX348640

• Published in Methods in Molecular Biology: http://www.ncbi.nlm.nih.gov/pubmed/24792048

Page 10: wings2014 Workshop 1 Design, sequence, align, count, visualize

Workshop 1: RNA-Seq: Design, sequence, align, count, visualize

wings 2014

Page 11: wings2014 Workshop 1 Design, sequence, align, count, visualize

Goals

• Learn the basics (20')
  – Plan an experiment
  – Library prep for RNA-Seq
  – Illumina sequencing

• Practice: Quality analysis using FastQC (30')

• Practice: Data processing (30')
  – Align reads (make BAM files and junction files)
  – Make counts files for statistical analysis
  – Merge reads into transcript models w/ Cufflinks

• Practice: Visualize results in IGB (60')
  – Compare to data set in Galaxy, TAIR10 gene models

Page 12: wings2014 Workshop 1 Design, sequence, align, count, visualize

Workshop 1 Overview

[Workflow diagram. Labels: Sequencing Strategy; FASTQ files (WildType1a.fastq); FastQC; Alignment onto Genome (command line), producing WildType1a.bam; Visualization using IGB; Generation of Counts Data (Counts.txt); Workshop 2]

Page 13: wings2014 Workshop 1 Design, sequence, align, count, visualize

RNA-seq: ultra-high throughput cDNA sequencing

• Several papers published in 2008, first in May

Source: http://blog.sbgenomics.com/rna-seq-the-first-wave/

[Figure labels: Ecker lab; Snyder lab; 999 cites; 1,076 cites]

Page 14: wings2014 Workshop 1 Design, sequence, align, count, visualize

Mortazavi 2008, "Mapping and quantifying mammalian transcriptomes by RNA-Seq", Nature Methods

• Published later in 2008, but > 3000 citations (Google Scholar)

• Why? Maybe because it emphasized RNA-Seq as a replacement for expression DNA microarrays

• Comment in same issue: "Beginning of the end for microarrays?"

Page 15: wings2014 Workshop 1 Design, sequence, align, count, visualize

RNA-Seq Overview - Illumina

[Workflow figure, four stages]

1. Design
  – Plan experiment: biological replication, sequencing strategy, data analysis strategy
  – Collect samples

2. Making Libraries
  – Extract RNA, purify polyA+
  – Fragment
  – Synthesize cDNA (random hexamers)
  – Repair ends
  – Add "A" bases to 3' ends
  – Ligate adapters
  – Amplify
  – Library reflects RNA from original sample

3. Sequencing
  – Prepare flowcell
  – Sequence by synthesis

4. Data Analysis
  – Data: fastq sequence files, millions of reads per library
  – Map to genome (alignments), count reads per gene
  – Improve gene models
  – Identify differentially expressed genes
  – Analyze splicing
  – And much more...

Also labeled in the figure: quality assessment

Page 16: wings2014 Workshop 1 Design, sequence, align, count, visualize

Five  steps  for  design  

1. Articulate your questions or hypothesis
2. Define your unit of biological replication
3. Write up your sample collection protocol in detail
  – Does the protocol allow you to test your hypothesis?
4. Define library synthesis & sequencing strategy
  – Read lengths, paired end vs. single end, depth, barcoding
5. Ask an experienced data analyst to review your plan, then revise as needed

Page 17: wings2014 Workshop 1 Design, sequence, align, count, visualize

Library synthesis

• Fork or "Y" adapters
• Size selection
• Y adapters contain indexes, allow multiplexing

Image: David C. Corney, Ph.D., http://www.labome.com/method/RNA-seq-Using-Next-Generation-Sequencing.html

Page 18: wings2014 Workshop 1 Design, sequence, align, count, visualize

Example library molecule

[Figure labels: P5; Universal adapter; Rd1; unknown sequence; Rd2; barcode; Index Primer; P7]

Rd1 and Rd2 are read from opposite strands of the fragment (reverse complements) and might overlap.

Ref: http://nextgen.mgh.harvard.edu/IlluminaChemistry.html

Page 19: wings2014 Workshop 1 Design, sequence, align, count, visualize

Flow cell preparation & sequencing by synthesis

https://www.youtube.com/watch?v=HMyCqWhwB8E

Page 20: wings2014 Workshop 1 Design, sequence, align, count, visualize

Review: Paired End vs Single End

• Single End (SE): cheaper

• Paired End (PE): more expensive
  – two reads per fragment
  – counting fragments, not reads
  – call normalized counts FPKM, not RPKM

[Figure labels: sequenced in SE; sequenced in PE; indexed adapter]
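The FPKM normalization named on this slide can be sketched as follows. This is a minimal sketch, assuming the usual definition (fragments per kilobase of transcript per million mapped fragments); the function name and the numbers are illustrative and do not come from the workshop data.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / kilobases / millions


# Hypothetical gene: 500 fragments on a 2,000 bp transcript,
# in a library with 10 million mapped fragments.
example_fpkm = fpkm(500, 2_000, 10_000_000)
```

For single-end data the same arithmetic applied to read counts gives RPKM; with paired-end data each fragment is counted once even though it yields two reads.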

Page 21: wings2014 Workshop 1 Design, sequence, align, count, visualize

Get the reads in a FASTQ file

• File contains millions of records
  – Each record has four lines and represents ONE sequence

• Line 1: the name, starts with @
• Line 2: the sequence, starts on a new line
• Line 3: optional extra information, starts with +
• Line 4: the quality scores, starts on a new line

@SN1083:379:H8VA1ADXX:2:1101:1248:2144 1:N:0:12
CCTAAATGGTGCCATGCTAGGAGGCCGTGCCCTTCTTGAAAAGTTGTATGTGAA
+
BBBFFFFFFBFFFIIIIFI<FFIIIIIFIIIIFBFIIIIIIIIFFFIIIIFIII

Example: base = T, score character = F, quality = 37
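The four-line record layout described on this slide can be read with a few lines of Python. This is a minimal sketch, not part of the workshop materials; `FastqRecord` and `read_fastq` are illustrative names, and the example record is the one shown on the slide.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield one record per four lines: @name, sequence, +, quality."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (a blank line also stops the reader)
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


EXAMPLE = (
    "@SN1083:379:H8VA1ADXX:2:1101:1248:2144 1:N:0:12\n"
    "CCTAAATGGTGCCATGCTAGGAGGCCGTGCCCTTCTTGAAAAGTTGTATGTGAA\n"
    "+\n"
    "BBBFFFFFFBFFFIIIIFI<FFIIIIIFIIIIFBFIIIIIIIIFFFIIIIFIII\n"
)
records = list(read_fastq(io.StringIO(EXAMPLE)))
```

For a gzip-compressed file, the same function should work on a handle opened with `gzip.open(path, "rt")`.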

Page 22: wings2014 Workshop 1 Design, sequence, align, count, visualize

Phred Quality score Q

Describes how exponentially unlikely it is that a given base call is wrong.

Q = -10 × log10(p_e), where p_e is the probability that the base call is wrong

Source: http://en.wikipedia.org/wiki/FASTQ_format
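The relationship between Q and the error probability can be checked numerically. This is a minimal sketch of the formula on this slide; the function names are illustrative.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Invert Q = -10 * log10(p_e) to get the error probability p_e."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p_error: float) -> float:
    """Q = -10 * log10(p_e)."""
    return -10 * math.log10(p_error)


# Q = 20 corresponds to a 1-in-100 chance of a wrong call,
# Q = 30 to 1 in 1,000, and each extra 10 points is another factor of 10.
p_q20 = phred_to_error_probability(20)
p_q37 = phred_to_error_probability(37)
```

By this formula, a score of 37 (the value shown in the FASTQ example) corresponds to an error probability of roughly 2 in 10,000.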

Page 23: wings2014 Workshop 1 Design, sequence, align, count, visualize

Different Illumina data processing pipelines used different score encodings

Source: http://drive5.com/usearch/manual/quality_score.html
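The practical consequence of the different encodings is that the same quality character decodes to different scores depending on the ASCII offset. This is a minimal sketch, assuming the two common offsets (33, used by Sanger-style files and Illumina 1.8 and later, and 64, used by some older Illumina pipelines); `decode_quality` is an illustrative name.

```python
def decode_quality(quality: str, offset: int = 33) -> list[int]:
    """Convert a FASTQ quality string to Phred scores using an ASCII offset."""
    scores = [ord(ch) - offset for ch in quality]
    if scores and min(scores) < 0:
        raise ValueError("negative score: the offset is probably wrong for this file")
    return scores


# The character "F" is ASCII 70: Q = 37 with offset 33, but Q = 6 with offset 64.
q_offset_33 = decode_quality("F", offset=33)
q_offset_64 = decode_quality("F", offset=64)
```

A negative score, as raised above, is a quick hint that a file was decoded with the wrong offset.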

Page 24: wings2014 Workshop 1 Design, sequence, align, count, visualize

Get two files - Read1 & Read2 - from paired end sequencing

• Read1 and Read2 have the same read identifier and are sequenced from opposite ends (opposite strands) of the same fragment

• Example is from processing pipeline CASAVA 1.8; older versions used different naming conventions

R1:
@SN1083:379:H8VA1ADXX:2:1101:1248:2144 1:N:0:12
CCTAAATGGTGCCATGCTAGGAGGCCGTGCCCTTCTTGAAAAGTTGTATGTGAA
+
BBBFFFFFFBFFFIIIIFI<FFIIIIIFIIIIFBFIIIIIIIIFFFIIIIFIII

R2:
@SN1083:379:H8VA1ADXX:2:1101:1248:2144 2:N:0:12
CATTTTCGACGTTGTTAATAAGCTCTGCGTACTTGCAAGCTATCTGCGCGAACG
+
BBBFFFFFFFFFFIIIIIIIIIIIIIIIIFIIIIIIIIIIIIIIIIIIIIIFFF
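Because mates share an identifier, a simple consistency check between the Read1 and Read2 files is possible. This is a minimal sketch, assuming CASAVA 1.8-style headers as shown on this slide; the function names are illustrative.

```python
def split_header(header: str) -> tuple[str, str]:
    """Return (shared read identifier, read number) from a CASAVA 1.8 header."""
    body = header[1:] if header.startswith("@") else header
    read_id, _, description = body.partition(" ")
    read_number = description.split(":", 1)[0]
    return read_id, read_number


def is_mate_pair(header1: str, header2: str) -> bool:
    """True if the headers name Read1 and Read2 of the same fragment."""
    id1, number1 = split_header(header1)
    id2, number2 = split_header(header2)
    return id1 == id2 and (number1, number2) == ("1", "2")


R1 = "@SN1083:379:H8VA1ADXX:2:1101:1248:2144 1:N:0:12"
R2 = "@SN1083:379:H8VA1ADXX:2:1101:1248:2144 2:N:0:12"
paired = is_mate_pair(R1, R2)
```

Running such a check over both files in parallel would catch files that have fallen out of step, for example after filtering one file but not the other.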

Page 25: wings2014 Workshop 1 Design, sequence, align, count, visualize

Sequence identifier line in CASAVA 1.8

@SN1083:379:H8VA1ADXX:2:1101:1248:2144 1:N:0:12

Before the space: machine : run# : flow-cell-id : lane : tile : x-pos : y-pos

After the space: read# : is-filtered : control : index (barcode)
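The field layout above can be unpacked programmatically. This is a minimal sketch, assuming the CASAVA 1.8 layout shown on this slide; the class and function names are illustrative.

```python
from typing import NamedTuple


class Casava18Header(NamedTuple):
    machine: str
    run: int
    flowcell_id: str
    lane: int
    tile: int
    x_pos: int
    y_pos: int
    read: int
    is_filtered: bool
    control: int
    index: str


def parse_casava18_header(line: str) -> Casava18Header:
    """Split an identifier line into its colon-separated fields."""
    location, description = line.strip().lstrip("@").split(" ", 1)
    machine, run, flowcell_id, lane, tile, x_pos, y_pos = location.split(":")
    read, filtered, control, index = description.split(":")
    return Casava18Header(
        machine, int(run), flowcell_id, int(lane), int(tile),
        int(x_pos), int(y_pos), int(read), filtered == "Y", int(control), index,
    )


header = parse_casava18_header("@SN1083:379:H8VA1ADXX:2:1101:1248:2144 1:N:0:12")
```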

Page 26: wings2014 Workshop 1 Design, sequence, align, count, visualize

FastQC  

• Many groups use FastQC as a first-pass quality assessment

• Free from Babraham: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

• Run interactively (point-and-click) or from the command line (won't cover this)

Page 27: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Using FastQC

• Go to the conference DropBox link:
  – http://bitly.com/rnaseq2014

• Note two folders: FastQC and FastQC-Examples
  – FastQC-Examples has FastQC reports from different species, sample types (next slide)

• From the FastQC folder, download
  – Example.fastq
  – FastQC_Manual.pdf

• Start FastQC, open Example.fastq

Page 28: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Watch FastQC video

• https://www.youtube.com/watch?v=bz93ReOv87Y (start around 34 sec)

• Take-home #1: FastQC assesses whether your data files are typical

• Take-home #2: A "bad result" from FastQC doesn't always mean your data are not useful or valuable

• Explore on your own! (~15 minutes)

Page 29: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: View reports in FastQC-Examples (~15 min)

• Blueberry
  – OnealRipe_1
  – OzarkblueGreen_1

• Tomato pollen
  – T2_1
  – C2_1

• Rice
  – Control2h-R2: Per read %GC

Page 30: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Data processing

• Double-click "Alignment.tar.gz" on your Desktop to unpack it

• Also available from http://bitly.com/rnaseq2014

Page 31: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Look at "align.sh"

• Open the Alignment folder
• Right-click "align.sh"
• Select "open with text editor"
• This is a shell script
  – Commands executed in sequence
  – Very useful for automating tasks

• First line is the "she-bang" line; it tells the system to run the file as a shell script

• All other lines starting with # are comments (not run)

[Book cover: Learning the bash Shell, a great guide to writing shell scripts]

Page 32: wings2014 Workshop 1 Design, sequence, align, count, visualize

align.sh - simple pipeline for RNA-Seq data processing

• Aligns a sample fastq file to the genome
  – tophat2, bowtie2
  – fastq file is from the Arabidopsis cold stress experiment (Short Read Archive SRX348640)
  – file ColdTreatment-little.fastq.gz (gzip-compressed, .gz)

• Counts reads that align to TAIR10 genes
  – featureCounts
  – only counting reads that uniquely align

• Merges alignments into transcript models
  – cufflinks
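The three stages listed above can be laid out as command lines. This is a hedged sketch only: the actual contents of align.sh are not shown in these slides, so the index name, annotation file name, output names, and maximum intron value below are all hypothetical placeholders. The only options used are ones documented for these tools (`-I` and `-o` for tophat2 and cufflinks, `-a` and `-o` for featureCounts). The code builds and prints the commands; it does not run them.

```python
import shlex


def build_pipeline(fastq: str, bowtie2_index: str, annotation_gtf: str,
                   max_intron: int, out_dir: str = "tophat_out") -> list[list[str]]:
    """Return align, count, and assemble commands as argument lists."""
    bam = f"{out_dir}/accepted_hits.bam"
    return [
        # 1. Spliced alignment; -I caps the intron size TopHat will report.
        ["tophat2", "-I", str(max_intron), "-o", out_dir, bowtie2_index, fastq],
        # 2. Count reads per gene; by default featureCounts skips multi-mapping reads.
        ["featureCounts", "-a", annotation_gtf, "-o", "counts.txt", bam],
        # 3. Merge alignments into transcript models.
        ["cufflinks", "-I", str(max_intron), "-o", "cufflinks_out", bam],
    ]


# All file names and the intron limit here are hypothetical.
commands = build_pipeline(
    fastq="ColdTreatment-little.fastq.gz",
    bowtie2_index="genome_index",
    annotation_gtf="TAIR10.gtf",
    max_intron=5000,
)
for command in commands:
    print(shlex.join(command))
```

To actually execute the steps, each list could be passed to `subprocess.run(command, check=True)` on a machine where the tools are installed.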

Page 33: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Intro to Terminal

• Double-click the Terminal shortcut on the desktop
  – Program for entering commands or running scripts
  – Also called a "shell" or "Unix shell"
  – Can open multiple Terminal windows; each window runs its own shell

• Terminal shows a hierarchical view of the file system
  – An upside-down tree, where every folder is inside another folder
  – Folders are also called "directories"
  – The top folder (that contains everything else) is called the "root" directory: / (forward slash)

Page 34: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Open Terminal, try these commands

• cd - "change directory"
  – by itself means "go to user home directory"
  – with an argument means "go there"
  – with ".." means "go up one level"

• pwd - "print the current working directory"; use it to find out where you are

Page 35: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Try these commands

• ls - lists files and directories in the current directory

Page 36: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Try these commands

• ls -l - "list long"
  – reports more information about files
  – a leading "d" means the entry is a directory (folder)

Page 37: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Run align.sh in Terminal

• Go to home directory
• Go to Desktop
• Go to Alignment
• Run align.sh

Page 38: wings2014 Workshop 1 Design, sequence, align, count, visualize

Now running: tophat2, a spliced alignment tool

[Figure: Figure 1 from "TopHat: discovering splice junctions with RNA-Seq", Cole Trapnell, Lior Pachter and Steven L. Salzberg]

Page 39: wings2014 Workshop 1 Design, sequence, align, count, visualize

TopHat output - we'll open in IGB

• Creates a new folder with files, including...

• accepted_hits.bam - "binary alignments" file, contains read alignments
  – BAM is a compressed version of SAM ("sequence alignment"); it needs an index (".bai") file, made using samtools

• junctions.bed - reports boundaries of introns, called "junction" features
  – BED format, tab-delimited plain text file
  – one junction feature per line
  – fifth field is the score: the number of spliced reads aligned across the junction
  – see: http://genome.ucsc.edu/FAQ/FAQformat.html#format1
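The BED layout described above (tab-delimited, one feature per line, score in the fifth field) can be read as follows. This is a minimal sketch; the example line is hypothetical, not taken from the workshop's junctions.bed, and only the first six standard BED fields are parsed.

```python
from typing import NamedTuple


class Junction(NamedTuple):
    chrom: str
    start: int   # BED starts are 0-based
    end: int     # BED ends are exclusive
    name: str
    score: int   # number of spliced reads aligned across the junction
    strand: str


def parse_bed_line(line: str) -> Junction:
    """Parse the first six tab-delimited fields of a BED line."""
    fields = line.rstrip("\n").split("\t")
    return Junction(fields[0], int(fields[1]), int(fields[2]),
                    fields[3], int(fields[4]), fields[5])


# Hypothetical junction feature supported by 12 spliced reads.
junction = parse_bed_line("chr1\t1000\t1700\tJUNC00000001\t12\t+\n")
```

In a real file, lines beginning with "track" are header lines and would need to be skipped before parsing.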

Page 40: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Start IGB while script runs

• Double-click the IGB desktop icon
• Click the Arabidopsis flower on the start screen

Page 41: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: How to get IGB if you're using your own computer

• Go to http://bioviz.org
• Follow the Download link
• Choose the Medium Memory option (typical)

Page 42: wings2014 Workshop 1 Design, sequence, align, count, visualize

TAIR10 annotations, June 2009 Columbia-0 genome release

• TAIR10 protein-coding gene models loaded automatically from the IGB data server

• Forward & reverse strand in separate tracks

[Screenshot labels: Forward; Reverse]

Page 43: wings2014 Workshop 1 Design, sequence, align, count, visualize

RNA-Seq, ChIP-Seq, other data sets available in Data Access tab

• IGB data servers; you can set up your own

Page 44: wings2014 Workshop 1 Design, sequence, align, count, visualize

Arabidopsis pollen data sets

• Read alignments, coverage graphs, junction files
• From the 2013 Plant Phys. pollen RNA-Seq paper

Page 45: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Combine Plus & Minus Tracks

• Click "+/-" to combine tracks

• Use the Data Management Table to change track color, name, visibility, load options, strand options

Page 46: wings2014 Workshop 1 Design, sequence, align, count, visualize

Summary of moving and zooming

• Animated zooming
  – click to position the zoom stripe, which sets the zoom focus
  – horizontal zoom & vertical stretch

• Moving from side to side (panning)
  – arrows in toolbar
  – hand icon - the move tool

• Jump-zooming
  – Click-drag the coordinate axis with the arrow tool
  – Double-click to zoom in on a feature
  – Search by name

Page 47: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Zoom in on a feature

• Zoom in on alt-spliced gene models (*) on chr1
• This is animated zooming

1. Click to set zoom focus
2. Drag slider to zoom in

Page 48: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Click move arrows to reposition during zoom

• Click data display to re-focus zoom on target location

Page 49: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Or use move tool (hand) to reposition during zoom

• Click display to focus zoom on target

1. Select move tool (hand)
2. Click-drag to move

Page 50: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Click-drag sequence axis to jump-zoom to a region

1. Select pointer tool
2. Click number line
3. Drag
4. Release

• Highlighted region becomes new view

Page 51: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Jump-zoom to gene model

• Double-click a gene model's label, the space a little above its exon blocks, or an intron to jump-zoom to that gene model
  – Also selects it; selected items are outlined in red

1. Select pointer tool
2. Double-click label or intron

Page 52: wings2014 Workshop 1 Design, sequence, align, count, visualize

After jump-zoom, gene model is selected

• Arrows indicate direction of transcription

[Screenshot label: selected gene model outlined in red]

Page 53: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Gene model close-up

• Use vertical slider to make gene models taller
• Increase window size to make more room

[Screenshot label: drag slider to stretch vertically]

Page 54: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Interact with data using pointer. Select pointer (arrow) in toolbar

• Click intron, label, or region above blocks to select whole gene model
• Click blocks to select parts of a gene model
• SHIFT-click to multi-select
• CLICK-drag to select & count everything in a region
• Selection Info, top right, reports counts
  – "i" button shows info if one item is selected

Page 55: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: View Edge Matching

• Edges that match selected item edges are highlighted in red

• To change edge-match color, choose File > Preferences > Other Options

• To turn off or on, see View > Edge Matching

Page 56: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: To work with sequence data, click Load Sequence

• Sequence appears in Coordinates track

Page 57: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Zoom in to see amino acids

• Note: Must load genomic sequence first

Page 58: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Zoom in on end of translation

• Click the "thick end" and then zoom in
• Note: Variants encode same C-term amino acids

Page 59: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Select genomic sequence

1. Choose pointer tool in toolbar
2. Click-drag genomic sequence to select a region
3. CTRL-click to copy

• Length of selected region reported in Selection Info box (top right)
• Useful for designing primers, measuring regions

Page 60: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Right-click (or CTRL-click) gene model

• Shows options to run a Web search, BLAST search, view sequence

Page 61: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Quick Search

• Enter search text, select option
• Jump-zoom to selected gene

[Screenshot label: choose At-SR30]

Page 62: wings2014 Workshop 1 Design, sequence, align, count, visualize

Zoomed to At-SR30, RNA-binding protein involved in splicing

Page 63: wings2014 Workshop 1 Design, sequence, align, count, visualize

Looking ahead to Workshop 3

• Some genes that were highly expressed in tomato pollen are annotated as "Unknown" proteins & have no counterpart in Arabidopsis.

• You can use IGB to quickly find those genes and then run BLASTX or BLASTP searches at NCBI to find out...
  – Are they unique to tomato?
  – Could they be non-coding?

Page 64: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Open files from align.sh

• Zoom out to show more of At-SR30 region
• Choose File > Open
  – Select "accepted_hits.bam" & "junctions.bed"

• A new empty track appears for each file

• Click Load Data to load reads and junctions

Page 65: wings2014 Workshop 1 Design, sequence, align, count, visualize

[Screenshot labels: read alignments stack; reads at top of stack not being shown (too many to fit)]

Page 66: wings2014 Workshop 1 Design, sequence, align, count, visualize

[Screenshot label: junction features, summarizing spliced reads]

Page 67: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Configure view - Load Sequence

• Click Load Sequence to load genomic bases for this region

Page 68: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Configure view - Lock mRNA track height

1. Click TAIR10 mRNA track label to select it
2. Open Annotation tab
3. Select Lock Track Height, enter 170, click Apply

Page 69: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Configure view - configure junction track

1. Click junctions track label to select junctions track
2. Open Annotation tab
3. Select score in Label Field
4. Select +/- in Strand

Page 70: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Configure view - lock junction track height

1. Click junctions track label to select it
2. Open Annotation tab
3. Select Lock Track Height, enter 120, click Apply

Page 71: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Change read stack height to see more reads

1. CTRL-click (or right-click) accepted_hits.bam track label
2. Choose Set Stack Height...

Page 72: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Change read stack height to see more reads

3. Enter 50

Page 73: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Set mRNA stack height

1. Right-click TAIR10 mRNA track label, choose Set Stack Height
2. Enter 3 - tallest stack has 3 models

Note: Tabs are minimized to make more space

Page 74: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Note read support for alternative splicing

Take-home: Many spliced reads support both variants, but there are also many reads inside the introns, indicating failure to splice. This may be typical of alt-spliced introns?

Page 75: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Use junction track to quantify support for splice variants

1. Click-drag to genes track
2. Scores are number of spliced reads supporting each junction.

Page 76: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Compare Cufflinks GTF file to gene models

• Open Alignments > cufflinks_cold > transcripts.gtf

Page 77: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: View Cufflinks gene models

1. Click Load Data to see Cufflinks models
2. Click-drag new track next to gene models
3. Use vertical slider to make more room

Take-home: Cufflinks annotations are close, but incomplete.

Page 78: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Load data from Galaxy

1. Go to usegalaxy.org
2. Open Shared Data
3. Choose Published Histories

Page 79: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Load data from Galaxy

1. Search for Cold

3. Select Cold stress in Arabidopsis (with default maximum intron size)

Page 80: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Load data from Galaxy

• Illustrates results when tophat is run with default settings:
  – default maximum intron size is 500,000 bases

• Tophat was developed with human data in mind, where large introns are common

• Select Import History

Page 81: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Select "start using this history"

Page 82: wings2014 Workshop 1 Design, sequence, align, count, visualize

1. Select Treatment junctions
2. Select display in IGB View

Page 83: wings2014 Workshop 1 Design, sequence, align, count, visualize

New tab opens. Select Click to go to IGB

Page 84: wings2014 Workshop 1 Design, sequence, align, count, visualize

1. Click Load Data

[Screenshot label: new track]

Page 85: wings2014 Workshop 1 Design, sequence, align, count, visualize

Practice: Remove reads - don't need them now

1. Right-click accepted_hits.bam
2. Choose Delete Track

Page 86: wings2014 Workshop 1 Design, sequence, align, count, visualize

1. Zoom out all the way
2. Click Load Data

[Screenshot label: your data are here]

Page 87: wings2014 Workshop 1 Design, sequence, align, count, visualize

Take-home: Tophat run with default parameters predicts enormous introns. It is important to understand parameter settings; defaults are not always best.
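One way to spot such introns outside the browser is to compute intron lengths from a junctions file. This is a minimal sketch, assuming the two-block BED12 layout that TopHat uses for junctions.bed (the intron is the gap between the two blocks); the example lines and the 10,000-base threshold are hypothetical illustrations, not recommended settings.

```python
def intron_length(bed12_line: str) -> int:
    """Length of the gap between the two blocks of a junction feature."""
    fields = bed12_line.rstrip("\n").split("\t")
    block_sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
    block_starts = [int(x) for x in fields[11].rstrip(",").split(",")]
    first_block_end = block_starts[0] + block_sizes[0]
    return block_starts[1] - first_block_end


def is_suspiciously_long(bed12_line: str, threshold: int) -> bool:
    """Flag junctions whose intron is longer than the chosen threshold."""
    return intron_length(bed12_line) > threshold


# Hypothetical junction features: a 590-base intron and a 200,000-base intron.
SHORT = "chr1\t1000\t1700\tJUNC1\t7\t+\t1000\t1700\t255,0,0\t2\t50,60\t0,640"
LONG = "chr1\t5000\t205110\tJUNC2\t1\t+\t5000\t205110\t255,0,0\t2\t50,60\t0,200050"

lengths = [intron_length(SHORT), intron_length(LONG)]
flags = [is_suspiciously_long(line, threshold=10_000) for line in (SHORT, LONG)]
```

Comparing the flagged junctions between a default run and a run with a smaller maximum intron size gives a rough measure of how much the parameter changes the results.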

Page 88: wings2014 Workshop 1 Design, sequence, align, count, visualize

Now you can

• Describe Illumina library synthesis, sequencing
• Evaluate data quality using FastQC
• Run a data processing pipeline (shell script)
• View and explore data in a genome browser
  – and load data sets from Galaxy, local files

Thank you for your attention!