(Negative) controls

(Negative) controls. Morgane Thomas-Chollier, Computational systems biology, IBENS, mthomas@biologie.ens.fr. Denis Thieffry, Jacques van Helden and Carl Herrmann kindly shared some of their slides. M2 – Computational analysis of cis-regulatory sequences, 2014/2015.


Post on 25-Aug-2020


TRANSCRIPT

Page 1:

(Negative) controls

Morgane Thomas-Chollier

Computational systems biology - IBENS, mthomas@biologie.ens.fr

Denis Thieffry, Jacques van Helden and Carl Herrmann kindly shared some of their slides.

M2 – Computational analysis of cis-regulatory sequences 2014/2015

Page 2:

Aim of the course

1 – Understand the need for controls in bioinformatics
2 – Some strategies to build controls

Page 3:

Controls in biology

Wellik and Mario R Capecchi, Science, 2003

Page 4:

Evaluate predictions with controls

•  Quantify the capability of the program to
   »  Detect known features = return a positive answer for a positive feature
   »  Not detect false features = return a negative answer for a negative feature

Rows: annotation. Columns: predictions.

                        Predicted positive    Predicted negative
Annotated positive      True Positive         False Negative
Annotated negative      False Positive        True Negative
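The four outcomes above can be turned into numbers once a program has been run on positive and negative controls. The following is a minimal sketch, assuming the usual definitions of sensitivity, specificity and positive predictive value; the function name `control_metrics` and the counts are invented for illustration and do not come from the slides.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")

def control_metrics(tp, fn, fp, tn):
    """Summarise the four outcomes (annotation vs. prediction)."""
    return {
        # fraction of annotated positives that the program predicts
        "sensitivity": ratio(tp, tp + fn),
        # fraction of annotated negatives that the program rejects
        "specificity": ratio(tn, tn + fp),
        # fraction of predictions that are annotated positives
        "ppv": ratio(tp, tp + fp),
    }

# Toy counts, invented for illustration only.
metrics = control_metrics(tp=8, fn=2, fp=5, tn=85)
print(metrics)
```

Positive controls feed the first row of the table (true positives and false negatives); negative controls feed the second row.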

Page 5:

In the context of cis-regulation

5’- TCTCTCTCCACGGCTAATTAGGTGATCATGAAAAAATGAAAAATTCATGAGAAAAGAGTCAGACATCGAAACATACAT
5’- ATGGCAGAATCACTTTAAAACGTGGCCCCACCCGCTGCACCCTGTGCATTTTGTACGTTACTGCGAAATGACTCAACG
5’- CACATCCAACGAATCACCTCACCGTTATCGTGACTCACTTTCTTTCGCATCGCCGAAGTGCCATAAAAAATATTTTTT
5’- TGCGAACAAAAGAGTCATTACAACGAGGAAATAGAAGAAAATGAAAAATTTTCGACAAAATGTATAGTCATTTCTATC
5’- ACAAAGGTACCTTCCTGGCCAATCTCACAGATTTAATATAGTAAATTGTCATGCATATGACTCATCCCGAACATGAAA
5’- ATTGATTGACTCATTTTCCTCTGACTACTACCAGTTCAAAATGTTAGAGAAAAATAGAAAAGCAGAAAAAATAAATAA
5’- GGCGCCACAGTCCGCGTTTGGTTATCCGGCTGACTCATTCTGACTCTTTTTTGGAAAGTGTGGCATGTGCTTCACACA

…HIS7  …ARO4  …ILV6  …THR4  …ARO1  …HOM2  …PRO3

Use different sets of sequences

Use different sets of matrices

Page 6:

Sequences

•  Positive control: quantify the capability of the program to detect known regulatory elements
   »  Annotated sites (e.g. sites from TRANSFAC) in their original context (the promoter sequences).
   »  Annotated sites implanted in other context
      -  Biological sequences (random selection).
      -  Artificial sequences.
   »  Artificial sites implanted in artificial sequences.

•  Negative control: quantify the capability of the program to return a negative answer when there are no regulatory elements.
   »  Artificial sequences (generated according to a Bernoulli or a Markov model to mimic an organism of interest)
   »  Biological sequences without common regulation (random selection of genes)
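The "implanted site" positive control can be sketched in a few lines. This is only an illustration under stated assumptions: the helper `implant_site`, the equiprobable background and the 7-mer used as a site are all hypothetical choices, not the procedure used in the course.

```python
import random

def implant_site(sequence, site, rng):
    """Overwrite a random window of `sequence` with `site`.

    Returns the modified sequence and the 0-based position of the implant,
    so that predictions can later be compared with the known position.
    """
    if len(site) > len(sequence):
        raise ValueError("site is longer than the sequence")
    pos = rng.randrange(len(sequence) - len(site) + 1)
    return sequence[:pos] + site + sequence[pos + len(site):], pos

rng = random.Random(0)  # fixed seed so the control can be reproduced
# Toy artificial background: equiprobable A, C, G, T (an assumption).
background = "".join(rng.choice("ACGT") for _ in range(60))
planted, pos = implant_site(background, "TGACTCA", rng)
print(pos, planted)
```

Because the implant position is recorded, a program's output on `planted` can be scored as a true positive or a false negative.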

Page 7:

Artificial sequences

•  Random-seq in RSAT
   »  Generate artificial sequences (mimicking real biological sequences)
   »  Re-run the exact same analysis
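To make the idea of a Bernoulli or Markov background concrete, here is a minimal sketch of a generator trained on example sequences. It is not the RSAT implementation; `train_markov`, `generate` and the training string are hypothetical, and order 0 is used here to stand for the Bernoulli case.

```python
import random
from collections import defaultdict

ALPHABET = "ACGT"

def train_markov(sequences, order):
    """Count how often each letter follows each prefix of length `order`."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i in range(len(seq) - order):
            counts[seq[i:i + order]][seq[i + order]] += 1
    return counts

def generate(counts, length, order, rng):
    """Draw one artificial sequence from the trained transition counts."""
    seq = rng.choice(sorted(counts))  # observed prefix ("" when order is 0)
    while len(seq) < length:
        options = counts.get(seq[len(seq) - order:])
        if options:
            letters = sorted(options)
            weights = [options[letter] for letter in letters]
            seq += rng.choices(letters, weights=weights)[0]
        else:
            # Prefix never seen with a successor: fall back to a uniform draw.
            seq += rng.choice(ALPHABET)
    return seq[:length]

rng = random.Random(0)
training = ["ATATATGCGCATATATTTTAAAATATAT"]  # toy training data
bernoulli = generate(train_markov(training, 0), 50, 0, rng)
markov1 = generate(train_markov(training, 1), 50, 1, rng)
print(bernoulli)
print(markov1)
```

A higher order captures more of the local composition (dinucleotides for order 1, trinucleotides for order 2, and so on) at the cost of needing more training sequence.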

Page 8:

Randomized (shuffled) sequences

•  Randomized sequences
   »  Maintain composition (= number of A, C, G, T)
   »  Conservation of higher-order dependencies?
   »  Is it likely that the signal is still there?
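The first two questions can be checked directly on a toy example. The sketch below shuffles single letters, which is one simple way to randomize a sequence (an assumption; other shuffling schemes exist), and compares letter and dinucleotide counts before and after.

```python
import random
from collections import Counter

def shuffle_sequence(seq, rng):
    """Shuffle the letters of `seq`; the letter composition is unchanged."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)

def dinucleotide_counts(seq):
    """Count overlapping dinucleotides, a simple higher-order statistic."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))

rng = random.Random(1)
original = "CGCGCGCGCGATATATATAT"  # toy sequence with strong dinucleotide bias
shuffled = shuffle_sequence(original, rng)

print(Counter(original) == Counter(shuffled))  # True: same number of A, C, G, T
print(dinucleotide_counts(original))
print(dinucleotide_counts(shuffled))
```

The letter counts always match, whereas nothing in a single-letter shuffle forces the dinucleotide counts to be preserved.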

Page 9:

Sequences

•  Positive control: quantify the capability of the program to detect known regulatory elements
   »  Annotated sites (e.g. sites from TRANSFAC) in their original context (the promoter sequences).
   »  Annotated sites implanted in other context
      -  Biological sequences (random selection).
      -  Artificial sequences.
   »  Artificial sites implanted in artificial sequences.

•  Negative control: quantify the capability of the program to return a negative answer when there are no regulatory elements.
   »  Artificial sequences (generated according to a Bernoulli or a Markov model)
   »  Biological sequences without common regulation (random selection of genes)

Page 10:

Biological sequences

•  Random-genes in RSAT
   »  Select X genes randomly within a given genome
   »  Obtain the upstream sequences
   »  Re-run the exact same analysis
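The three steps above can be sketched as a loop over random gene sets. Everything here is a stand-in: the toy "genome" is made of random sequences, `count_word` plays the role of "the exact same analysis", and the word and set sizes are arbitrary choices made for illustration.

```python
import random

def count_word(sequences, word):
    """Toy analysis: total number of (overlapping) occurrences of `word`."""
    width = len(word)
    return sum(
        sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == word)
        for seq in sequences
    )

def random_gene_sets(upstream, set_size, n_sets, rng):
    """Yield `n_sets` random selections of `set_size` gene names."""
    genes = sorted(upstream)
    for _ in range(n_sets):
        yield rng.sample(genes, set_size)

rng = random.Random(42)
# Hypothetical toy genome: 200 genes, each with a random 100-bp upstream sequence.
upstream = {
    f"gene{i}": "".join(rng.choice("ACGT") for _ in range(100))
    for i in range(200)
}
# Re-run the same analysis on each random selection of 7 genes.
control_counts = [
    count_word([upstream[g] for g in genes], "TGACTC")
    for genes in random_gene_sets(upstream, set_size=7, n_sets=100, rng=rng)
]
print(sorted(control_counts))
```

The list of control counts gives an empirical picture of what the analysis returns on gene sets without common regulation, against which the result for the gene set of interest can be compared.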

Page 11:

Biological sequences

•  Genes not differentially regulated
   »  Select X genes among genes that do not show changes in expression
   »  Obtain the upstream sequences
   »  Re-run the exact same analysis

Page 12:

Biological sequences

•  Genes not differentially regulated
   »  Read coverage in windows around the TSS (histone marks)

[Figure: H3K27ac within +/-20 kb window around 30 genes. Groups compared: up-regulated genes vs. 10x randomly-picked not regulated genes. Vertical axis: H3K27ac / [gene +/-20 kb window], from 0.0 to 1.0. Wilcoxon test p-value = 0.0016]
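A comparison of this kind, regulated genes against a control group of non-regulated genes, can be sketched with a rank-sum test. The code below is a simplified illustration, not the analysis behind the figure: it uses a normal approximation without tie correction (rough for small samples), and the coverage values are invented.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney) test.

    Returns the U statistic of `x` and a p-value from the normal
    approximation (no continuity or tie correction).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j for tied values
        for k in range(i, j):
            ranks[k] = average_rank
        i = j
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return u, math.erfc(abs(z) / math.sqrt(2))

# Invented coverage fractions, for illustration only.
up_regulated = [0.62, 0.71, 0.55, 0.80, 0.67]
not_regulated = [0.30, 0.45, 0.52, 0.28, 0.60, 0.41, 0.35, 0.58, 0.33, 0.49]
u_stat, p_value = rank_sum_test(up_regulated, not_regulated)
print(u_stat, p_value)
```

For real data, an established statistics package would be a safer choice than this hand-written approximation.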

Page 13:

Biological sequences

•  Random genome fragments in RSAT
   »  Select a set of fragments with random positions in a given genome, and return their coordinates and/or sequences
   »  Adapted to ChIP-seq?
      -  Yes: same number of peaks + same size
      -  No: composition of the sequences (dinucleotides) not respected
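The "same number of peaks + same size" property can be illustrated with a short sketch that draws one random fragment per peak. This is not the RSAT tool; `random_fragments`, the chromosome sizes and the peak coordinates are hypothetical, and no attempt is made to match sequence composition, which is exactly the limitation noted above.

```python
import random

def random_fragments(peaks, chrom_sizes, rng):
    """For each (chrom, start, end) peak, draw a random fragment of equal length.

    Chromosomes are weighted by their number of valid start positions, so
    every possible fragment of that length is equally likely.
    """
    fragments = []
    for _, start, end in peaks:
        length = end - start
        candidates = [c for c in sorted(chrom_sizes) if chrom_sizes[c] >= length]
        weights = [chrom_sizes[c] - length + 1 for c in candidates]
        chrom = rng.choices(candidates, weights=weights)[0]
        new_start = rng.randrange(chrom_sizes[chrom] - length + 1)
        fragments.append((chrom, new_start, new_start + length))
    return fragments

rng = random.Random(7)
chrom_sizes = {"chrI": 1000, "chrII": 500}  # toy genome
peaks = [("chrI", 100, 300), ("chrII", 50, 100), ("chrI", 700, 760)]
controls = random_fragments(peaks, chrom_sizes, rng)
print(controls)
```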

Page 14:

In the context of cis-regulation

5’- TCTCTCTCCACGGCTAATTAGGTGATCATGAAAAAATGAAAAATTCATGAGAAAAGAGTCAGACATCGAAACATACAT
5’- ATGGCAGAATCACTTTAAAACGTGGCCCCACCCGCTGCACCCTGTGCATTTTGTACGTTACTGCGAAATGACTCAACG
5’- CACATCCAACGAATCACCTCACCGTTATCGTGACTCACTTTCTTTCGCATCGCCGAAGTGCCATAAAAAATATTTTTT
5’- TGCGAACAAAAGAGTCATTACAACGAGGAAATAGAAGAAAATGAAAAATTTTCGACAAAATGTATAGTCATTTCTATC
5’- ACAAAGGTACCTTCCTGGCCAATCTCACAGATTTAATATAGTAAATTGTCATGCATATGACTCATCCCGAACATGAAA
5’- ATTGATTGACTCATTTTCCTCTGACTACTACCAGTTCAAAATGTTAGAGAAAAATAGAAAAGCAGAAAAAATAAATAA
5’- GGCGCCACAGTCCGCGTTTGGTTATCCGGCTGACTCATTCTGACTCTTTTTTGGAAAGTGTGGCATGTGCTTCACACA

…HIS7  …ARO4  …ILV6  …THR4  …ARO1  …HOM2  …PRO3

Use different sets of sequences

Use different sets of matrices

Page 15:

Matrix permutations

[Figure: the TrpR matrix and TrpR permutations]

•  Matrix-quality in RSAT
   »  Compare distributions of scores for PSSMs
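A matrix permutation can be sketched as a shuffle of the matrix columns, which keeps each column intact but destroys the order of positions. That reading of "permutation" is an assumption here, and the count matrix below is a toy example, not the TrpR matrix.

```python
import random

def permute_columns(matrix, rng):
    """Return a copy of a position-count matrix with its columns shuffled.

    `matrix` maps each base to a list of counts, one per motif position.
    """
    width = len(next(iter(matrix.values())))
    order = list(range(width))
    rng.shuffle(order)
    return {base: [row[i] for i in order] for base, row in matrix.items()}

rng = random.Random(3)
# Toy count matrix (4 positions, 9 sites); values invented for illustration.
matrix = {
    "A": [8, 0, 1, 0],
    "C": [0, 1, 7, 0],
    "G": [1, 0, 1, 9],
    "T": [0, 8, 0, 0],
}
permuted = permute_columns(matrix, rng)
print(permuted)
```

Since every column is kept whole, the permuted matrix has the same per-column composition as the original, which makes it a natural negative control for the matrix itself.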

Page 16:

Matrix quality with negative datasets

•  Matrix-quality in RSAT
   »  Not for randomly-generated sequences (random-seq), as their score distribution will ALWAYS follow the theoretical curve (= background = Markov model used to generate the sequences!)
   »  OK for random selection of genes
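Comparing score distributions requires scoring every position of a sequence set with the matrix. The sketch below shows one common way to do this, with log-odds weights against a background and the fraction of positions scoring at or above a threshold; it is not the matrix-quality implementation, and the matrix, background, pseudo-count and sequence are all toy assumptions.

```python
import math

def log_odds(counts, background, pseudo=1.0):
    """Convert position counts into log-odds weights against a background."""
    width = len(counts["A"])
    weights = []
    for i in range(width):
        total = sum(counts[b][i] for b in "ACGT") + pseudo
        weights.append({
            # pseudo-count is shared between bases according to the background
            b: math.log((counts[b][i] + pseudo * background[b]) / total / background[b])
            for b in "ACGT"
        })
    return weights

def scan(sequence, weights):
    """Score every window of the sequence with the weight matrix."""
    width = len(weights)
    return [
        sum(weights[j][sequence[i + j]] for j in range(width))
        for i in range(len(sequence) - width + 1)
    ]

def fraction_at_or_above(scores, threshold):
    """Empirical fraction of positions scoring at least `threshold`."""
    return sum(s >= threshold for s in scores) / len(scores)

counts = {"A": [8, 0, 1, 0], "C": [0, 1, 7, 0], "G": [1, 0, 1, 9], "T": [0, 8, 0, 0]}
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
weights = log_odds(counts, background)
scores = scan("GGATCGTT", weights)
print(scores)
print(fraction_at_or_above(scores, 0.0))
```

Running the same scan on a random selection of genes and on the sequences of interest gives two empirical curves that can be compared with each other.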


Page 17:

Building controls in RSAT