Post on 15-Jan-2016

Promoter Analysis: TFBS Detection

Daniel Rico, PhD.

drico@cnio.es

1. Promoters and gene regulation in Eukaryotes

2. Position Weight Matrices (PWM)

3. PWM Databases

4. TFBS prediction using PWMs

5. Pattern Discovery: Finding unknown motifs

6. Exercise: Use the human NOS2 sequence to predict TFBS with Match and JASPAR

1. Promoters and gene regulation in Eukaryotes

Promoters

Promoters are DNA segments upstream of transcribed regions that initiate transcription.

The promoter recruits RNA polymerase to the transcription start site (TSS).

[Figure: 5’ to 3’ diagram showing an enhancer, the “proximal” promoter (100 bp to 2 kb upstream of the TSS), the TSS, and the gene.]

Genes in Ensembl

[Figure: Ensembl gene view showing the forward (+) strand (5’ to 3’) and the reverse (-) strand, with the Transcription Start Site and the Transcription Termination Site marked.]

Promoter Structure in Prokaryotes (E. coli)

Transcription starts at offset 0.

• Pribnow Box (-10)

• Gilbert Box (-30)

• Ribosomal Binding Site (+10)

Promoter Structure in Eukaryotes

Experimental Transcription Start Sites (TSS) by CAGE

CAGE (Cap Analysis of Gene Expression) detects the transcriptional activity of each promoter by sequencing the capped 5’ ends of transcripts.

Representation of CAGE preparation protocol adapted to various platforms.

Solexa/Illumina sequencing is now preferred. 454 Life Sciences (FLX system) is no longer used because concatenation requires additional PCR cycles and complicated manipulation.

In the future, single-molecule sequencing technology will be preferred because PCR may not be required.

http://www.osc.riken.jp/english/activity/cage/basic/

http://fantom.gsc.riken.jp/4/edgeexpress/view/

http://www.epd.isb-sib.ch/

Sequence Analysis: Searching Transcription Factor Binding Sites (TFBS)

TFBS: Detection methods

In vivo:

• Functional analysis

• ChIP

In vitro on cloned fragment:

• Footprinting reactions

• Exonuclease digests

• Gel retardation (EMSA)

• UV crosslinking

In vitro on artificial DNA:

• SELEX: Systematic Evolution of Ligands by Exponential enrichment

Transcription factors bind to TFBS in DNA

[Figure labels: Affinity, Specificity]

Nat Rev Genet. 2010 Nov;11(11):751-60. Epub 2010 Sep 28. Determining the specificity of protein-DNA interactions.

TF Binding Sites

Problems:

• Often poorly defined consensus

• Sequences not conserved within species, and even worse between species

• Examples of enhancers functionally conserved but not sequence-conserved

• Most of the TFBS sequence data comes from just a few species

• Very often in vitro experiments

• Two completely different binding sites could be merged in the same matrix/consensus

2. Position Weight Matrices (PWM)

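From PFM to PWM/PSSM

Data collection: aligned binding sites are counted base by base to give a position frequency matrix (PFM).

The counts are converted to probabilities and corrected for background base frequencies.

The resulting position weight matrices (PWMs), expressed in log scale, are also called position-specific scoring matrices (PSSMs).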
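The conversion from counts to log-scale weights can be sketched in Python. This is a minimal illustration, not the formula used by any particular database: the uniform background (0.25 per base), the pseudocount of 1, and the function name `pfm_to_pwm` are all assumptions made for the example. The counts are the four-site matrix shown on the Summary slide.

```python
import math

def pfm_to_pwm(pfm, background=None, pseudocount=1.0):
    """Convert a count matrix to log2-odds weights against a background model."""
    bases = "ACGT"
    if background is None:
        # Assumption: every base is equally likely in the background.
        background = {b: 0.25 for b in bases}
    width = len(pfm["A"])
    pwm = {b: [] for b in bases}
    for i in range(width):
        total = sum(pfm[b][i] for b in bases)
        for b in bases:
            # The pseudocount is shared out according to the background,
            # so zero counts do not produce log(0).
            prob = (pfm[b][i] + pseudocount * background[b]) / (total + pseudocount)
            pwm[b].append(math.log2(prob / background[b]))
    return pwm

# Counts for the four aligned sites AAGTTC, AAGCTC, AGGCTC, AAGGTC.
pfm = {
    "A": [4, 3, 0, 0, 0, 0],
    "C": [0, 0, 0, 2, 0, 4],
    "G": [0, 1, 4, 1, 0, 0],
    "T": [0, 0, 0, 1, 4, 0],
}

pwm = pfm_to_pwm(pfm)
for base in "ACGT":
    print(base, [round(w, 2) for w in pwm[base]])
```

Positive weights mark bases seen more often than the background predicts and negative weights mark bases seen less often. Different tools choose different pseudocounts and backgrounds, so absolute scores are not comparable between them.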

SEQUENCE LOGOS: The information content of a matrix column ranges from 0 bits (no base preference) to 2 bits (only one base used).

http://weblogo.berkeley.edu/

http://www.lecb.ncifcrf.gov/~toms/sequencelogo.html
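The 0 to 2 bit range follows from taking 2 bits (the maximum entropy of four bases) minus the Shannon entropy of the column. A minimal Python sketch, assuming a uniform background and omitting the small-sample correction that logo software may apply; `column_information` is a name chosen for this example:

```python
import math

def column_information(counts):
    """Information content in bits of one matrix column: 2 - Shannon entropy."""
    total = sum(counts)
    entropy = 0.0
    for count in counts:
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return 2.0 - entropy

# Counts are given in the order A, C, G, T.
print(column_information([4, 0, 0, 0]))  # only one base used: 2.0 bits
print(column_information([1, 1, 1, 1]))  # no base preference: 0.0 bits
print(round(column_information([3, 0, 1, 0]), 3))  # in between
```

In a sequence logo the height of the whole stack at a position is this value, and each letter takes a share of the stack proportional to its frequency.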

Summary

Aligned sites:
AAGTTC
AAGCTC
AGGCTC
AAGGTC

Position frequency matrix:
A  4 3 0 0 0 0
C  0 0 0 2 0 4
G  0 1 4 1 0 0
T  0 0 0 1 4 0

Consensus: ARGBTC
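The summary example can be reproduced with a short Python sketch that counts bases per position and then writes an IUPAC consensus. The consensus rule used here (every base observed at least once contributes to the ambiguity code) is an assumption that happens to match the slide; other conventions apply frequency thresholds. The function names are illustrative only.

```python
# IUPAC ambiguity codes keyed by the set of bases seen at a position.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}

def build_pfm(sites):
    """Count each base at each position of equal-length aligned sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    pfm = {base: [0] * width for base in "ACGT"}
    for site in sites:
        for i, base in enumerate(site.upper()):
            pfm[base][i] += 1
    return pfm

def consensus(pfm):
    """IUPAC consensus in which any observed base counts (assumed rule)."""
    width = len(pfm["A"])
    return "".join(
        IUPAC[frozenset(b for b in "ACGT" if pfm[b][i] > 0)] for i in range(width)
    )

sites = ["AAGTTC", "AAGCTC", "AGGCTC", "AAGGTC"]
pfm = build_pfm(sites)
for base in "ACGT":
    print(base, pfm[base])
print("Consensus:", consensus(pfm))  # Consensus: ARGBTC
```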

3. PWM Databases

TRANSFAC: not free, 848 matrices, extensive information and references, quality score based on the methods used

JASPAR: open access, 123 matrices, minimal information, majority (80%) based on the SELEX method

TRANSFAC®

http://www.gene-regulation.com/pub/databases.html

http://jaspar.cgb.ki.se/

http://jaspar.genereg.net/

JASPAR example: Pax6

4. Pattern Matching: TFBS prediction using PWMs

[Screenshot annotation: Click here to select all TFBS]
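Pattern matching with a PWM amounts to sliding the matrix along the sequence, summing the weights of the bases in each window, and reporting windows above a cutoff on either strand. The Python sketch below illustrates that idea only; it is not the algorithm of Match or of the JASPAR web scan. The toy matrix, the uniform background, the pseudocount, and the threshold of 7.0 are all assumptions chosen for the example.

```python
import math

# Toy count matrix built from four aligned 6-mers (rows: bases, columns: positions).
PFM = {
    "A": [4, 3, 0, 0, 0, 0],
    "C": [0, 0, 0, 2, 0, 4],
    "G": [0, 1, 4, 1, 0, 0],
    "T": [0, 0, 0, 1, 4, 0],
}

def log_odds(pfm, pseudocount=1.0, background=0.25):
    """Log2-odds weights; uniform background and pseudocount are assumptions."""
    width = len(pfm["A"])
    pwm = {base: [] for base in "ACGT"}
    for i in range(width):
        total = sum(pfm[base][i] for base in "ACGT")
        for base in "ACGT":
            prob = (pfm[base][i] + pseudocount * background) / (total + pseudocount)
            pwm[base].append(math.log2(prob / background))
    return pwm

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]

def scan(pwm, sequence, threshold):
    """Return (start, strand, score) for every window scoring >= threshold.

    `start` is the 0-based position of the window on the given (forward)
    sequence, also for hits found on the reverse strand.
    """
    width = len(pwm["A"])
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if set(window) - set("ACGT"):
            continue  # skip windows containing N or other ambiguity codes
        for strand, site in (("+", window), ("-", reverse_complement(window))):
            score = sum(pwm[base][i] for i, base in enumerate(site))
            if score >= threshold:
                hits.append((start, strand, score))
    return hits

pwm = log_odds(PFM)
hits = scan(pwm, "CCAAGCTCTTGAGCTTGG", threshold=7.0)
for start, strand, score in hits:
    print(start, strand, round(score, 2))
# Hits: window starting at 2 on the + strand and at 10 on the - strand.
```

The threshold controls the trade-off between missed sites and false positives: short matrices match by chance very often in a 1 kb promoter, which is one reason default settings tend to return many predicted sites.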

5. Pattern Discovery: Finding unknown motifs

Pattern discovery

Reference genome

Seq. oligo   Expected frequency
AAAAAA       0.00024
AAAAAC       0.00030
AAAAAG       0.00031
AAAAAT       0.00024
AAAACC       0.00028
…

Sequences of interest

Seq. oligo   Observed frequency
AAAAAA       0.00023
AAAAAC       0.00031
AAAAAG       0.00125  ***
AAAAAT       0.00018
AAAACC       0.00026
…
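The comparison of observed and expected oligo frequencies can be sketched in Python. This is a simplified illustration: it flags oligos by a plain observed/expected ratio, whereas real tools rely on a statistical significance test. The cutoff `min_ratio=2.0` and the function names are assumptions; the frequencies are copied from the tables on the slide.

```python
from collections import Counter

def oligo_frequencies(sequences, k=6):
    """Relative frequency of every k-mer across a set of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {oligo: n / total for oligo, n in counts.items()}

def over_represented(observed, expected, min_ratio=2.0):
    """Oligos whose observed/expected ratio reaches min_ratio, highest first."""
    flagged = []
    for oligo, obs in observed.items():
        exp = expected.get(oligo)
        if exp and obs / exp >= min_ratio:
            flagged.append((oligo, obs / exp))
    return sorted(flagged, key=lambda item: item[1], reverse=True)

# Frequencies from the slide: reference genome versus sequences of interest.
expected = {"AAAAAA": 0.00024, "AAAAAC": 0.00030, "AAAAAG": 0.00031,
            "AAAAAT": 0.00024, "AAAACC": 0.00028}
observed = {"AAAAAA": 0.00023, "AAAAAC": 0.00031, "AAAAAG": 0.00125,
            "AAAAAT": 0.00018, "AAAACC": 0.00026}

for oligo, ratio in over_represented(observed, expected):
    print(oligo, round(ratio, 2))  # AAAAAG 4.03
```

With real data, `oligo_frequencies` would be run once on the reference set and once on the promoters of interest to produce the two dictionaries.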

http://meme.sdsc.edu/meme/

6. Exercise: Use the human NOS2 sequence to predict TFBS with Match and JASPAR

EXERCISE: Step by step

a. Download the human NOS2 gene plus 5000 bases of upstream sequence from UCSC or Ensembl. Select the first 1 kb of the “proximal promoter”: from -1000 to the TSS (hint: there is no zero position!)

b. Go to JASPAR and search for TFBS in the promoter with the default settings.

c. Do the same exercise with the mouse NOS2.

d. Compare the results.
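Step a can be checked with a few lines of Python once the sequence has been downloaded. This is a sketch under stated assumptions: the sequence is taken to start exactly 5000 bases before the TSS and to be oriented 5’ to 3’ on the gene's own strand, and `proximal_promoter` is a hypothetical helper, not part of any browser or library. Because there is no zero position, -1 is the base immediately before the TSS and +1 is the TSS itself.

```python
def proximal_promoter(sequence, upstream_len, start=-1000, end=-1):
    """Slice positions start..end (relative to the TSS) out of a sequence.

    `sequence` is assumed to begin `upstream_len` bases before the TSS.
    Positions skip zero: -1 is the last upstream base, +1 is the TSS.
    """
    if not (start <= end < 0):
        raise ValueError("this sketch only handles upstream (negative) positions")
    if -start > upstream_len:
        raise ValueError("not enough upstream sequence")
    # Position -n sits at 0-based index upstream_len - n.
    return sequence[upstream_len + start:upstream_len + end + 1]

# Toy stand-in for the download: 5000 upstream bases followed by the gene.
toy = "N" * 4000 + "A" * 1000 + "G" * 10
promoter = proximal_promoter(toy, upstream_len=5000)
print(len(promoter))        # 1000
print(set(promoter))        # {'A'}
```

The slice covers exactly 1000 bases, from -1000 to -1, and stops just before the TSS.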

Chromatin Accessibility

Access to experimental information

http://www.nature.com/scitable/

Euchromatin and Heterochromatin

Late replication

Early replication

Nat Rev Genet. 2011 Jul 12;12(8):554-64. doi: 10.1038/nrg3017. Determinants and dynamics of genome accessibility.

ENCODE: www.genome.gov/10005107

• ENCyclopedia of DNA Elements, NHGRI

• Consortium of international researchers

• UCSC is the Data Coordination Center

Slides from http://www.openhelix.com/ENCODE

ENCODE Background

• Pilot phase, or phase I: www.genome.gov/26525202

• Selected regions of the genome: 1%, 30 Mb

ENCODE Pilot Data and Beyond

• ENCODE portal: http://genome.ucsc.edu/ENCODE/

• Pilot ENCODE browser: genome.ucsc.edu/ENCODE/pilot.html

ENCODE Next Phase: Production Phase

• UCSC is the DCC for human and mouse data

• The portal is available: genome.ucsc.edu/ENCODE/

• New aspects of the Production Phase projects

ENCODE Production Phase Focus

• ENCODE is now genome-wide

• Specific cell types and new technologies being applied

• Project focus topics selected, then supplemented

Copyright OpenHelix. No use or reproduction without express written consent

[Figure labels: chromatin; transcriptome/genes; promoters/regulatory sites; DNase sites]

ENCODE Data is Flowing!

• Data being submitted to UCSC DCC by data providers

• “Wranglers” ensure metadata is present

• Quality checks occur, data is released for use

ENCODE Data Types

• Mapping data

• Genes

• Expression

• Regulation

• Variation

ENCODE Tracks: identified with icon

Regulation Data

• Regulation data

• Structure: modifications, open vs. closed chromatin

Image from NIH

Regulation Data II

• Transcription factor binding sites, TFBS

• RNA binding proteins

TATA bound to DNA
