modENCODE August 20-21, 2007 Drosophila Transcriptome: Aim 2.2


Page 1

modENCODE August 20-21, 2007

Drosophila Transcriptome: Aim 2.2

Page 2

Aim 2.2 Experimental Validation of Transcript Models

1. Experimental verification of selected splice sites in transcript models (short RT-PCR)

2. Mapping transcript ends using RACE

3. Screening cDNA libraries for transcripts

4. Recovering cDNA clones using long RT-PCR

5. High-throughput sequencing of small RNAs

6. Submitting sequence data to databases

7. Reviewing the transcriptome annotation

Page 3

Experiments at LBNL

Transcript Ends
• TSSs: 20,000 targeted 5’ RACE experiments
• Poly-A sites: 1,000 targeted 3’ RACE experiments

Full-Length Transcript Structures
• 6,000 cDNA screens and full-insert sequencing
• 3,000 long RT-PCRs and full-insert sequencing

Small RNA Sequencing
• 15 runs on a 454 Life Sciences device
• Size-fractionate to < 500 nt (a larger size range than Eric Lai’s)

Page 4

Mapping TSSs

• 5’ RLM-RACE is a simple, scalable method

• The RLM-RACE adapter is ligated in place of the 5’ cap structure

• Gene-specific primers are nested & near the 5’ end

• Sequence 8 clones

• Direct sequencing is also proposed but is difficult

• We are prioritizing transcripts and tissues using our 5’ EST data
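A minimal sketch of how the 5’ ends of the 8 sequenced clones from one RLM-RACE reaction might be reduced to a single TSS call. The function name `call_tss`, the input format (aligned genomic coordinates on one strand) and the ±2 nt support window are illustrative assumptions, not the project's actual pipeline.

```python
from collections import Counter


def call_tss(clone_starts, window=2):
    """Call a consensus TSS from the 5' ends of sequenced RLM-RACE clones.

    clone_starts: genomic coordinate of the first base of each clone
        (assumed already aligned to the same strand; hypothetical input).
    window: clones whose 5' end lies within +/- window nt of the mode
        count as supporting the call (illustrative default).
    Returns (consensus_position, fraction_of_clones_supporting_it).
    """
    if not clone_starts:
        raise ValueError("no clones to call a TSS from")
    counts = Counter(clone_starts)
    # Most frequent 5' end; ties go to the smallest coordinate.
    mode = min(counts, key=lambda pos: (-counts[pos], pos))
    support = sum(1 for s in clone_starts if abs(s - mode) <= window)
    return mode, support / len(clone_starts)


# Eight clones from one hypothetical reaction, with one outlier at 1050.
print(call_tss([1000, 1000, 1001, 1000, 999, 1000, 1050, 1000]))  # (1000, 0.875)
```

The support fraction gives a per-reaction quality number that could feed the success-rate measurement planned for the 96-TSS pilot.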

Page 5

TSSs: Slippery vs Discrete

Figure: head RACE products, larval RACE products, cDNAs
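One simple way to put a number on "discrete" versus "slippery" is to measure how widely the 5’ ends of RACE products or cap-trapped ESTs spread after trimming outliers. This is a hedged sketch, not a method from the slides: the trimmed-width statistic, the 10% trim and the 5 nt cutoff are all assumptions chosen for illustration.

```python
def tss_spread(starts, trim_percent=10):
    """Width in nt spanned by the 5' ends left after dropping trim_percent
    of the observations from each side (integer arithmetic, no float drift)."""
    if not starts:
        raise ValueError("no 5' ends given")
    if not 0 <= trim_percent < 50:
        raise ValueError("trim_percent must be in [0, 50)")
    ordered = sorted(starts)
    n = len(ordered)
    trim = (n * trim_percent) // 100  # observations dropped from each side
    return ordered[n - 1 - trim] - ordered[trim]


def classify_tss(starts, max_width=5, trim_percent=10):
    """Label a TSS 'discrete' if its trimmed 5' ends span <= max_width nt."""
    return "discrete" if tss_spread(starts, trim_percent) <= max_width else "slippery"


tight = [500, 500, 501, 500, 499, 500, 500, 502, 500, 500]
broad = [480, 490, 495, 500, 505, 512, 520, 530, 540, 560]
print(classify_tss(tight), classify_tss(broad))  # discrete slippery
```

Trimming makes the width robust to a single stray clone, which matters when only a handful of products are sequenced per TSS.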

Page 6

Cap-Trapped 5’ ESTs Define Discrete…

Page 7

…and Slippery Transcription Start Sites

Page 8

How Many TSSs Does bowl Have?

Page 9

5’ RACE Plans

• Identify TSSs that are well mapped by 5’ EST data

• Test the RLM-RACE production protocol on 96 well-mapped TSSs to measure the experimental success rate

• Prioritize 5’ RACE experiments:
1. Transcripts with < 8 RE ESTs, using mixed embryo RNA
2. Transcripts with ESTs from other embryo-derived libraries
3. Transcripts with < 8 RH/TA ESTs
4. Transcripts with larval/pupal ESTs
5. Transcripts without ESTs, using appropriate RNA samples

• Develop a statistical description of “slipperiness”

• Biological validation with microarrays & P elements
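The priority list above can be read as a first-match rule over per-transcript EST counts. The sketch below encodes it under stated assumptions: the `EstCounts` record and `race_priority` function are hypothetical, "< 8 RE ESTs" is read as 1 to 7 RE ESTs, and a transcript with 8 or more RE or RH/TA ESTs is assumed to be mapped well enough by EST data to need no RACE.

```python
from dataclasses import dataclass


@dataclass
class EstCounts:
    """Hypothetical per-transcript summary of 5' EST counts by library class."""
    re: int = 0             # RE library ESTs
    other_embryo: int = 0   # ESTs from other embryo-derived libraries
    rh_ta: int = 0          # RH/TA library ESTs
    larval_pupal: int = 0   # larval/pupal library ESTs


WELL_MAPPED = 8  # assumed EST count at which the TSS needs no RACE


def race_priority(c):
    """Return the 5' RACE tier (1 = highest), or None if no RACE is needed."""
    if c.re >= WELL_MAPPED or c.rh_ta >= WELL_MAPPED:
        return None
    if c.re > 0:
        return 1  # use mixed embryo RNA
    if c.other_embryo > 0:
        return 2
    if c.rh_ta > 0:
        return 3
    if c.larval_pupal > 0:
        return 4
    return 5      # no ESTs: choose an appropriate RNA sample


transcripts = {"tx_a": EstCounts(re=3), "tx_b": EstCounts(), "tx_c": EstCounts(re=12)}
print({name: race_priority(c) for name, c in transcripts.items()})
# {'tx_a': 1, 'tx_b': 5, 'tx_c': None}
```

Because the rules are checked in tier order, each transcript lands in exactly one tier, which keeps the queue free of duplicates.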

Page 10

Computationally predicted conserved exons validated by cDNA screening and sequencing

I. Gene Modifications

II. Identification of New Genes

Page 11

cDNA and Long RT-PCR Plans

• Identify all transcripts that are well defined by cDNA sequence:
- complete & spliced ORF, poly-A tail (not necessarily a defined TSS)

• Identify targets for cDNA screening (DGC goals in parentheses):
- (Transcripts with a community cDNA but no BDGP cDNA)
- (Transcripts with truncated ORFs)
- (Alternative transcripts that encode alternative coding sequences)
1. Conserved ORFs that failed on the first SLIP attempt: choose best RNA
2. Transfrags & RACEfrags that are not captured in sequenced transcripts

• Identify targets for long RT-PCR:
- targets that fail in SLIP screening on the best RNA sample
- RT-PCR is probably more sensitive than SLIP but seems limited to ~2 kb

• cDNA and RT-PCR design depends on Aim 1 & Aim 2.1 and should be an iterative process.

• Biological validation using integrated description of all data
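The routing between SLIP screening and long RT-PCR described above can be sketched as a small decision function. This is a hypothetical paraphrase of the plan, not a documented protocol: the function name, its arguments, the hard 2,000 bp cutoff standing in for "~2 kb", and the "unresolved" outcome are all assumptions.

```python
RT_PCR_MAX_BP = 2000  # the slide's "~2 kb" practical limit, assumed as a hard cutoff


def next_recovery_step(length_bp, slip_attempts, best_rna_tried):
    """Suggest the next step for recovering a full-length clone for one target.

    Hypothetical rules: screen by SLIP first, retry SLIP on the best RNA
    sample, and fall back to long RT-PCR only for targets that failed on
    the best RNA and fit within the ~2 kb RT-PCR range.
    """
    if slip_attempts == 0:
        return "SLIP screen"
    if not best_rna_tried:
        return "SLIP screen on best RNA sample"
    if length_bp <= RT_PCR_MAX_BP:
        return "long RT-PCR"
    return "unresolved: exceeds RT-PCR range"


print(next_recovery_step(1500, slip_attempts=2, best_rna_tried=True))  # long RT-PCR
```

Since target design is meant to be iterative, a function like this would be re-run after each round as Aim 1 and Aim 2.1 data update the target list.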

Page 12

An Unannotated Transfrag

Page 13

A Relatively Rare Transcript

CG31036: chordotonal neurons, lateral and head sensory neurons

Page 14

High-Throughput Sequencing Plan

• Pyrosequence RNA samples on the 454 Life Sciences device
- consider alternative platforms, e.g. Solexa

• Select 15 target tissues for analysis

• Define a transcript size range to target:
- avoid redundancy with Eric Lai: < 50 bases (Lai) vs 50-500 bases (this project)
- consider avoiding tRNAs

• Align transcript sequences and integrate with models

• Biological validation:
- Compare to microarray data
- Conservation in other species, including structure for ncRNAs
- Functional genomics in Aim 3
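A minimal sketch of the size-range and tRNA filters above, applied to sequences before alignment. It assumes each sequence spans its whole small-RNA insert, and it uses exact substring matching against known tRNAs as a crude stand-in for a real alignment step; the function name `select_reads` and its parameters are hypothetical.

```python
def select_reads(reads, min_len=50, max_len=500, trna_seqs=()):
    """Return IDs of sequences in [min_len, max_len] nt that do not match a tRNA.

    reads: iterable of (read_id, sequence) pairs.
    trna_seqs: known tRNA sequences; a read is dropped if it occurs as an
        exact substring of any of them (crude stand-in for alignment).
    """
    kept = []
    for read_id, seq in reads:
        if not (min_len <= len(seq) <= max_len):
            continue  # outside the targeted size range (< 50 nt is left to Lai)
        if any(seq in trna for trna in trna_seqs):
            continue  # optional tRNA exclusion
        kept.append(read_id)
    return kept


reads = [("r1", "A" * 30), ("r2", "ACGT" * 20), ("r3", "G" * 600), ("r4", "C" * 60)]
print(select_reads(reads, trna_seqs=("C" * 80,)))  # ['r2']
```

Keeping the bounds as parameters makes it easy to revisit the 50-500 base split if the division of labor with the Lai lab changes.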

Page 15

Some Questions for Discussion

• How many genes & transcripts in Drosophila?

• How many genes with multiple transcripts? CDSs?

• Are these expressed in different cell types?

• Can we segregate them in different RNA samples to avoid mixed RACE, cDNA and RT-PCR products?

• How do we prioritize screening?

• What will we miss?

• How do we know when we’re done?

Page 16

Future Directions

• Do different promoter motifs correlate with “slipperiness”, tissue, stage?

• Confidence scores associated with exons, transcripts and gene models:
- How do we measure confidence?
- How confident can we be?
- How much data do we need per gene?