An overview of the RegCreative Jamboree -...
A curated database of DNase I footprints in D. melanogaster
Galas and Schmitz (1978)
• “Current” compilations of transcription factor binding sites (e.g. Transfac) are incomplete and not linked to the genome
• DNase I footprints are a high-quality, abundant source of binding site data
• Can be used for genome annotation, PWM construction, motif inference, cis-regulatory prediction, comparative genomics/molecular evolution, systems biology, text mining & ...
• FlyBase 4.1 currently has only 90 binding sites & 27 enhancers annotated
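PWM construction, one of the uses listed above, can be illustrated with a minimal sketch. It assumes the binding sites have already been aligned and trimmed to equal length (real footprints vary in length and need an alignment step first), that they contain only A, C, G and T, and that a uniform background of 0.25 per base is acceptable. The function names, the pseudocount and the example sequences are hypothetical and are not taken from FlyReg.

```python
import math
from collections import Counter


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(length):
        # Count the bases observed at column i across all sites.
        counts = Counter(s[i].upper() for s in sites)
        column = {}
        for base in "ACGT":
            freq = (counts[base] + pseudocount) / total
            column[base] = math.log2(freq / background)
        pwm.append(column)
    return pwm


def score(pwm, seq):
    """Sum the log-odds scores of seq against the matrix (same length assumed)."""
    return sum(col[base] for col, base in zip(pwm, seq.upper()))


# Made-up aligned sites for illustration only (not real FlyReg data).
example_sites = ["TTAATCC", "TTAATCC", "CTAATCC", "TTAATCT"]
pwm = build_pwm(example_sites)
```

A sequence resembling the example sites scores higher than an unrelated one, e.g. `score(pwm, "TTAATCC") > score(pwm, "GGGGGGG")`.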
FlyReg: Materials, Methods & Results
200+ articles, 800+ authors, 50+ pers. comm., 20+ years
100+ regions, 1350+ TFBSs, 85+ TFs
[Figure: genome browser view of the eve region, chromosome band 46C10, base positions 5034500 to 5041000. Tracks: Base Position; Chromosome Bands; Protein-Coding Genes from FlyBase; Non-Coding Genes from FlyBase; FlyReg: Drosophila DNase I Footprint Database (footprints labelled eve, ttk, kni, hb, Kr, bcd, gt, prd and Unspecified); D.mel./D.yakuba/D.pseudoob./A.gambiae Multiz Alignments & phastCons Scores (Conservation, d_yakuba, d_pseudoobscura, a_gambiae).]
FlyReg database of Drosophila DNase I footprints
Data imported by UCSC, FlyBase, FlyMine, Ensembl, Transfac, FlyTF, REDfly & ORegAnno
[Figure labels: shn, Abd-A, fkh, ko, Dll, dpp, mus209, tsh, bcd, salm, Antp, dl, Ubx, zen, kni, ftz, eve, hb, tll, Kr, Trl, grh, cad, h, en, gt, ttk]
cis-regulatory annotation & systems biology
A partial timeline of events leading up to the RegCreative Jamboree
mid 2004 - E. Birney starts the “cis-regulation” mailing list
late 2004 - FlyReg database released
early 2005 - Proposal for a mammalian cis-reg database
mid 2005 - Informal meeting at EBI to discuss Ensembl regulatory schema & curation tools
late 2005 - One-day workshop at EBI to discuss Ensembl regulatory schema, curation tools & virtual jamboree
late 2005 - ORegAnno & PAZAR released
early 2006 - Proposal for a cis-regulatory BioCreative text-mining challenge
early 2006 - Discussion about using annotation jamboree to create training datasets for BioCreative challenge
mid 2006 - Funding from ENFIN, BioSapiens, FWO, Genome Canada
mid 2006 - Further development of ORegAnno (e.g. queue)
late 2006 - RegCreative Jamboree !!
Some goals of the RegCreative Jamboree
Improve standards & infrastructure for regulatory curation
Evaluate inter-annotator consistency
Identify opportunities for text-mining assisted regulatory curation
Clarify specific aims for regulatory text-mining challenge
Develop criteria for text-mining challenge data sets
Increase amount of annotated regulatory sequence data
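The goal of evaluating inter-annotator consistency can be made concrete with a small sketch. The slides do not say which agreement statistic is intended; Cohen's kappa for two annotators is used here only as one common choice, and the function name and the labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of four candidate passages by two curators.
curator_1 = ["site", "site", "none", "none"]
curator_2 = ["site", "none", "none", "none"]
kappa = cohens_kappa(curator_1, curator_2)
```

Here the curators agree on three of four items (observed agreement 0.75) against a chance agreement of 0.5, which gives a kappa of 0.5.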
A draft BioCreative regulatory challenge agenda
A) Recover text that proves a known TF-target gene interaction: We will provide TF and target gene name pairs, a TF-target gene interaction and the associated publication. Participants will have to provide the part(s) of the document that would (to a human expert) prove the original annotation.
B) Identify evidence supporting a TF-target gene interaction using an evidence code ontology: We will provide TF and target gene name pairs, a TF-target gene interaction and the associated publication. Participants will have to provide the type of experimental evidence that would support the original annotation.
C) Identify TF-target gene interaction(s) from known gene names: We will provide TF and target gene names and the associated publication containing an interaction for this gene pair. Participants will have to automatically annotate the TF-target gene interaction according to the information in this paper and provide the part(s) of the document that prove the annotation.
D) Select relevant papers from a list of known TF gene names: We will provide a list of transcription factor names and a (probably large) number of papers, most of which are irrelevant for the protein. Participants will have to detect which papers are relevant for a transcription factor, in the sense that they contain information about TF-target gene interactions.
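To make task D concrete, the sketch below shows a naive keyword baseline for selecting relevant papers. It is not a method proposed in the slides: the cue-word list, the function names and the toy abstracts are all assumptions made for illustration, and a real system would need synonym handling and far better evidence detection.

```python
import re

# Hypothetical cue words suggesting that a text reports a regulatory interaction.
CUE_WORDS = ("binds", "binding site", "footprint", "activates",
             "represses", "enhancer", "promoter")


def is_relevant(text, tf_name, cues=CUE_WORDS):
    """True if the text mentions the TF name and at least one cue word."""
    lowered = text.lower()
    # Match the TF name as a whole word, case-insensitively.
    name_pattern = r"\b" + re.escape(tf_name.lower()) + r"\b"
    if not re.search(name_pattern, lowered):
        return False
    return any(cue in lowered for cue in cues)


def select_papers(papers, tf_name):
    """Return the ids of papers judged relevant for the given TF."""
    return [pid for pid, text in papers.items() if is_relevant(text, tf_name)]


# Toy texts written for this example (not taken from any curated corpus).
papers = {
    "p1": "Kruppel binds the eve stripe 2 enhancer.",
    "p2": "Kruppel mutants show gap phenotypes.",
    "p3": "Bicoid activates hunchback.",
}
selected = select_papers(papers, "Kruppel")
```

With these toy inputs only `p1` is selected: `p2` mentions the factor but no cue word, and `p3` does not mention the factor at all.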