Lectures.GersteinLab.org (c) '09
Scientific Publishing in the Future
Mark B Gerstein
Yale
with
D Greenbaum, M Seringhaus, R Auerbach, K Cheung, P Bourne, S Fields
Slides at Lectures.GersteinLab.org (See Last Slide for References & More Info.)
The Situation: a Changing Landscape between "Info. Resources" & Traditional Journals
• Explosion of Web Information Resources
• DBs & Journals
  - Ends of a blurring spectrum
• Reading articles from "queries" & "mining" journal text
• Reading DB entries
• Biology as a science of heterogeneous facts
  - What's the vehicle for annotating each base of the genome?
(c) Mark Gerstein, 2002, Yale, bioinfo.mbb.yale.edu
[Nat. Biotech. 21:979; Bioinformatics 15:429]
[Figure labels: EMBL; NAR DB Issue]
An Aspect of the Situation: Hard to fit everything into conventional journal articles or biological DBs
• Many other aspects of a "citable paper" than just narrative text
  - for computational biology: code downloads, annotation sets
  - for expt. biology: reagents, apparatus, &c.
  - associated lectures and easy-to-read summaries, videos (SciVee)
• Readability of narrative decreasing
  - Data embedded into papers, making text hard to read
• Data aliquot for DBs
  - DBs handle huge homogeneous datasets, but what of isolated facts?
[BMC Bioinformatics 8:17 ; PLoS Comp. Bio. 4: e1000037]
Potential Directions Forward: Between Extremes
• Stay with Traditional Paper
  - "3 author" narrative on a conceptual advance
  - "Hard currency" of scientific discovery & advancement
• Move to Blogs, Websites, &c.
  - Posts, freeware distributions, DB depositions, &c.
  - Reference "Marker Papers", just shells
    • Hypothetical "1 Million Genomes" consortium "marker", associated with many pieces of software, datasets, &c., which are not fully explained in the article
• Take Middle Way
  - Structured and multi-tiered scientific literature, more compatible with the digital world
Structured, Multi-tiered Digital Paper Proposal
• "Broadened," Multi-tiered papers
  - Narrative text + easy-to-read lay version + structured, machine-readable version
  - Extended Suppl. material, containing data & code
  - Associated lectures, videos
• "Structured" Tier
  - For automatic deposition into DBs & textmining
  - Cross-referencing with a specific part of the genome, proteome, &c.
  - Written as annotation from start
• All tiers submitted in parallel
  - Refereed & edited normally
• Incentives
  - Capitalizes on peer review & incentives to publish
  - Authors (not curators) in control of process
  - But officiated by referees and editors
• A Path Forward
  - Abstract, Tables, Figures, Equations... eventually all
[Cheung et al. MSB ('10, in press); FEBS Lett 582: 1170; Nature 447: 142]
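One way to picture the "all tiers submitted in parallel" idea is as a submission record that is only complete once every tier is present. The sketch below is an illustration only: the class name `MultiTierPaper`, its field names, and the choice of which tiers are required are assumptions, not part of the proposal.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumption: these three tiers must all be present before review starts.
REQUIRED_TIERS = ("narrative", "lay_version", "structured")


@dataclass
class MultiTierPaper:
    """Hypothetical container for the tiers of a multi-tiered paper."""
    narrative: Optional[str] = None      # conventional narrative text
    lay_version: Optional[str] = None    # easy-to-read lay summary
    structured: Optional[Dict] = None    # machine-readable tier
    supplement: List[str] = field(default_factory=list)  # data, code, lectures, videos

    def missing_tiers(self) -> List[str]:
        """Return the names of required tiers that are still empty."""
        return [name for name in REQUIRED_TIERS if not getattr(self, name)]

    def ready_for_review(self) -> bool:
        """True only when every required tier has been supplied."""
        return not self.missing_tiers()


paper = MultiTierPaper(narrative="Full text...", lay_version="Summary...")
print(paper.missing_tiers())  # the structured tier has not been supplied yet
```

Treating the supplement as optional is a design choice made for this sketch; the slide does not say which tiers a journal would require.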
Ex. Structured Abstract
• K.lactis (species)
  - KlSTE4 (gene)
    • KlSte4p (protein)
      – CLONED
        * Available at …
      – SEQUENCED
        * Sequence ATGTACGCTATAGGC….
      – MUTANTS
        * DELETION
          * FUNCTIONAL ASSAYS
            * Sterile in both MATa and MATα
            * No defect in vegetative growth
          * STRAIN INFORMATION
            * Available at….
      – INTERACTIONS
        * TWO-HYBRID
          * KlGpa1p (10x stronger) = XXX
          * Control (no partner) = XXX
          * KlGpa1p* = XXX
          * KlGpa2p = XXX
          * ScGpa1p = XXX (S. cerevisiae)
  - KlGPA1 (gene)
    • KlGpa1p (protein)
      – INTERACTIONS
        * TWO-HYBRID
          * KlSte4 = XXX
    • KlGpa1p* (protein)
      – INTERACTIONS
        * TWO-HYBRID
          * KlSte4 = XXX
  - KlGPA2 (gene)
    • KlGpa2p (protein)
      – INTERACTIONS
        * TWO-HYBRID
          * KlSte4 = XXX
• S.cerevisiae (species)
  - SCGPA1 (gene)
    • ScGpa1p (protein)
      – INTERACTIONS
        * TWO-HYBRID
          * KlSte4 = XXX
[BMC Bioinformatics 8:17]
Structured Digital Table
• Canonical Table Types
• Using standardized journal tables as small "stubs" for larger datasets
[Cheung et al., MSB ('10) in press]
More Information on this Talk
SUBJECT: Textmining
DESCRIPTION:
PLOS Workshop at ISMB 2010: Where & How to Get Published, 2010.07.13, 10:45-12:45; [I:ISMB10] (5' total)
NOTES: This PPT should work on Mac & PC.