big data, small data, data papers - short statement for "bdebate on biomedicine 2014"

Post on 02-Jul-2015

Category:

Data & Analytics

DESCRIPTION

My short statement on the (closed) debate on Big Data: http://www.bdebate.org/en/forum/big-data-biomedicine-challenges-and-opportunities

TRANSCRIPT

What is Big Data in Biomedicine? Data Types to be considered

Susanna-Assunta Sansone, PhD

@biosharing @isatools @scientificdata

B-DEBATE: Big Data in Biomedicine. Challenges and Opportunities, 11 Nov, 2014

Data Consultant, Honorary Academic Editor

Associate Director, Principal Investigator

•  Big science efforts represent only a small proportion
o  often featuring homogenous and well-organized data

•  There is a large proportion of small independent research efforts
o  a rich variety of specialty data sets

Let’s not forget the long tail of research data

•  Small independent research efforts fall in the long tail of the distribution
o  Most of this (such as siloed databases, null findings) is unpublished
o  These dark data hold a potential wealth of knowledge

•  Over 50% of completed studies in biomedicine do not appear in the published literature

•  Instead, they reside in file drawers and personal hard drives

•  Often because results do not conform to the authors' hypotheses

“Only half the health-related studies funded by the European Union between 1998 and 2006 - an expenditure of €6 billion - led to identifiable reports”

Plagued by selective reporting of data and methods

Role of data papers and data journals

•  Incentive, credit for sharing
o  Big and small data
o  Unpublished data
o  Long tail of data
o  Curated aggregation

•  Peer review focus

•  Value of data vs. analysis

•  Discoverability and reusability
o  Complementing community databases

•  Narrative/context

•  The power of “small data” lies in their aggregation and integration with other datasets

•  There is value in all well-curated, validated and reusable data – big and small

[Figure labels: Research articles; Data records; Data Descriptors]

Adding value to research articles and data records

Credit for sharing your data

Focused on reuse and reproducibility

Peer reviewed, curated

Promoting community data and code repositories

Open Access

~ 156

~ 70

~ 334

Source: BioPortal

Databases implementing standards

MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, CIMR, MIAPE, MIASE, MIQE, MISFISHIE…, REMARK, CONSORT

MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML…, GELML, ISA-Tab, CML, MITAB

AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO…, TEDDY, PRO, XAO, DO, VO

Progressively refine guidance to authors and reviewers

Mapping the landscape of standards and databases

Researchers, developers and curators lack support and guidance on how to best navigate and select content standards, understand their maturity, or find databases that implement them;

Funders, journals and librarians do not have enough information to make informed decisions on which content standards or databases to recommend in policies, fund or implement.

Help stakeholders to make informed decisions

Summarizing

•  Selective reporting of data and methods is still an issue

•  Let’s not forget the potential value of the long tail of data

•  Data papers and journals can provide incentive and credit to share more data - big and small

•  Content standards do help - but the current wealth of options is an obstacle
