Hosting a Compound-Centric Community Resource for Chemistry Data

Posted on 10-May-2015

DESCRIPTION

Laboratories around the world continue to generate immense amounts of data that are non-proprietary and of value to the community. If available, these data could dramatically reduce costs by minimizing rework and ultimately facilitating faster research. High-quality reference data collections of chemical compound dictionaries, properties and spectra have been generated over many decades. With the advent of social networking tools and platforms such as Wikipedia, the community has an opportunity to contribute. The ChemSpider platform, hosted by the Royal Society of Chemistry, is a compound-centric database with associated data. The platform is already populated with almost 25 million unique compounds. The community can deposit and host their own data, and can curate and annotate existing data, including data generated in Open Notebook Science efforts. This presentation will provide an overview of progress to date. It will also outline the vision for this community platform for chemistry and for ensuring the longevity of chemistry reference data.

TRANSCRIPT

Hosting a Compound-Centric Community Resource for Chemistry Data

Antony Williams, ACS Anaheim, March 28th, 2011

Data Archiving, e-Science and Primary Data

How much data generated in a lab that COULD go public is lost forever? Are public-domain reference databases of value?

Syntheses, properties, spectra, CIFs, images

Much of chemistry is chemical structure-based – where and how could we host these data?

The Social Network

Career-wise, within the next few years NOT having a personal presence online will be a detriment. An online presence helps with several things. These include self-marketing, establishing a profile, getting on the record and collaborative science. They also include demonstrating a skill set, being measured using alternative metrics and contributing to the public peer review process.

Social Networking Tools

A growing number of social networking tools:

Facebook, Twitter, LinkedIn, Flickr, YouTube, blogs, communities, collaborative environments

Collaborative Knowledge Management

TotallySynthetic.com

Contributing Chemistry Online

Property databases, compound aggregators, screening assay results, scientific publications, encyclopedic articles (Wikipedia), metabolic pathway databases, ADME/Tox data (eTOX, for example), blogs/wikis and Open Notebook Science, contributing Open Source code to projects

Chemistry Social Networking

Methods of sharing MY chemistry online include:

Wikis or blogs; SlideShare for presentations; YouTube for videos; Flickr, Wikimedia, etc. for images (and FigShare); PubChem for assay data; NMRShiftDB for NMR assignments; Google Docs for data (and FigShare)

FigShare

In what other online environments can you immediately share chemistry data?

ChemSpider

ChemSpider is a chemistry database of >25 million compounds from >400 data sources.

It is a deposition platform for structure(s), identifiers, links to internet resources, articles and DOIs, experimental data (spectra, images, CIFs) and multimedia (videos, MP3s).

It is a curation and annotation platform, used to remove “bad data” and annotate existing data.

It is a publishing platform for the community.

Search for a Chemical by Name

Available Information…

Linked to vendors, safety data, toxicity, metabolism

Available Information…

Crowdsourced “Annotations”

Users can add descriptions, syntheses and commentaries; links to PubMed articles; links to articles via DOIs; spectral data; Crystallographic Information Files; photos; MP3 files; and videos

Spectra

Inherited Errors

Errors are inherited from every database: all public compound databases, including ours, have errors

“Incorrect” structures: assertions, timelines, etc.

“Incorrect” names associated with structures

ENORMOUS CHALLENGE

Crowdsourced Curation

Crowdsourced curation: identify and tag errors, edit names and synonyms, and identify records to deprecate

Search “Vitamin H”

“Curate” Identifiers

Crowdsourcing Works

>130 people have deposited data and participated in data curation

Curators at different levels check each other's work

More curators and depositors are encouraged!

Molbank (Open Access Journal)

ChemSpider SyntheticPages

Many syntheses are not published but are of value

CSSP: A database of synthesis procedures built for the community, by the community.

Peer-reviewed by the community

Each contribution has a DOI – of value to the submitter?

Vandalism

Vandalism of ChemSpider is VERY rare…

Only three acts of vandalism ever: someone tried to “sell a house!”; a vendor posted their logo against a chemical; a student, Katie Crow, posted a “personal photo”

But data-quality problems can look like vandalism!

Drivers in the Social Network

Anonymity is a choice in social networks

Many people on Wikipedia are anonymous, many blogs are anonymous, and comments on blogs can be anonymous

Anonymity in peer-review will likely become less important and may be generational

I may want acknowledgment if I share my data, review a paper or share my expertise

The Alt-Metrics Manifesto

http://altmetrics.org/manifesto/

Enabled by ORCID…

Who Declares Data as Open?

Data licensing is very interesting and can spark “interesting” conversations. Opinions differ: Are images data? Are assertions data? What on a ChemSpider record is data? Is PubChem or PubMed Open Data?

We allow people to declare their data as Open, using an Open Data button at upload

A lot of data on ChemSpider are free but not Open

Pragmatism: our focus is a community resource

Licensing “My Work” Online

Is it “my” chemistry once it’s online?

The complex nature of licensing “my” chemistry: blogs may be copyrighted or Creative Commons; wikis have mixed licensing, depending on the host(s); data have much value when shared as “Open Data”

Often, people can make money from your work!

Police your own “licensing” – how many people have read the Facebook and Twitter agreements?!

ChemSpider: A Structure-Centric Host

An established community resource

>25 million compounds from >400 data sources; thousands of users per day; approaching a million transactions per day

A crowdsourced deposition and curation platform that grows daily: more depositions, more data

A publishing platform for the community

Contributions welcome! Learn how…

ChemSpider Training Session

ChemSpider: A Community Resource for Chemical Data

Wednesday, March 30th

8:30-11:00 AM

Anaheim Convention Center, Room 211 A

Acknowledgments

RSC|ChemSpider team, the “Crowd” of curators, all data source providers

GGA Software Services, ACD/Labs, OpenEye, Accelrys

Thank you

Email: williamsa@rsc.org

Twitter: ChemConnector

Personal Blog: www.chemconnector.com

SLIDES: www.slideshare.net/AntonyWilliams
