data101 pmcb retreat_09-20-13_final
DATA MANAGEMENT 101
Nicole Vasilevsky, Jackie Wirz and Melissa Haendel
PMCB New Student Orientation
20 September 2013
1 | Data definitions
2 | Dealing with data
3 | How the OHSU Library can help
Nicole Vasilevsky, PhD
Project Manager, Ontology Development Group
Jackie Wirz, PhD
Assistant Professor, Bioinformation Specialist
Melissa Haendel, PhD
Assistant Professor, Lead, Ontology Development Group
1 | Data definitions
Data does not speak for itself…
YOU speak for YOUR data
But first, you need to manage it
But, even more fundamentally…
data means many things…
what does data mean to you?
What are data?
Experimental data
Social data
School related data
Personal data
Do you know what metadata is?
a. Philosophy
b. describes data
c. dating site
d. data
2 | Dealing with data
Do you get frustrated with any of the following?
a. Storing data
b. Backing up data
c. Analyzing/manipulating data
d. Finding data produced by other researchers/clinicians
e. Ensuring data are secure
f. Making data accessible to other researchers
g. Controlling access to data
h. Tracking updates to data (i.e., versioning)
i. Creating metadata (i.e., describing the data to be more useful at a later time or by others)
j. Protecting intellectual property rights
k. Ensuring appropriate professional credit/citation is given to data sets generated
Why?
Personal organization
Efficiency
Credit where credit is due
Accelerate scientific and clinical discovery
Reproducibility of science and medicine
naming | metadata | tools | standards
How?
naming
File naming
Naming conventions
Project_instrument_location_YYYYMMDDhhmmss_extra.ext
Index/grant
conditions
Leading zero!
s/n, variable
Retain order
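A minimal sketch of how the naming pattern above could be generated in Python, so every file gets the same field order and a sortable timestamp. The function names `build_filename` and `padded`, and the example values (project, instrument, location, sample number), are assumptions made for illustration; they are not part of the original slides.

```python
from datetime import datetime


def padded(number, width=3):
    """Return a number with leading zeros, e.g. 7 -> '007'."""
    return str(number).zfill(width)


def build_filename(project, instrument, location, when, extra, ext):
    """Assemble Project_instrument_location_YYYYMMDDhhmmss_extra.ext."""
    stamp = when.strftime("%Y%m%d%H%M%S")
    fields = [project, instrument, location, stamp, extra]
    # The underscore is the field separator, so replace underscores and
    # spaces inside each field with hyphens.
    cleaned = [f.replace("_", "-").replace(" ", "-") for f in fields]
    return "_".join(cleaned) + "." + ext.lstrip(".")


# Hypothetical example values.
name = build_filename(
    "PMCB", "FACSCalibur", "LabA",
    datetime(2013, 9, 20, 14, 5, 9), "s" + padded(7), "csv",
)
print(name)
```

With leading zeros, a plain alphabetical listing keeps the files in numeric order: "002" sorts before "010", whereas "10" would sort before "2".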
Naming: Directory Structure
PMCB presentation
Library presentation
DMICE presentation
Presentations
PMCB Library DMICE
http://ftp.ihmc.us/
ReadMe
Version Control
Versioning
• Save a copy of every version of a file
• Follow a file naming convention
Data101_PMCB_Retreat_09-20-13_v1
Data101_PMCB_Retreat_09-20-13_v2
Data101_PMCB_Retreat_09-20-13_Final
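One possible way to keep the version numbers consistent is to compute the next name rather than typing it by hand. The sketch below assumes the `_vN` suffix used in the names above; the helper name `next_version_name` is hypothetical.

```python
import re


def next_version_name(stem, existing_names):
    """Return stem_vN, where N is one more than the highest version found."""
    pattern = re.compile(re.escape(stem) + r"_v(\d+)$")
    highest = 0
    for name in existing_names:
        match = pattern.match(name)
        if match:
            highest = max(highest, int(match.group(1)))
    return f"{stem}_v{highest + 1}"


existing = [
    "Data101_PMCB_Retreat_09-20-13_v1",
    "Data101_PMCB_Retreat_09-20-13_v2",
    "Data101_PMCB_Retreat_09-20-13_Final",
]
print(next_version_name("Data101_PMCB_Retreat_09-20-13", existing))
```

Names without a `_vN` suffix, such as the `_Final` copy, are ignored when looking for the highest version.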
Versioning
Version control software:
• Git
• SVN
Backups
Which of the following do you do?
a. Save copies of data on a disk, USB drive, or computer hard drive
b. Save copies of data on a local server
c. Save copies of data on a central campus server
d. Save copies of data on a web based or cloud server
e. Store data in a repository or archives
f. Automatically backup files
g. Manually generate backup
h. Restrict access to files
Where can you back up your data?
1 on your local workstation
1 local/removable, such as external hard drive
1 on central server
1 remote, such as on a cloud server*
*Depending on the type of data, as cloud servers are not always secure
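A minimal sketch of an automated backup that copies one file to several locations and checks each copy against the original with a SHA-256 checksum. The folder names stand in for an external drive and a central server and are hypothetical; the demo runs inside a temporary directory so it touches no real data.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destinations):
    """Copy source into each destination folder and verify every copy."""
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also keeps file timestamps
        if sha256_of(target) != expected:
            raise IOError(f"Backup at {target} does not match the original")
        copies.append(target)
    return copies


# Demo with stand-in folders inside a temporary directory.
with tempfile.TemporaryDirectory() as root:
    original = Path(root) / "workstation" / "results.csv"
    original.parent.mkdir()
    original.write_text("sample,value\ns001,42\n")
    copies = back_up(
        original,
        [Path(root) / "external_drive", Path(root) / "central_server"],
    )
    print(len(copies), "verified copies")
```

Verifying the checksum matters because a copy that silently failed is not a backup.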
Metadata
What is Metadata?
Title
Author
Call number
Publisher
ISBN
- Anne Gilliland
Your metadata should make your data understandable to others without your involvement
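As an illustration, metadata can be saved in a small "sidecar" file next to the data. The sketch below writes a JSON record and refuses to produce one if a descriptive field is empty. The field names and example values are assumptions chosen for illustration, not a published metadata standard.

```python
import json


def describe_dataset(title, creator, created, description, variables):
    """Return a JSON metadata record, or raise if a required field is empty."""
    required = {
        "title": title,
        "creator": creator,
        "created": created,
        "description": description,
    }
    missing = [key for key, value in required.items() if not value]
    if missing:
        raise ValueError("Missing metadata fields: " + ", ".join(missing))
    record = dict(required)
    # Map each column name to a plain-language explanation with units.
    record["variables"] = variables
    return json.dumps(record, indent=2, sort_keys=True)


# Hypothetical example record.
sidecar = describe_dataset(
    title="T cell counts from lymph nodes",
    creator="A. Student",
    created="2013-09-20",
    description="Flow cytometry counts, one row per sample.",
    variables={"sample": "sample identifier", "cd4_count": "CD4+ cells per sample"},
)
print(sidecar)
```

Spelling out each variable and its units is what lets someone else read the file without asking the author.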
Are you aware of data standards in your field?
data standards
Data standards are the rules by which data are described and recorded. In order to share, exchange, and understand data, we must standardize the format as well as the meaning.
http://www.usgs.gov/datamanagement/plan/datastandards.php
Controlled vocabularies
Structured data helps with searching
Craigslist search: Chaise
Craigslist matches on strings only
Craigslist search: Fainting couch
Structured data helps with searching
PubMed indexes articles with MeSH Terms
Structured data helps with searching
Why are CVs and Ontologies useful?
• Can be used to structure your metadata
• Are often used to structure information in databases
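The search examples above can be sketched in a few lines: plain string matching misses items described with a different word, while a controlled vocabulary maps each synonym to one preferred term. The listings and the tiny vocabulary are invented for illustration, and "chaise" and "fainting couch" are treated as synonyms only for the sake of the example.

```python
# Hypothetical listings.
LISTINGS = [
    "Antique fainting couch, red velvet",
    "Chaise longue, good condition",
    "Oak dining table",
]

# Hypothetical controlled vocabulary: synonym -> preferred term.
PREFERRED_TERM = {
    "chaise": "chaise longue",
    "chaise longue": "chaise longue",
    "fainting couch": "chaise longue",
}


def string_search(query, listings):
    """Match on strings only: the query text must appear in the listing."""
    return [text for text in listings if query.lower() in text.lower()]


def index_listing(text):
    """Return the preferred terms whose synonyms appear in the listing."""
    lowered = text.lower()
    return {
        preferred
        for synonym, preferred in PREFERRED_TERM.items()
        if synonym in lowered
    }


def vocabulary_search(query, listings):
    """Translate the query to its preferred term, then match indexed terms."""
    preferred = PREFERRED_TERM.get(query.lower())
    if preferred is None:
        return string_search(query, listings)
    return [text for text in listings if preferred in index_listing(text)]


print(string_search("chaise", LISTINGS))
print(vocabulary_search("chaise", LISTINGS))
```

The string search returns only the listing that contains the word "chaise"; the vocabulary search also returns the fainting couch, because both are indexed under the same preferred term.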
Cell Ontology
Linnean Taxonomy: Kingdom, Phylum, Class, Order, Family, Genus, Species
tools
File renaming applications
• Bulk Rename Utility (Windows)
• Renamer (Mac)
• PSRenamer
Data Management tools and repositories
• Purpose: Software where you can organize, store and/or share data
• Often contain metadata to assist with data entry and create structured data
Tools for data management
Repositories use Unique IDs
• Digital Object Identifier (DOI)
• Example: DOIs for publications
– doi: 10.1371/journal.pbio.1001339
• Uniform Resource Identifier (URI)
• A URI will resolve to a single location on the web
• URIs for people
• Example:
• John L Campbell, Research Ecologist, Oregon State University, Corvallis OR
• John L Campbell, Research Ecologist, Center for Research on Ecosystem Change, Durham, NC
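A minimal sketch of turning a DOI as written in a citation into a link that resolves through the doi.org resolver. The regular expression is a commonly used heuristic for modern DOIs, not the full DOI specification, and the function names are hypothetical.

```python
import re

# Heuristic only: "10.", a 4 to 9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes people commonly write in front of a DOI.
PREFIXES = ("doi:", "https://doi.org/", "http://dx.doi.org/")


def normalize_doi(text):
    """Strip a leading 'doi:' or resolver prefix and check the basic shape."""
    doi = text.strip()
    for prefix in PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"Not a recognizable DOI: {text!r}")
    return doi


def doi_url(text):
    """Return a URL that resolves the DOI through doi.org."""
    return "https://doi.org/" + normalize_doi(text)


print(doi_url("doi: 10.1371/journal.pbio.1001339"))
```

Because the identifier is unique, the same DOI always points to the same publication, which is what two researchers named John L Campbell cannot guarantee with a name alone.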
standards
nomenclature
antibodies
Western Blot
Immunohistochemistry
ELISA
Co-immunoprecipitation
ChIP
Flow cytometry
ELISPOT
Radioimmunoassay
FACS analysis of T cells from LNs and tumors
T cells were liberated from LNs by disruption between two frosted glass slides. Cells from LNs and tumors were stained with various combination of the following Abs: FITC-CD4, allophycocyanin-CD25, PE Cy7-CD8, APC-CD62L, PE-CD25, PE Cy7-CD25, and biotinylated-KJ-126 and in some experiments made permeable with fixation/permeablization buffers and stained with PE-FoxP3 (eBioscience). Harvested samples, isotype controls, and single stain controls were run on the FACSCalibur (BD Biosciences).
Ruby and Weinberg (2009) J Immunol. 182(3):1481-9.
Which antibody did they use in the paper?
A Solution: Antibody Registry
antibodyregistry.org
Meet the Urban Lab
A+ organization!
The Urban lab antibodies
Of 14 antibodies published in 45 articles, only 38% were identifiable
[Bar chart; y-axis: Percent identifiable, 0% to 90%]
http://www.force11.org/node/4463
http://biosharing.org/bsg-000532
http://www.biosharing.org/standards/mibbi
Minimum Information for Biological and Biomedical Investigations
data publication and sharing
Why share data?
• Data sharing mandates
• Further science and medicine
• Build collaborations
• Enable new discoveries with your data
• Can be required at time of publication
Distribution of 2004–2005 citation counts of 85 trials by data availability.
How?
Beyond the PDF: What can be published (and cited)?
Raw Science
Nanopublications
Self-publishing
Datasets
Code
Experimental design
Argument or passage
Blogging
Microblogging
Comments on existing work
Annotations on existing work
Single figure publications
How?
Data Journals and Repositories
• FigShare
• Dryad
• DataVerse (social science)
• Institutional repositories
www.impactstory.org
3 | How the OHSU Library can help
1 | Large Lecture: Data Management 101
2 | 10–15 Small Groups: data playground
• 1 researcher paired with 2 or 3 library staff
• Tailored analysis of data reporting and instruction
Save the date: 10/09/13, 4-6pm
1k challenge award recipients
Thank you!
URLs to resources
Go to:
http://libguides.ohsu.edu/data