data101 pmcb retreat_09-20-13_final

DATA MANAGEMENT 101: Nicole Vasilevsky, Jackie Wirz and Melissa Haendel, PMCB New Student Orientation, September 2013

Upload: jackie-wirz

Posted on 27-Jan-2015


TRANSCRIPT

Page 1: Data101 pmcb retreat_09-20-13_final

DATA MANAGEMENT 101

Nicole Vasilevsky, Jackie Wirz and Melissa Haendel

PMCB New Student Orientation

20 September 2013

Page 2
Page 3

1 | Data definitions

2 | Dealing with data

3 | How the OHSU Library can help

Page 4

Nicole Vasilevsky, PhD

Project Manager, Ontology Development Group

Jackie Wirz, PhD

Assistant Professor, Bioinformation Specialist

Melissa Haendel, PhD

Assistant Professor, Lead, Ontology Development Group

Page 5

1 | Data definitions

Page 6

Data does not speak for itself…

Page 7

YOU speak for YOUR data

Page 8

But first, you need to manage it

Page 9

But, even more fundamentally…

Page 10

data means many things…

Page 11

What does data mean to you?

Page 12
Page 13
Page 14
Page 15

What are data?

Experimental data

Social data

School related data

Personal data

Page 16
Page 17

Do you know what metadata is?

a. Philosophy

b. Describes data

c. Dating site

d. Data

Page 18

2 | Dealing with data

Page 19

Do you get frustrated with any of the following?

a. Storing data

b. Backing up data

c. Analyzing/manipulating data

d. Finding data produced by other researchers/clinicians

e. Ensuring data are secure

f. Making data accessible to other researchers

g. Controlling access to data

h. Tracking updates to data (i.e., versioning)

i. Creating metadata (i.e., describing the data so they are useful later or to others)

j. Protecting intellectual property rights

k. Ensuring appropriate professional credit/citation is given for data sets generated

Page 20

Why?

Personal organization

Efficiency

Credit where credit is due

Accelerate scientific and clinical discovery

Reproducibility of science and medicine

Page 21

naming | metadata | tools | standards

How?

Page 22

naming

Page 23

File naming

Page 24
Page 25

Naming conventions

Project_instrument_location_YYYYMMDDhhmmss_extra.ext

Index/grant

Conditions

Leading zero!

s/n, variable

Retain order
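A minimal Python sketch of the naming pattern above. The field values in the usage example (`confocal`, `room101`, `s01`, `tif`) are hypothetical, and the function name is an illustration rather than part of any tool. Zero-padding the timestamp (the "leading zero" callout) means an alphabetical listing of files is also a chronological one.

```python
from datetime import datetime


def build_filename(project: str, instrument: str, location: str,
                   timestamp: datetime, extra: str, ext: str) -> str:
    """Assemble Project_instrument_location_YYYYMMDDhhmmss_extra.ext."""
    # strftime zero-pads month, day, hour, minute and second ("leading zero!"),
    # so sorting names alphabetically also sorts them by date and time
    stamp = timestamp.strftime("%Y%m%d%H%M%S")
    return f"{project}_{instrument}_{location}_{stamp}_{extra}.{ext}"


if __name__ == "__main__":
    # Hypothetical values, not taken from the slides
    print(build_filename("PMCB", "confocal", "room101",
                         datetime(2013, 9, 20, 8, 5, 0), "s01", "tif"))
    # PMCB_confocal_room101_20130920080500_s01.tif
```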

Page 26

Naming: Directory Structure

Page 27

Presentations/PMCB/PMCB presentation

Presentations/Library/Library presentation

Presentations/DMICE/DMICE presentation
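A folder layout like this can also be created by a script, so every project starts with the same structure. This Python sketch uses `pathlib` and the three subfolder names from the slide; the base directory is an assumption (a temporary folder in the usage example).

```python
import tempfile
from pathlib import Path

GROUPS = ["PMCB", "Library", "DMICE"]  # subfolder names from the slide


def make_presentation_tree(base: Path) -> list[Path]:
    """Create base/Presentations/<group>/ for each group and return the folders."""
    folders = []
    for group in GROUPS:
        folder = base / "Presentations" / group
        # parents=True creates "Presentations" too; exist_ok=True makes reruns harmless
        folder.mkdir(parents=True, exist_ok=True)
        folders.append(folder)
    return folders


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for folder in make_presentation_tree(Path(tmp)):
            print(folder.relative_to(tmp))
```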

Page 28

http://ftp.ihmc.us/

Page 29

ReadMe

Page 30

Version Control

Page 31

Versioning

• Save a copy of every version of a file

• Follow a file naming convention

Data101_PMCB_Retreat_09-20-13_v1

Data101_PMCB_Retreat_09-20-13_v2

Data101_PMCB_Retreat_09-20-13_Final
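Following the `_v1`, `_v2` convention by hand makes it easy to overwrite a copy by mistake. The Python sketch below finds the highest existing `_vN` number and returns the next free name; the function name and the `.pptx` extension in the usage example are assumptions, and it deliberately ignores non-numbered copies such as `_Final`.

```python
import re
import tempfile
from pathlib import Path


def next_version_path(folder: Path, stem: str, ext: str) -> Path:
    """Return folder/<stem>_v<N+1><ext>, where N is the highest existing _vN copy."""
    pattern = re.compile(rf"^{re.escape(stem)}_v(\d+){re.escape(ext)}$")
    highest = 0  # stays 0 when no versioned copy exists, so the first file is _v1
    for path in folder.iterdir():
        match = pattern.match(path.name)
        if match:
            # compare numerically so _v10 counts as newer than _v9
            highest = max(highest, int(match.group(1)))
    return folder / f"{stem}_v{highest + 1}{ext}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        stem = "Data101_PMCB_Retreat_09-20-13"
        (folder / f"{stem}_v1.pptx").touch()
        (folder / f"{stem}_v2.pptx").touch()
        print(next_version_path(folder, stem, ".pptx").name)
        # Data101_PMCB_Retreat_09-20-13_v3.pptx
```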

Page 32

Versioning

Page 33

Versioning

Page 34

Versioning

Version control software:

• Git

• SVN

Page 35

Backups

Page 36

Which of the following do you do?

a. Save copies of data on a disk, USB drive, or computer hard drive

b. Save copies of data on a local server

c. Save copies of data on a central campus server

d. Save copies of data on a web-based or cloud server

e. Store data in a repository or archive

f. Automatically back up files

g. Manually generate backups

h. Restrict access to files

Page 37

Where can you back up your data?

1 on your local workstation

1 local/removable, such as an external hard drive

1 on a central server

1 remote, such as on a cloud server*

*Only if the type of data allows it, since cloud servers are not always secure
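Copying a file is only half of a backup; it also helps to confirm that each copy matches the original. This Python sketch copies one file into several destination folders and compares SHA-256 checksums. The folder names are hypothetical stand-ins for the locations listed above; in practice a destination might be a mounted external drive or server share.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> dict[Path, bool]:
    """Copy source into each destination folder and report whether each copy verifies."""
    expected = sha256_of(source)
    results = {}
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        # copy2 copies the contents and also tries to keep metadata such as modification time
        copy = Path(shutil.copy2(source, folder / source.name))
        results[copy] = sha256_of(copy) == expected
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        data = base / "plate_reader_20130920.csv"  # hypothetical data file
        data.write_text("well,od600\nA1,0.52\n")
        # Stand-ins for a workstation folder, an external drive and a central server
        targets = [base / "workstation", base / "external_drive", base / "central_server"]
        for copy, ok in back_up(data, targets).items():
            print(copy.parent.name, "verified" if ok else "MISMATCH")
```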

Page 38

Metadata

Page 39

What is metadata?

Title, Author, Call number, Publisher, ISBN

Page 40
Page 41

- Anne Gilliland

Your metadata should make your data understandable to others without your involvement
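One lightweight way to keep metadata with a data file, so others can understand it without asking you, is a small "sidecar" file saved next to it. The Python sketch below writes the metadata as JSON; the field names and values are hypothetical and should be replaced by your field's metadata standard where one exists.

```python
import json
import tempfile
from pathlib import Path


def write_metadata(data_file: Path, **fields: str) -> Path:
    """Write <data_file>.json next to the data file and return the sidecar's path."""
    sidecar = data_file.with_name(data_file.name + ".json")
    record = {"file": data_file.name, **fields}
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True))
    return sidecar


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "growth_curve_20130920.csv"
        data.write_text("time_min,od600\n0,0.05\n")
        sidecar = write_metadata(
            data,
            creator="Jane Researcher",  # hypothetical values throughout
            instrument="plate reader",
            units="time_min: minutes; od600: absorbance at 600 nm",
            description="E. coli growth in LB at 37 C",
        )
        print(sidecar.read_text())
```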


Page 42

Are you aware of data standards in your field?

Page 43

data standards

Data standards are the rules by which data are described and recorded. In order to share, exchange, and understand data, we must standardize the format as well as the meaning.

http://www.usgs.gov/datamanagement/plan/datastandards.php
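As a small illustration of standardizing format, the Python sketch below rewrites dates recorded in several local styles into ISO 8601 (`YYYY-MM-DD`). The list of input formats and its order are assumptions; an ambiguous style such as 09/10/13 needs an agreed convention (here, month first) before it can be normalized safely.

```python
from datetime import datetime

# Input styles assumed to occur in a hypothetical lab notebook export
KNOWN_FORMATS = ["%m/%d/%y", "%d-%b-%Y", "%Y%m%d", "%Y-%m-%d"]


def to_iso_date(text: str) -> str:
    """Return the date as ISO 8601 (YYYY-MM-DD), trying each known format in turn."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # not this format; try the next one
    raise ValueError(f"unrecognized date format: {text!r}")


if __name__ == "__main__":
    for raw in ["09/20/13", "20-Sep-2013", "20130920"]:
        print(raw, "->", to_iso_date(raw))  # each line ends in 2013-09-20
```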

Page 44

Controlled vocabularies

Page 45

Structured data helps with searching

Craigslist search: Chaise

Craigslist matches on strings only

Craigslist search: Fainting couch
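The Craigslist example can be reproduced in a few lines. In this Python sketch, plain string search for "chaise" misses the fainting couch, while listings tagged with a preferred term from a small controlled vocabulary are found under either name. The listings and the vocabulary are invented for illustration.

```python
# Hypothetical listings; each was tagged with a preferred term when it was posted
LISTINGS = [
    {"text": "Antique chaise, blue velvet", "category": "chaise longue"},
    {"text": "Victorian fainting couch, needs repair", "category": "chaise longue"},
    {"text": "Oak dining table", "category": "table"},
]

# Small controlled vocabulary: each synonym maps to one preferred term
PREFERRED_TERM = {
    "chaise": "chaise longue",
    "chaise longue": "chaise longue",
    "fainting couch": "chaise longue",
    "table": "table",
}


def string_search(query: str) -> list[str]:
    """Match on raw text only, the behavior the slide attributes to Craigslist."""
    return [item["text"] for item in LISTINGS
            if query.lower() in item["text"].lower()]


def structured_search(query: str) -> list[str]:
    """Translate the query to its preferred term, then match the category field."""
    preferred = PREFERRED_TERM.get(query.lower())
    return [item["text"] for item in LISTINGS if item["category"] == preferred]


if __name__ == "__main__":
    print(string_search("chaise"))      # finds only the "chaise" listing
    print(structured_search("chaise"))  # finds both the chaise and the fainting couch
```

Structured search works because the category was assigned at entry time, which is the role the slides give to controlled vocabularies.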

Page 46

Structured data helps with searching

PubMed indexes articles with MeSH terms

Page 47

Structured data helps with searching

Page 48

Why are controlled vocabularies (CVs) and ontologies useful?

• Can be used to structure your metadata

• Are often used to structure information in databases

Cell Ontology | Linnean Taxonomy

Kingdom

Phylum

Class

Order

Family

Genus

Species
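Ontologies add hierarchy to a vocabulary, much like the Linnean ranks above. The Python sketch below stores a toy `is_a` hierarchy (hypothetical, loosely modeled on cell types, not copied from the Cell Ontology) and uses it so that a search for a broad term also returns data annotated with narrower terms. The dataset names are invented.

```python
# Toy is_a hierarchy (child -> parent); hypothetical, not Cell Ontology content
IS_A = {
    "T cell": "lymphocyte",
    "B cell": "lymphocyte",
    "lymphocyte": "leukocyte",
    "neutrophil": "leukocyte",
}

# Hypothetical datasets, each annotated with one term from the hierarchy
ANNOTATIONS = {
    "flow_run_01": "T cell",
    "flow_run_02": "neutrophil",
    "elisa_plate_07": "B cell",
}


def ancestors(term: str) -> set[str]:
    """Follow is_a links upward and collect every broader term (assumes no cycles)."""
    found = set()
    while term in IS_A:
        term = IS_A[term]
        found.add(term)
    return found


def find_datasets(query: str) -> list[str]:
    """Return datasets annotated with the query term or with any narrower term."""
    return sorted(name for name, term in ANNOTATIONS.items()
                  if term == query or query in ancestors(term))


if __name__ == "__main__":
    print(find_datasets("lymphocyte"))  # ['elisa_plate_07', 'flow_run_01']
```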

Page 49

tools

Page 50

File renaming applications

• Bulk Rename Utility (Windows)

• Renamer (Mac)

• PSRenamer
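The renaming applications listed above apply one rule to many files at once. A rough Python equivalent, shown here as an assumption-laden sketch rather than a replacement for those tools, zero-pads the number before the extension (for example `sample1.csv` to `sample01.csv`) so that `sample10.csv` no longer sorts before `sample2.csv`. It runs as a dry run by default and refuses to overwrite existing files; the file names and the two-digit width are hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical rule: a run of digits just before the extension, e.g. "sample1.csv"
TRAILING_NUMBER = re.compile(r"^(.*?)(\d+)(\.[^.]+)$")


def pad_numbers(folder: Path, width: int = 2, dry_run: bool = True) -> list[tuple[str, str]]:
    """Zero-pad the trailing number in each file name; return (old, new) name pairs."""
    plan = []
    for path in sorted(folder.iterdir()):
        match = TRAILING_NUMBER.match(path.name)
        if not match:
            continue
        prefix, number, ext = match.groups()
        new_name = f"{prefix}{number.zfill(width)}{ext}"  # zfill pads, never truncates
        if new_name == path.name:
            continue
        target = path.with_name(new_name)
        if target.exists():
            raise FileExistsError(target)  # refuse to overwrite an existing file
        plan.append((path.name, new_name))
        if not dry_run:
            path.rename(target)
    return plan


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        for name in ["sample1.csv", "sample2.csv", "sample10.csv"]:
            (folder / name).touch()
        print(pad_numbers(folder))  # dry run: lists planned renames only
```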

Page 51

Data Management tools and repositories

• Purpose: Software where you can organize, store and/or share data

• Often contain metadata to assist with data entry and create structured data

Page 52

Tools for data management

Page 53

Repositories use unique IDs

• Digital Object Identifier (DOI)

• Example: DOIs for publications

– doi: 10.1371/journal.pbio.1001339

• Uniform Resource Identifier (URI)

• A URI identifies a single resource; a resolvable URI points to a single location on the web

• URIs for people

Page 54

• Example:

• John L Campbell, Research Ecologist, Oregon State University, Corvallis, OR

• John L Campbell, Research Ecologist, Center for Research on Ecosystem Change, Durham, NC
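Because a DOI is a persistent identifier, the DOI example on the previous slide (doi: 10.1371/journal.pbio.1001339) can be turned into a stable web link by prefixing the public resolver address https://doi.org/. The Python sketch below normalizes that citation-style form; the accepted prefixes are assumptions, and the regular expression is a simplified check of the `10.<registrant>/<suffix>` shape, not a test of whether the DOI is actually registered.

```python
import re

# Simplified DOI shape: "10." + registrant code, a slash, then a non-empty suffix
DOI_SHAPE = re.compile(r"^10\.\d{4,9}(\.\d+)*/\S+$")


def doi_to_url(text: str) -> str:
    """Strip common prefixes such as 'doi:' and return an https://doi.org/ link."""
    doi = text.strip()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    if not DOI_SHAPE.match(doi):
        raise ValueError(f"does not look like a DOI: {text!r}")
    return f"https://doi.org/{doi}"


if __name__ == "__main__":
    print(doi_to_url("doi: 10.1371/journal.pbio.1001339"))
    # https://doi.org/10.1371/journal.pbio.1001339
```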

Page 55
Page 56

standards

Page 57

nomenclature

Page 58

antibodies

Western blot

Immunohistochemistry

ELISA

Co-immunoprecipitation

ChIP

Flow cytometry

ELISPOT

Radioimmunoassay

Page 59

FACS analysis of T cells from LNs and tumors

T cells were liberated from LNs by disruption between two frosted glass slides. Cells from LNs and tumors were stained with various combination of the following Abs: FITC-CD4, allophycocyanin-CD25, PE Cy7-CD8, APC-CD62L, PE-CD25, PE Cy7-CD25, and biotinylated-KJ-126 and in some experiments made permeable with fixation/permeablization buffers and stained with PE-FoxP3 (eBioscience). Harvested samples, isotype controls, and single stain controls were run on the FACSCalibur (BD Biosciences).

Ruby and Weinberg (2009) J Immunol. 182(3):1481-9.

Page 60

Which antibody did they use in the paper?

Page 61

A Solution: Antibody Registry

antibodyregistry.org

Page 62
Page 63

Meet the Urban Lab

Page 64

A+ organization!

The Urban lab antibodies

Page 65

Chart (y-axis: percent identifiable): Of 14 antibodies published in 45 articles, only 38% were identifiable

Page 66

http://www.force11.org/node/4463

http://biosharing.org/bsg-000532

Page 67

http://www.biosharing.org/standards/mibbi

Minimum Information for Biological and Biomedical Investigations

Page 68

data publication and sharing

Page 69

Why share data?

• Data sharing mandates

• Further science and medicine

• Build collaborations

• Enable new discoveries with your data

• Can be required at time of publication

Page 70

Distribution of 2004–2005 citation counts of 85 trials by data availability.

Page 71

How?

Page 72

Beyond the PDF: What can be published (and cited)?

Raw Science

Nanopublications

Self-publishing

Page 73

Beyond the PDF: What can be published (and cited)?

Raw Science

Nanopublications

Self-publishing

Datasets

Code

Experimental design

Argument or passage

Blogging

Microblogging

Comments on existing work

Annotations on existing work

Single figure publications

Page 74

How?

Data Journals and Repositories

• figshare

• Dryad

• Dataverse (social science)

• Institutional repositories

Page 75

www.impactstory.org

Page 76

3 | How the OHSU Library can help

Page 77

1 | Large Lecture: Data Management 101

2 | 10-15 Small Groups: data playground

• 1 researcher paired with 2 or 3 library staff

• Tailored analysis of data reporting and instruction

Save the date: 10/09/13, 4-6 pm

1k challenge award recipients

Page 78

Thank you!

Page 79

URLs to resources

Go to:

http://libguides.ohsu.edu/data