
ORGANIZING AND STRUCTURING DATA FOR DIGITAL PROJECTS Suzanne Huffman Digital Resources Librarian Simpson Library


Post on 26-Dec-2015



ORGANIZING AND STRUCTURING DATA FOR DIGITAL PROJECTS

Suzanne Huffman

Digital Resources Librarian

Simpson Library


DATA AND DIGITAL PROJECTS


Nothing really important is ever headlined “Here is some data. Hope you find something interesting.” Annotation is critical. Editing is critical.

-Amanda Cox, New York Times Graphics Editor

http://www.slideshare.net/openjournalism/amanda-cox-visualizing-data-at-the-new-york-times


The importance of context

http://www.nejm.org/doi/full/10.1056/NEJMp1402114?query=TOC&


Add value to your data

• Regardless of website functionality, annotation and guidance are important

• Look at other digital projects and test out similar sites to see what types of discovery and analysis activities you intuitively want to do with similar types of data

• Follow web design methodology by creating user stories and acting them out with mockups, wireframes, or simple spreadsheets


ORGANIZING AND STRUCTURING DATA


Metadata

Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information.

http://www.niso.org/publications/press/UnderstandingMetadata.pdf


Choosing metadata parameters

• Ask yourself why you are collecting the information you want to collect (perform a needs assessment)

• Focus on the outcomes and analysis tasks you want your site’s users to be able to perform with your data

• Depending on your audience, don’t assume users know what each metadata field or category means

• Choose data fields and parameters and do a few test runs before analyzing the data to determine if you need to add, change, or edit any of your fields


What is good metadata?

According to Understanding Metadata by the National Information Standards Organization, good metadata…

• Should be appropriate to the materials in the collection, users of the collection, and intended, current, and likely use of the digital object

• Supports interoperability

• Uses standard controlled vocabularies to reflect the what, where, when, and who of the content

• Includes a clear statement on the conditions and terms of use for the digital object

• Should have the qualities of archivability, persistence, and unique identification, and should be authoritative and verifiable

• Supports the long-term management of objects in collections


Choose and use standard terminology

• Use a controlled vocabulary that provides preferred keywords and terminology for specific items

• Create a data dictionary and be consistent in applying it

• Example data dictionary entry: Dates are displayed in the yyyy-mm-dd format; e.g., March 15, 2015 would appear as 2015-03-15

• A data dictionary helps prevent inconsistencies in data entry and analysis. Example of an inconsistency: "T", "temp", and "t" are all used interchangeably within a single dataset to refer to temperature measurements
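A data dictionary can also be applied in code. The following is a minimal Python sketch, not a prescribed method: the PREFERRED_TERMS mapping and both function names are hypothetical, and the input date format is assumed to be the "March 15, 2015" style used in the example entry.

```python
from datetime import datetime

# Hypothetical data dictionary: every variant label maps to one preferred term.
PREFERRED_TERMS = {
    "t": "temperature",
    "temp": "temperature",
    "temperature": "temperature",
}


def normalize_label(label: str) -> str:
    """Return the preferred term for a label, or raise if it is not in the dictionary."""
    key = label.strip().lower()
    if key not in PREFERRED_TERMS:
        raise ValueError(f"Label not in data dictionary: {label!r}")
    return PREFERRED_TERMS[key]


def normalize_date(text: str, input_format: str = "%B %d, %Y") -> str:
    """Convert a date such as 'March 15, 2015' to the yyyy-mm-dd form."""
    return datetime.strptime(text, input_format).strftime("%Y-%m-%d")


print(normalize_label("T"))              # temperature
print(normalize_date("March 15, 2015"))  # 2015-03-15
```

Raising an error for unknown labels, rather than passing them through, is one way to catch terms that have not yet been added to the dictionary.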


Metadata dos

• Select your keywords wisely and think about the many ways someone might search for your data

• Use your data dictionary whenever possible to create keywords to establish a controlled vocabulary

• Use descriptive and clear writing

• Ensure that all data fields are independent and that they could exist on their own


Metadata don’ts

• Do not use jargon; define technical terms and acronyms and put them in your data dictionary

• Remember that a computer will read the information in the metadata record, so do not use tabs, indents, or special characters like ! @ # % { } | / \ < > ~ that may be misunderstood

• Do not copy and paste content from Word documents or other sources into your metadata record (use a text editor as a middle step to prevent unnecessary characters and errors being introduced)
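The special-character rule above can be checked automatically before a record is saved. This is a rough Python sketch under stated assumptions: the DISALLOWED set simply copies the characters listed on the slide plus the tab character, and the function name find_disallowed is hypothetical.

```python
# Characters the slide warns about, plus the tab character.
DISALLOWED = set("!@#%{}|/\\<>~\t")


def find_disallowed(value: str) -> list[str]:
    """Return the disallowed characters found in a metadata value, in order of first appearance."""
    found = []
    for ch in value:
        if ch in DISALLOWED and ch not in found:
            found.append(ch)
    return found


print(find_disallowed("Rainfall totals 2015-03-15"))  # []
print(find_disallowed("temp <C>!"))                   # ['<', '>', '!']
```

A field that legitimately needs one of these characters, such as a URL containing "/", would need its own exception in a real project.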


Metadata is structured information

Example Dublin Core record in XML
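The record shown on the original slide is not reproduced in this transcript. As a stand-in, here is a small Python sketch that builds a simple Dublin Core record with the standard library. The wrapper element name "metadata" and the function name build_record are assumptions made for illustration; the namespace URI is the one used for the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_record(fields: dict[str, str]) -> str:
    """Serialize Dublin Core element names and values into one XML string."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Sample values taken from this presentation's own title slide.
record = build_record({
    "title": "Organizing and Structuring Data for Digital Projects",
    "creator": "Suzanne Huffman",
})
print(record)
```

Each field becomes a dc:-prefixed element, so the same dictionary of values could later be written out in another format without retyping it.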


Data structure

• Structuring your data is important to ensure your site functions well and that the dataset can be used in a variety of ways

• Ways to structure data:

• For Excel or Google spreadsheets, save your data as CSV files in plain text format

• XML documents can be easily created through online data-entry forms and contain your metadata within a structured framework
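For data that is already in a program rather than a spreadsheet, CSV output can be produced directly. The sketch below uses Python's standard csv module; the column names and sample rows are invented for illustration, and rows_to_csv is a hypothetical helper name.

```python
import csv
import io


def rows_to_csv(rows: list[dict[str, str]], fieldnames: list[str]) -> str:
    """Write rows as CSV text with a header line; values containing commas are quoted."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample rows.
rows = [
    {"id": "1", "title": "Letter, 1862", "date": "1862-05-01"},
    {"id": "2", "title": "Town map", "date": "1870-01-15"},
]
text = rows_to_csv(rows, ["id", "title", "date"])
print(text)
```

To write a file instead of a string, the same writer can be given a file opened with open(path, "w", newline="", encoding="utf-8"), which keeps the output in plain text.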


Data best practices

• Make sure your data is portable

• Save a copy in an additional location outside your site, in a machine-readable, non-proprietary format

• Portable data is flexible, sharable, and can be harvested by a variety of tools for usage in future projects


Quality assurance and control

• Restrict what information can be entered into the dataset

• Limit the use of free text fields for metadata

• Use lookup tables or drop-down menus for data entry

• Use validation tools

• Do manual review

• Clean up and normalize messy data with tools like OpenRefine
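The first few checks in this list can be approximated in a few lines of code. This is only a sketch, assuming a hypothetical lookup table of allowed item types and the yyyy-mm-dd date rule from the data dictionary example; the field names "type" and "date" and the function validate_record are made up for illustration.

```python
import re

# Hypothetical lookup table: the only values accepted in the "type" field.
ALLOWED_TYPES = {"letter", "photograph", "map"}

# Dates must follow the yyyy-mm-dd pattern (this checks the shape, not the calendar).
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def validate_record(record: dict[str, str]) -> list[str]:
    """Return a list of problems found in one record; an empty list means it passed."""
    problems = []
    if record.get("type") not in ALLOWED_TYPES:
        problems.append(f"type: {record.get('type')!r} is not in the lookup table")
    if not DATE_PATTERN.fullmatch(record.get("date", "")):
        problems.append(f"date: {record.get('date')!r} is not in yyyy-mm-dd form")
    return problems


print(validate_record({"type": "letter", "date": "2015-03-15"}))       # []
print(validate_record({"type": "Letter ", "date": "March 15, 2015"}))  # two problems
```

Automated checks like these complement manual review rather than replace it, since they only catch problems that have been anticipated.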


DATA MANAGEMENT


What is data management?

Data management refers to all aspects of creating, housing, delivering, maintaining, archiving, and preserving data. A data management plan accounts for every activity within the data life cycle.

https://www.dataone.org/best-practices


Contents of data management plan (DMP)

• Data Type and Format

• Data Storage

• Data Standards

• Data Security

• Data Sharing

• Long-term Access

Check out VCU Libraries’ Research Data Management Guide at http://guides.library.vcu.edu/c.php?g=47977&p=300081


Data citation and preservation

Citation

Dataset citations should have (at a minimum):

Creator (PublicationYear): Title. Publisher. Identifier.

The Identifier could be a DOI or just the website’s URL
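The citation pattern above can be filled in programmatically. A minimal Python sketch follows; the function name format_citation is hypothetical and every value in the example is a placeholder, not a real dataset.

```python
def format_citation(creator: str, year: int, title: str, publisher: str, identifier: str) -> str:
    """Fill the pattern 'Creator (PublicationYear): Title. Publisher. Identifier.'"""
    parts = {
        "creator": creator,
        "title": title,
        "publisher": publisher,
        "identifier": identifier,
    }
    # Every element of the minimum citation must be present.
    missing = [name for name, value in parts.items() if not value.strip()]
    if missing:
        raise ValueError(f"Missing citation elements: {', '.join(missing)}")
    return f"{creator} ({year}): {title}. {publisher}. {identifier}."


# Placeholder values for illustration only.
print(format_citation(
    "Doe, J.", 2015, "Example Dataset", "Example University", "https://example.org/dataset"
))
```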

Preservation

Good documentation on data provenance (the origin and history of a dataset) is crucial.

If data cannot be recreated or if it is costly to reproduce, it should be saved.

Datasets that have significant long-term value may be contributed to a repository for preservation.


Data repositories

These repositories can be used to find data for reuse or to deposit your research data for preservation and sharing:


Questions? Comments?

Thank you!

[email protected]

540-654-1756

Please contact me if you need assistance with managing and organizing data in your research or teaching projects.


References and Resources

• http://www.slideshare.net/openjournalism/amanda-cox-visualizing-data-at-the-new-york-times

• http://www.nejm.org/doi/full/10.1056/NEJMp1402114?query=TOC&

• http://www.niso.org/publications/press/UnderstandingMetadata.pdf

• http://www.dlib.indiana.edu/~jenlrile/metadatamap/seeingstandards.pdf

• https://www.dataone.org/sites/all/documents/DataONE_BP_Primer_020212.pdf

• http://guides.library.vcu.edu/c.php?g=47977&p=300081