Feb.2016 Demystifying Digital Humanities - Workshop 3

Data Wrangling II: Programming on the Whiteboard February 26, 2016 Paige Morgan Digital Humanities Librarian

Upload: paige-morgan

Post on 13-Apr-2017


Page 1: Feb.2016 Demystifying Digital Humanities - Workshop 3

Data Wrangling II: Programming on the Whiteboard

February 26, 2016

Paige Morgan, Digital Humanities Librarian

Page 2: Feb.2016 Demystifying Digital Humanities - Workshop 3

Starting Activity: Open Syllabus Project

http://opensyllabusproject.org/

Page 3: Feb.2016 Demystifying Digital Humanities - Workshop 3

Open Syllabus Project

• Use the syllabus explorer to examine the data.

• Keep track of each step you take as you drill down.

• Goal: develop a research question based on your explorations.

• What other data would you need to answer this research question?

Page 4: Feb.2016 Demystifying Digital Humanities - Workshop 3

Last week...

• The work of creating usable data

• Forms that this data might take:

  • Markup language

  • Spreadsheets (MySQL & relational DBs)

  • Non-relational databases (RDF/Linked Open Data)

Page 5: Feb.2016 Demystifying Digital Humanities - Workshop 3

This week:

• Caveat Curator (challenges of working with data)

• Programming on the Whiteboard, i.e., conceptualizing the specific steps that you need to take to accomplish your goals

Page 6: Feb.2016 Demystifying Digital Humanities - Workshop 3

Goals/Takeaways

• A better understanding of the workflow for dealing with data

• How to start small and scale up effectively

• Greater ability to talk about what you’re trying to do

Page 7: Feb.2016 Demystifying Digital Humanities - Workshop 3

Why this focus on data?

• Understanding your data, and your intended actions, is a key skill for developing any digital project (big or small).

• You may have one big project, but your data may support several small/intermediary projects.

Page 8: Feb.2016 Demystifying Digital Humanities - Workshop 3

Image: Josh Lee, @wtrsld, via Twitter, January 2014.

Page 9: Feb.2016 Demystifying Digital Humanities - Workshop 3

What if your data is crowdsourced?

Page 10: Feb.2016 Demystifying Digital Humanities - Workshop 3

You can require a particular format for submissions.

Page 11: Feb.2016 Demystifying Digital Humanities - Workshop 3

You can even put programmatic limits on the formats available for submission.
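As a rough illustration of what a programmatic limit might look like, here is a minimal Python sketch that checks one submitted record before accepting it. The field names (`title`, `year`) and the rules are assumptions made up for this example; a real project would define its own.

```python
import re

# Hypothetical rule: a year must be exactly four digits.
YEAR_PATTERN = re.compile(r"^\d{4}$")


def validate_submission(record):
    """Return a list of problems found in one crowdsourced record."""
    problems = []
    # Reject empty or whitespace-only titles.
    if not record.get("title", "").strip():
        problems.append("title is required")
    # Reject years that are not four digits.
    if not YEAR_PATTERN.match(record.get("year", "")):
        problems.append("year must be four digits, e.g. 1975")
    return problems


# Invented sample records, for illustration only.
good = {"title": "Example title", "year": "1975"}
bad = {"title": "  ", "year": "mid-70s"}

print(validate_submission(good))
print(validate_submission(bad))
```

A form that refuses to save until the list of problems is empty is one way to enforce the format at submission time.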

Page 12: Feb.2016 Demystifying Digital Humanities - Workshop 3

But in the end, you’re probably still going to need to scrub and/or format.
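To make "scrubbing" concrete, the sketch below shows one small, common case in Python: the same value typed with stray spaces and inconsistent capitalization. The sample values are invented, and title-casing is only one possible normalization; it would not suit every kind of data.

```python
def scrub(value):
    """Trim the ends, collapse repeated spaces, and normalize case."""
    return " ".join(value.split()).title()


# Invented sample values: four spellings of two places.
raw = ["  london", "London ", "LONDON", "new  york"]
cleaned = [scrub(v) for v in raw]

# After scrubbing, the duplicates collapse into two distinct values.
print(sorted(set(cleaned)))
```

Tools such as OpenRefine do this kind of clustering interactively, without writing any code.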

Page 13: Feb.2016 Demystifying Digital Humanities - Workshop 3

This is true even for data from supposedly reputable sources, like government or media organizations.

Page 14: Feb.2016 Demystifying Digital Humanities - Workshop 3

Example: Doctor Who Villains dataset

http://tinyurl.com/doctorwhovillains

Page 15: Feb.2016 Demystifying Digital Humanities - Workshop 3

Data Dictionaries
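A data dictionary describes each field in a dataset: its name, what it means, and what kind of value it should hold. The Python sketch below is one possible way to write such a dictionary down and use it to check a row. The field names and descriptions are assumptions for illustration, not the columns of any particular dataset.

```python
# Hypothetical data dictionary: one entry per field.
DATA_DICTIONARY = {
    "name": {"type": str, "description": "Name as it appears in the source"},
    "year": {"type": int, "description": "Year of first appearance"},
}


def check_row(row):
    """List fields that are missing from a row or hold the wrong type."""
    issues = []
    for field, spec in DATA_DICTIONARY.items():
        if field not in row:
            issues.append(f"{field}: missing")
        elif not isinstance(row[field], spec["type"]):
            issues.append(f"{field}: expected {spec['type'].__name__}")
    return issues


# The year was entered as text, so the check flags it.
print(check_row({"name": "Example", "year": "1975"}))
```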

Page 16: Feb.2016 Demystifying Digital Humanities - Workshop 3

If you are thinking about your data, and the tasks that you need to accomplish, then it’s easier to determine what sort of language or platform your project needs.

Page 17: Feb.2016 Demystifying Digital Humanities - Workshop 3

Pseudocode

• Used by programmers to break down a complex task into single steps

• Easily adaptable for use by non-programmers

Page 18: Feb.2016 Demystifying Digital Humanities - Workshop 3

Pseudocode Example (Visible Prices)

• Computer has a file that contains prices from different texts.

• Computer must know that each price amount is connected with an object, and with a bibliographical record.

• Users can input a price amount, and computer will retrieve all objects that match the price, and display them to the user, along with bibliographical information.

• (More complex): Computer is able to retrieve prices linked with certain categories (clothing, food, etc.)
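To show how pseudocode like this can map onto real code, here is a minimal Python sketch of the four steps. It is not how Visible Prices is actually built; the record structure, the choice to store amounts in pence, and all the sample data are assumptions invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class PriceRecord:
    amount_pence: int  # price amount, stored in pence for easy matching
    item: str          # the object the price is connected with
    source: str        # bibliographical record for the text
    category: str      # used for the "more complex" category step


# Step 1: the "file" of prices. Invented sample data only.
RECORDS = [
    PriceRecord(30, "pair of gloves", "Hypothetical Novel A (1800)", "clothing"),
    PriceRecord(30, "loaf of bread", "Hypothetical Pamphlet B (1795)", "food"),
    PriceRecord(120, "bonnet", "Hypothetical Novel A (1800)", "clothing"),
]


def find_by_price(records, amount_pence):
    """Step 3: retrieve every record whose price matches the input."""
    return [r for r in records if r.amount_pence == amount_pence]


def find_by_category(records, category):
    """Step 4: retrieve every record in a given category."""
    return [r for r in records if r.category == category]


# Display each match along with its bibliographical information.
for record in find_by_price(RECORDS, 30):
    print(f"{record.item}: {record.amount_pence}d, from {record.source}")
```

Step 2 of the pseudocode (each price is connected with an object and a bibliographical record) is handled by keeping all three pieces of information in a single record.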

Page 19: Feb.2016 Demystifying Digital Humanities - Workshop 3

It is likely that your data will have a longer life span than any specific project you create.

Page 20: Feb.2016 Demystifying Digital Humanities - Workshop 3

In many instances, it may be useful to focus on data curation as much as on a single project.

Page 21: Feb.2016 Demystifying Digital Humanities - Workshop 3

Getting Data

• Figshare

• Datahub.io

• Project websites

• APIs

Page 22: Feb.2016 Demystifying Digital Humanities - Workshop 3

Cleaning Data

•OpenRefine http://openrefine.org/

Page 23: Feb.2016 Demystifying Digital Humanities - Workshop 3

Key DH Values

• Adaptive

• Sustainable/resource-aware

• Collaborative

• Social

Page 24: Feb.2016 Demystifying Digital Humanities - Workshop 3

Key skills

• Thinking flexibly about your data (and potential project)

• Are there portions of your dataset that could be extracted for use in a particular tool?

• How can you adjust your data in order to show it to people (and be better able to talk, write, and present about your research interests)?

Page 25: Feb.2016 Demystifying Digital Humanities - Workshop 3

And now, it’s your turn...

Page 26: Feb.2016 Demystifying Digital Humanities - Workshop 3

Group Activity

• What questions can you ask and answer with this data as it is?

• What data would you need in order to ask & answer other research questions?

• What are the steps that you would need to take in order to answer those research questions?

Page 27: Feb.2016 Demystifying Digital Humanities - Workshop 3

Next steps

• What’s the smallest version of your dataset possible? (Useful for testing out tools.)

• Possible tools to examine (as ways of presenting your data):

  • Omeka (http://www.omeka.net)

  • Scalar (http://scalar.usc.edu)

  • Simile (http://www.simile-widgets.org)

  • Google Fusion Tables (https://support.google.com/fusiontables/answer/2571232)
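One way to get a small version of a dataset for trying out tools like these is to pull a handful of rows from the full file. The Python sketch below assumes the data is a CSV; the inline data, column names, and sample size are invented stand-ins for a real file.

```python
import csv
import io
import random

# Invented stand-in for a larger CSV file with 100 rows.
FULL_CSV = "id,title\n" + "\n".join(f"{i},Item {i}" for i in range(1, 101))


def sample_rows(csv_text, n, seed=0):
    """Return n randomly chosen rows; a fixed seed keeps the sample repeatable."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rng = random.Random(seed)
    return rng.sample(rows, min(n, len(rows)))


subset = sample_rows(FULL_CSV, 5)
print(len(subset), "rows:", [row["id"] for row in subset])
```

A random sample is only one choice. Hand-picking a few rows that cover the awkward cases in your data can be a better test of a tool.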

Page 28: Feb.2016 Demystifying Digital Humanities - Workshop 3

Thank you!

• Questions? Ideas? Book a consult at http://paigecmorgan.youcanbook.me