dmdh winter 2015 session #2

Winter 2015: Session #2 Programming on the Whiteboard (Sarah Kremen-Hicks & Brian Gutierrez)

Upload: sarahkh12

Post on 30-Jul-2015

TRANSCRIPT

Page 1: Dmdh winter 2015 session #2

Winter 2015: Session #2 Programming on the Whiteboard

(Sarah Kremen-Hicks & Brian Gutierrez)

Page 2: Dmdh winter 2015 session #2

Previously, at DMDH...

•The work of creating usable data

•Forms that this data might take:

•markup language

•spreadsheets

Page 3: Dmdh winter 2015 session #2

Workshop #2

•Caveat Curator (challenges of working with data)

•Programming on the whiteboard, i.e., conceptualizing the specific steps that you need to take to accomplish your goals

Page 4: Dmdh winter 2015 session #2

Why this focus on data?

•Understanding your data, and your intended actions, is a key skill for working with any programming language or platform.

•This is true whether you are the programmer or whether you are working with professional programmers.

Page 5: Dmdh winter 2015 session #2

Programming languages are like human languages in that they both have phrases, patterns, and

rules.

Page 6: Dmdh winter 2015 session #2

Programming languages are unlike human languages in

that they aren’t for

communicating with people.

Page 7: Dmdh winter 2015 session #2

They are also unlike human

languages in that every programming utterance does something, i.e.,

causes an action to occur.

Page 8: Dmdh winter 2015 session #2

You can get used to patterns – even unfamiliar

ones.

Page 9: Dmdh winter 2015 session #2

The shift is in getting used to thinking in

terms of every single action.

Page 10: Dmdh winter 2015 session #2

Our subject matter today is all actions that you’ll need to think about before you work with...

Page 11: Dmdh winter 2015 session #2

Image: Josh Lee, @wtrsld, via Twitter, January 2014.

Page 12: Dmdh winter 2015 session #2

Even when you’re just experimenting, you need to prep your

data.

Page 13: Dmdh winter 2015 session #2

You may know your dataset in detail already, from your research -- but your

computer is concerned with

different levels of detail.

Page 14: Dmdh winter 2015 session #2

Becoming aware of those levels of

detail is not only helpful for your project ideas...

Page 15: Dmdh winter 2015 session #2

...it’s also a useful skill for working with programming

languages (where a stray /> or ; can break your program or website).

Page 16: Dmdh winter 2015 session #2

Caveat Curator

Page 17: Dmdh winter 2015 session #2

Data only works if your computer can

read it.

Page 18: Dmdh winter 2015 session #2

But my data is just text!

(Isn’t that easy?)

Page 19: Dmdh winter 2015 session #2

(Remember, your computer is fairly

stupid).

Page 20: Dmdh winter 2015 session #2

Formatted text is

often full of text your

computer can’t parse correctly.

Page 21: Dmdh winter 2015 session #2

The┘re┘sÜlt ís that yoÜr te┘xt

might come┘ oÜt looking

like┘this

whe┘n yoÜ ope┘n it in a

programming e┘nvironme┘nt.

Page 22: Dmdh winter 2015 session #2

So you need to

convert it to plain text.

(without any of the fancy details encoded in MS Word

fonts.)
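
A minimal sketch of that conversion step in Python, assuming the text has already been exported from the word processor and read into a string. The function name `to_plain_text` and the replacement table are illustrative assumptions, not part of the workshop materials; a real project would extend the table to cover whatever characters its own sources contain.

```python
import unicodedata

# Hypothetical table of "smart" punctuation and its plain equivalents.
REPLACEMENTS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # non-breaking space
}


def to_plain_text(text: str) -> str:
    """Replace word-processor punctuation with plain ASCII equivalents."""
    # NFC normalization makes accented letters consistent before replacing.
    text = unicodedata.normalize("NFC", text)
    for fancy, plain in REPLACEMENTS.items():
        text = text.replace(fancy, plain)
    return text


print(to_plain_text("\u201cIsn\u2019t that easy?\u201d"))
```

Accented letters are left alone on purpose: they are legitimate data, and only the typographic decoration is flattened.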

Page 23: Dmdh winter 2015 session #2

But even that can produce unexpected

errors.

Page 24: Dmdh winter 2015 session #2

Maybe you want to work with sailing data and ports of

call:

Page 25: Dmdh winter 2015 session #2

The ship you’re interested in leaves the Ivory Coast for

St. Helena...

Page 26: Dmdh winter 2015 session #2
Page 27: Dmdh winter 2015 session #2

But when you create your map, you get

this:

Page 28: Dmdh winter 2015 session #2
Page 29: Dmdh winter 2015 session #2

The latitude/longitude coordinate is the significant datum.

Page 30: Dmdh winter 2015 session #2

The city name is just the human-readable

component.

Page 31: Dmdh winter 2015 session #2

Each datum needs to be unique.
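
One way to picture the problem, as a sketch in Python: if records are looked up by name alone, two places that share a name collide, whereas records keyed by coordinates stay distinct. The coordinates below are approximate, and the second entry (a town in California with the same name) is an assumption chosen for illustration; the slides do not say which location the map actually displayed.

```python
# Approximate coordinates, for illustration only.
places = [
    {"name": "St. Helena", "lat": -15.96, "lon": -5.70},   # South Atlantic island
    {"name": "St. Helena", "lat": 38.51, "lon": -122.47},  # town in California
]

# Keyed by name: the second record silently overwrites the first.
by_name = {p["name"]: p for p in places}

# Keyed by coordinate: each datum stays unique.
by_coordinate = {(p["lat"], p["lon"]): p for p in places}

print(len(by_name))        # 1
print(len(by_coordinate))  # 2
```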

Page 32: Dmdh winter 2015 session #2

Figuring out what sort of

unique configuration will work best involves at least some

experimentation.

Page 33: Dmdh winter 2015 session #2

To experiment effectively, you’ll want to keep careful

records.

Page 34: Dmdh winter 2015 session #2

If you develop categories of

information, you’ll want to keep a

record of what each category means, and what its limits

are.

Page 35: Dmdh winter 2015 session #2

Cleaning and structuring your data is a foundational issue, and the work it involves changes depending on the format in which your data is available.

Page 36: Dmdh winter 2015 session #2

What if your data is crowdsourced?

Page 37: Dmdh winter 2015 session #2

You can require a particular format for

submissions

Page 38: Dmdh winter 2015 session #2

You can even put programmatic limits

on the formats available for submission
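
A programmatic limit can be as simple as rejecting submissions that do not match an expected pattern. The sketch below assumes a hypothetical submission form with a date field and a pair of coordinates; the field names and the YYYY-MM-DD date format are illustrative choices, not requirements from the workshop.

```python
from datetime import datetime


def validate_submission(record: dict) -> list[str]:
    """Return a list of problems found in one crowdsourced record."""
    problems = []

    # Dates must follow one agreed format (here, YYYY-MM-DD).
    try:
        datetime.strptime(record.get("date", ""), "%Y-%m-%d")
    except (TypeError, ValueError):
        problems.append("date must look like 1805-07-30")

    # Coordinates must be numbers within the valid ranges.
    try:
        lat, lon = float(record["lat"]), float(record["lon"])
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            problems.append("coordinates out of range")
    except (KeyError, TypeError, ValueError):
        problems.append("lat and lon must be numbers")

    return problems


print(validate_submission({"date": "1805-07-30", "lat": "51.5", "lon": "-0.1"}))  # []
print(validate_submission({"date": "July 1805", "lat": "north", "lon": ""}))
```

A record can pass checks like these and still be wrong (a valid date that is simply the wrong date, for example), so validation narrows the cleanup work rather than replacing it.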

Page 39: Dmdh winter 2015 session #2

But in the end, you’re still going to need to scrub and/or

format.

Page 40: Dmdh winter 2015 session #2

This is true even for data from supposedly

reputable sources, like government or

media organizations.

Page 41: Dmdh winter 2015 session #2

Example: Doctor Who Villains dataset

http://tinyurl.com/doctorwhovillains

Page 42: Dmdh winter 2015 session #2

This step is no fun!

Page 43: Dmdh winter 2015 session #2

But it’s absolutely necessary.

Page 44: Dmdh winter 2015 session #2

Break!

Page 45: Dmdh winter 2015 session #2

Working with multiple types of data:

GIS and the Spatial Turn

Page 46: Dmdh winter 2015 session #2

GIS technology has paved the way for analyzing qualitative data associated with cultural experiences.

Page 47: Dmdh winter 2015 session #2

“A good map is worth a thousand words, cartographers say, and

they are right: because it produces a thousand words: it

raises doubts, ideas. It poses new questions, and forces you

to look for new answers.”

(Moretti 1998, 3–4)

Page 48: Dmdh winter 2015 session #2

Literary texts are filled with

subjective spatial data: an author or

character's articulation of geographically

located dwellings, urban and rural

landscapes, as well as performance spaces

Page 49: Dmdh winter 2015 session #2

Project: Mapping William Wordsworth's

Conspicuous Consumption in The

Prelude

(Brian R. Gutierrez)

Page 50: Dmdh winter 2015 session #2

Objective: to map the visual culture events referenced in Wordsworth’s autobiographical poem The Prelude (as well as the ones not referenced)

Page 51: Dmdh winter 2015 session #2

Problem to solve: Prove that literary galleries, specifically John Boydell’s “Shakespeare Gallery,” shaped the dramaturgical choices in the only play Wordsworth wrote. Wordsworth reads Shakespeare not through a personal copy of the plays, but through the visual and performative texts of his time.

Page 52: Dmdh winter 2015 session #2

Data: place-names, indirect references,

and all non-referenced visual cultural events

Page 53: Dmdh winter 2015 session #2

Access to data: Project Gutenberg, digital archive of British newspapers and periodicals

Page 54: Dmdh winter 2015 session #2

What to do with that data?

Map it!!

Page 55: Dmdh winter 2015 session #2

First data set: Literary spatial articulations

Page 56: Dmdh winter 2015 session #2

Wordsworth mentions the following place names and references:

"Oh wondrous power of words, how sweet they are / According to the meaning which they bring-- / Vauxhall and Ranelagh, I then had heard / Of your green groves and wilderness of lamps, / Your gorgeous ladies, fairy cataracts, / And pageant fireworks" (119-125)

"Half-rural Sadler's Wells" (267)

Page 57: Dmdh winter 2015 session #2

First, I need to know what and where these places were in order to identify them as

spatial data

Ex: Vauxhall and Ranelagh

Page 58: Dmdh winter 2015 session #2

Second, if I'm interested in visual cultural experiences, I need to identify what kind of event occurred there: gallery, play, etc.

Page 59: Dmdh winter 2015 session #2

Third, how would I access the data? Answer: place-names in a book are not under any copyright.

However, if I wanted to display sections of the text when a viewer clicks on a place name, then I would have to think about copyright. The text is on Project Gutenberg, so that's covered.

Page 60: Dmdh winter 2015 session #2

Fourth, I would have to locate any indirect reference to visual cultural phenomena.

Ex: Wordsworth mentions two actresses by name: Mary Robinson and Sarah Siddons.

Since I cannot map a person, I need to investigate which plays they were in, and at which theaters, during that moment of his life (it's an autobiography).

Page 61: Dmdh winter 2015 session #2

Fifth, I need to research what special events were occurring at other places he mentions. For that, I

look to The Times (newspapers) and various

periodicals.

Page 62: Dmdh winter 2015 session #2

Sixth, because I am going to create a map using ArcGIS, I need to put my data in an Excel spreadsheet so that it can be read by the program.

Page 63: Dmdh winter 2015 session #2
Page 64: Dmdh winter 2015 session #2

What is the relationship between

the data?

Page 65: Dmdh winter 2015 session #2

Analyze the qualitative data

Humanist skill = DHumanist skill

Page 66: Dmdh winter 2015 session #2

Programming on the whiteboard involves

looking at the categories of

information, and thinking about how they interact.

Page 67: Dmdh winter 2015 session #2

Categories

•Place names

•Poetic lines

•Genre of visual/cultural event

•Spatial data (latitude/longitude)
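
As a sketch, these categories could become the columns of the spreadsheet. The column names and genre labels below are assumptions made for illustration, and the latitude/longitude cells are deliberately left empty because those values still have to be researched for each historical site. The example writes CSV, a plain-text spreadsheet format that Excel can open.

```python
import csv
import io

# Hypothetical column names, one per category of information.
COLUMNS = ["place_name", "poetic_line", "event_genre", "latitude", "longitude"]

rows = [
    {"place_name": "Vauxhall",
     "poetic_line": "Vauxhall and Ranelagh, I then had heard",
     "event_genre": "pleasure garden"},
    {"place_name": "Ranelagh",
     "poetic_line": "Vauxhall and Ranelagh, I then had heard",
     "event_genre": "pleasure garden"},
    {"place_name": "Sadler's Wells",
     "poetic_line": "Half-rural Sadler's Wells",
     "event_genre": "theatre"},
]

# restval="" leaves the coordinate cells blank until they are researched.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS, restval="")
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

Keeping one row per place and one column per category makes the relationships between categories explicit: every poetic line, genre, and coordinate is tied to exactly one place name.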

Page 68: Dmdh winter 2015 session #2

Return to the source of original data—the

literary text—to examine how the

author is describing these phenomena

Page 69: Dmdh winter 2015 session #2

Why use ArcGIS?

Page 70: Dmdh winter 2015 session #2

Benefits of ArcGIS

•It allows the overlay of historical maps

•Trainings were available and accessible (through DHSI and UW courses)

•As a software program, ArcGIS is established enough to be considered robust

•Available through the UW software suite

Page 71: Dmdh winter 2015 session #2

Disadvantages of ArcGIS

•Available only for PCs

• Proprietary file format (even if input data is open-access, the end result is not)

•Available only on an annual subscription model (and prohibitively expensive for scholars without campus-granted access)

Page 72: Dmdh winter 2015 session #2

In Franco Moretti’s Atlas of the European

Novel 1800-1900 (1998), he calls for

a “literary geography,”

predicated on the creation of “readerly maps” and the use of

those maps as analytical tools.

Page 73: Dmdh winter 2015 session #2

Caveats?

The pursuit of mapping data may exclude complex

social spaces (e.g., gendered domestic environments)

Page 74: Dmdh winter 2015 session #2

Caveats?

Cartographical representations should not be

divorced from their primary texts

Page 75: Dmdh winter 2015 session #2

Break!

Page 76: Dmdh winter 2015 session #2

Project: Visualizing Prosody

(Sarah Kremen-Hicks)

x / |x /|xx / | x / |x /

Sir Walter Vivian all a summer's day

/ x | / x | x / | x / | x /

Gave his broad lawns until the set of sun

Page 77: Dmdh winter 2015 session #2

Marking up a poem for metrical scansion is encoding it with

data.

What can a computer do with that data?

Page 78: Dmdh winter 2015 session #2

Computers are good at counting things –

like iambs.

Page 79: Dmdh winter 2015 session #2

Is it possible to predict deviations from a metrical norm based on author or

lyric classification?

Page 80: Dmdh winter 2015 session #2

Will authors show a tendency for

particular types of metrical

substitution?

Page 81: Dmdh winter 2015 session #2

Prepping the Data

•For proof of concept, start with one author (Alfred, Lord Tennyson)

•Get Tennyson’s poems from Project Gutenberg

•Hand-mark representative poems for prosody

Page 82: Dmdh winter 2015 session #2

Programming on the Whiteboard

What should the computer do?

Page 83: Dmdh winter 2015 session #2

Computer tasks

•Count feet per line

•Recognize | as a foot boundary

•Recognize carriage return as a line boundary

•Supply foot boundaries at beginning/end of lines

•Count the number of areas contained within foot boundaries for each line
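
The tasks above can be sketched in a few lines of Python. This is a whiteboard-level illustration, not the project's actual code: it assumes the input holds only the scansion marks (x, /, and |) with one poetic line per text line, and the function names are made up for the example.

```python
def split_feet(scansion: str) -> list[str]:
    """Split one scanned line into feet, using | as the foot boundary."""
    # The start and end of the line act as implicit foot boundaries,
    # so splitting on | is enough; empty pieces are discarded.
    feet = [foot.replace(" ", "") for foot in scansion.split("|")]
    return [foot for foot in feet if foot]


def feet_per_line(marked_poem: str) -> list[int]:
    """Count feet for each non-blank line; a line break ends a line."""
    return [len(split_feet(line))
            for line in marked_poem.splitlines() if line.strip()]


poem = "x / |x /|xx / | x / |x /\n/ x | / x | x / | x / | x /"
print(split_feet("x / |x /|xx / | x / |x /"))  # ['x/', 'x/', 'xx/', 'x/', 'x/']
print(feet_per_line(poem))                      # [5, 5]
```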

Page 84: Dmdh winter 2015 session #2

These steps involve recognizing each metrical foot as a unit that contains particular accentual-syllabic data.

x / |x /|xx / | x / |x /

Sir Walter Vivian all a summer's day

Page 85: Dmdh winter 2015 session #2

Computer tasks, cont’d.

•Identify the most common number of feet per line

•Supply a report on lines (by number) that deviate

•Calculate rate of deviation/adherence

•Mode = paradigm
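
Here is a sketch of that report in Python, treating the mode (the most common number of feet) as the paradigm. The list of foot counts is invented for the example, and the function name is hypothetical.

```python
from collections import Counter


def deviation_report(foot_counts: list[int]) -> dict:
    """Find the most common line length in feet and the lines that deviate."""
    if not foot_counts:
        raise ValueError("need at least one line")

    # Mode = paradigm: the most frequent number of feet per line.
    paradigm, _ = Counter(foot_counts).most_common(1)[0]

    # Line numbers start at 1, as they do in a printed poem.
    deviating = [number for number, count in enumerate(foot_counts, start=1)
                 if count != paradigm]

    adherence = 1 - len(deviating) / len(foot_counts)
    return {"paradigm": paradigm,
            "deviating_lines": deviating,
            "adherence": adherence}


# Invented counts for an eight-line poem.
print(deviation_report([5, 5, 4, 5, 6, 5, 5, 5]))
# {'paradigm': 5, 'deviating_lines': [3, 5], 'adherence': 0.75}
```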

Page 86: Dmdh winter 2015 session #2

After recognizing the foot as a unit, the

computer can calculate what patterns of data each foot contains.

Page 87: Dmdh winter 2015 session #2

Computer tasks, cont’d.

•Identify the most common foot type

•Identify markings within foot boundaries

•Compare markings to foot dictionary to identify type
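
A foot dictionary can be an ordinary lookup table. The sketch below assumes the marking convention used on these slides (x for an unstressed syllable, / for a stressed one) and covers only six common feet; the names `FOOT_DICTIONARY` and `identify_feet` are made up for illustration.

```python
# Markings (with spaces removed) mapped to the name of the foot.
FOOT_DICTIONARY = {
    "x/": "iamb",
    "/x": "trochee",
    "//": "spondee",
    "xx": "pyrrhic",
    "xx/": "anapest",
    "/xx": "dactyl",
}


def identify_feet(scansion: str) -> list[str]:
    """Name each foot in one scanned line; | marks the foot boundaries."""
    feet = [foot.replace(" ", "") for foot in scansion.split("|")]
    # Anything not in the dictionary is flagged rather than guessed at.
    return [FOOT_DICTIONARY.get(foot, "unknown") for foot in feet if foot]


print(identify_feet("x / |x /|xx / | x / |x /"))
# ['iamb', 'iamb', 'anapest', 'iamb', 'iamb']
```

Flagging unrecognized markings as "unknown" keeps typos in the hand-marked data visible instead of silently miscounting them.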

Page 88: Dmdh winter 2015 session #2

These tasks identify each line as a unit composed of one or

more feet.

x / |x /|xx / | x / |x /

Sir Walter Vivian all a summer's day

(iambic pentameter with third foot anapestic substitution)

Page 89: Dmdh winter 2015 session #2

Still more computing tasks!

•Identify the most common foot type within a poem

•Supply a report on feet (by line and foot number) that deviate

•Calculate rate of deviation/adherence

•Mode = paradigm

Page 90: Dmdh winter 2015 session #2

Just as the feet contain patterns, the

lines contain patterns that can be analyzed as well.

Page 91: Dmdh winter 2015 session #2

Still more computing tasks!

•Report on types of deviations arranged by most to least common

•Information should include location (line/foot number), as well as prevalence of substitution type
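
Putting the pieces together, a sketch of such a report in Python might look like this. It is an illustration under stated assumptions: the input is a list of scansion strings using x, /, and |, the foot table is a small hypothetical one, and the two sample lines are the scansions as they appear in this transcript.

```python
from collections import Counter

# Small hypothetical foot table; markings are compared with spaces removed.
FOOT_NAMES = {"x/": "iamb", "/x": "trochee", "//": "spondee",
              "xx": "pyrrhic", "xx/": "anapest", "/xx": "dactyl"}


def substitution_report(scanned_lines: list[str]):
    """Return the dominant foot and deviations, most common type first."""
    named = [[FOOT_NAMES.get(foot.replace(" ", ""), "unknown")
              for foot in line.split("|") if foot.strip()]
             for line in scanned_lines]

    # Mode = paradigm: the most frequent foot type in the whole poem.
    all_feet = Counter(name for line in named for name in line)
    paradigm = all_feet.most_common(1)[0][0]

    # Record (line number, foot number) for every foot that deviates.
    locations = {}
    for line_no, line in enumerate(named, start=1):
        for foot_no, name in enumerate(line, start=1):
            if name != paradigm:
                locations.setdefault(name, []).append((line_no, foot_no))

    ordered = sorted(locations.items(),
                     key=lambda item: len(item[1]), reverse=True)
    return paradigm, ordered


lines = ["x / |x /|xx / | x / |x /", "/ x | / x | x / | x / | x /"]
print(substitution_report(lines))
# ('iamb', [('trochee', [(2, 1), (2, 2)]), ('anapest', [(1, 3)])])
```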

Page 92: Dmdh winter 2015 session #2

Deviations and their placement within each line and each poem should display certain patterns

unique to each author (I hope!)

Page 93: Dmdh winter 2015 session #2

Current status: I’m investigating using the Natural Language Toolkit (NLTK) to tokenize each foot, and to establish syllables, feet, and lines as a unique hierarchy.

Page 94: Dmdh winter 2015 session #2

Applicable Values

•Iterative development

•Failure as valuable

•Collaboration

Page 95: Dmdh winter 2015 session #2

If you are thinking about your data, and the tasks that you need to accomplish, then it’s easier to determine what sort

of language or platform your project

needs.

Page 96: Dmdh winter 2015 session #2

There are countless tutorials, online courses, etc., for

almost any programming language or platform.

(We’re giving you a cheat sheet, too; and http://www.dmdh.org is

your friend. So is Google.)

Page 97: Dmdh winter 2015 session #2

Learning them can be a slow process,

especially at first.

Page 98: Dmdh winter 2015 session #2

However, knowing what tasks you’re working towards makes it

easier to understand the purpose of the

introductory lessons.

Page 99: Dmdh winter 2015 session #2

It’s also easier to think about how the first rules you learn for any language or platform might affect your goals.

Page 100: Dmdh winter 2015 session #2

And now, it’s your turn...

Page 101: Dmdh winter 2015 session #2

For this activity, we recommend that you pair up, or form

small groups to work together.

Page 102: Dmdh winter 2015 session #2

Group Activity

•What do you need to do with your data?

•What units might that data exist in?

•What categories do you need to create?

•What relationships need to exist between the units and categories?

Page 103: Dmdh winter 2015 session #2

Upcoming Workshops!

•Crash Course on R: Feb 4, 12:30-2:00 (location TBD)

•Spring Workshops on Project Ideation and Development: April 11th and April 25th

Page 104: Dmdh winter 2015 session #2

DMDH content is developed by Paige Morgan, Sarah Kremen-Hicks, and Brian Gutierrez, with generous support from the Simpson Center

for the Humanities at the University of Washington.

Content is available under a Creative Commons Attribution-NonCommercial 3.0 Unported

License.

Please contact Sarah at [email protected] with questions.