getting intimate with your data - working our way out of the lab
TRANSCRIPT
Analysing deux
Will the scholarship ever leave the lab?
Getting Intimate with Your Data
18 February 2014
Any Success with TA or DV?
Did anyone get a chance to poke around with Voyant, TAPoR or ManyEyes?
An Interesting TA Case Study
‣ Objective: to reveal the connection between business and society in the historical record of the HBR
‣ Clement Levallois and Valerie Alloix
‣ https://www.kaggle.com/c/harvard-business-review-vision-statement-prospect/prospector#100
A Sample Text/Network Analysis
‣ Merging the singular and plural forms of terms ("lemmatization");
‣ Removal of the most common terms from the English language (based on a list of 5000 frequent terms - stop list);
‣ Detection of terms composed of multiple words ("n-gram detection");
‣ Identification of the 10 most frequent terms for each year;
‣ Publishing frequency equalised by grouping the years preceding 2000 into 5-year periods;
‣ The next step was to manually inspect these 10 most frequent terms for each year or group of 5 years.
‣ Result:
Clement Levallois's Cowo software (https://github.com/seinecle)
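The steps listed on this slide can be sketched in a few lines of Python. This is only an illustrative sketch, not the Cowo implementation: the stop list is a tiny stand-in for the 5000-term list, the plural merge is a crude suffix rule rather than real lemmatization, multi-word terms are approximated by counting adjacent word pairs, and all function names (`merge_plural`, `period_label`, `terms`, `top_terms_by_period`) are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stop list (assumption); the case study used 5000 frequent terms.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


def merge_plural(token):
    """Crude singular/plural merge: drop a final 's' unless the word ends in ss/is/us."""
    if len(token) > 3 and token.endswith("s") and not token.endswith(("ss", "is", "us")):
        return token[:-1]
    return token


def period_label(year):
    """Years from 2000 on stand alone; earlier years fall into 5-year periods."""
    if year >= 2000:
        return str(year)
    start = year - year % 5
    return f"{start}-{start + 4}"


def terms(text):
    """Return single-word terms plus adjacent two-word terms, skipping stop words."""
    tokens = [merge_plural(t) for t in tokenize(text)]
    unigrams = [t for t in tokens if t not in STOP_WORDS]
    bigrams = [
        f"{a} {b}"
        for a, b in zip(tokens, tokens[1:])
        if a not in STOP_WORDS and b not in STOP_WORDS
    ]
    return unigrams + bigrams


def top_terms_by_period(docs, n=10):
    """docs is an iterable of (year, text); returns the n most frequent terms per period."""
    counts = defaultdict(Counter)
    for year, text in docs:
        counts[period_label(year)].update(terms(text))
    return {period: [t for t, _ in c.most_common(n)] for period, c in counts.items()}
```

Calling `top_terms_by_period` on a list of `(year, text)` pairs returns the candidate terms per period, which would then still need the manual inspection described on the slide.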
How was:
"Dennis the Paywall Menace Stalks the Archives"
"I suppose I would wish D. C. Thomson well in moving on from Dennis the Menace to history, if it wasn’t for the fact that it involves the theft of public cultural property." - Andrew Prescott
Access versus Preservation
Access versus Process
Privileging certain collections because they are available
"It seems as if archivists have been gripped by a mania to digitise as quickly as possible, regardless of the
implications for future scholarship of how this is done."
"Scottish students in Glasgow now study Welsh wills (freely available) rather than Scottish wills (locked behind a
brightsolid paywall) – a lesson for the Scottish government to ponder there, surely."
Andrew Prescott
"Digitization makes the most traditional forms of humanistic scholarship more necessary, not less.
But the differences mean that we need to reinvent, not reaffirm, the way we engage with the humanities."
"Process raw data received through our senses into concepts, patterns and implications. Everything coming in through our senses is information waiting to be processed
and understood."
Wm Jones - Keeping Found Things Found
UnBuilding Grand Central Station
Data Consisting of What?
‣ Basic types of content that we are used to dealing with:
  ‣ Text
  ‣ Numbers
  ‣ Images
  ‣ Video
‣ Other, more “complex” stuff:
  ‣ Temporal - Time - Events
  ‣ Spatial - Space Coordinates - Place
  ‣ Relations, connections, links - genealogy - Networks
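One way to picture the "complex" types is as extra fields on an otherwise ordinary record. The sketch below is a hypothetical illustration, not a scheme from the lecture: the `Record` class, its field names, and the `build_network` helper are all assumptions. It shows a single item carrying a time, a place, and a list of relations, and how the relations can be turned into a simple undirected network.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Record:
    """Hypothetical record combining basic and 'complex' data types."""
    name: str                     # text
    when: date                    # temporal: the date of an event
    lat: float                    # spatial: latitude of the place
    lon: float                    # spatial: longitude of the place
    related_to: list = field(default_factory=list)  # relations: names of linked records


def build_network(records):
    """Turn the related_to lists into an undirected adjacency map."""
    graph = defaultdict(set)
    for record in records:
        for other in record.related_to:
            graph[record.name].add(other)
            graph[other].add(record.name)
    return dict(graph)


def timeline(records):
    """Order records by their date, earliest first."""
    return sorted(records, key=lambda record: record.when)
```

The same records can then feed a timeline (sorted by `when`), a map (from `lat` and `lon`), or a network view (from `build_network`), which mirrors the Time, Space and Place, and Network slides that follow.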
Time
‣ TimeFlow:
‣ Journalists
‣ TimeFlow was created by Fernanda Viegas and Martin Wattenberg (Flowing Media, Inc.) and Sarah Cohen (Duke University).
‣ The initial development was sponsored by Duke University's DeWitt Wallace Center for Media and Democracy.
Space and Place
Network Analysis
Networks and Relationships
Thinking Longer Term
For Next Lecture (25 February): Presenting I
Please take a look at:
The Visual Complexity Website http://visualcomplexity.com
Thank you
[email protected] @iridium