when humanities scale - kristoffer l. nielbo12/36 big data or (just) data { depending on de nition,...

33
1/36 When Humanities Scale On the Emergence of Analytics in Culture Research Kristoffer L Nielbo [email protected] knielbo.github.io/ March 16, 2018

Upload: others

Post on 17-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

1/36

When Humanities ScaleOn the Emergence of Analytics in Culture Research

Kristoffer L [email protected]

knielbo.github.io/

March 16, 2018

Page 2: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

2/36

SENSE OF URGENCY

“we are seeing a new wave of DH-related (and data science) investments acrossScandinavia”

ANALYTICS FALLACY

“showing that an algorithm achieved state of the art results” instead of “rig-orous investigation of why we think a given method gives relevant results overa data set”

SHORT MEMORY

“historyless triumphalism that originates in the newness of the field and ismaintained by the digitization, data and eScience hype”

Page 3: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

4/36

PROGRAM

0.00 HUMANITIES & DATA a few data points are not enough

0.20 CULTURE ANALYTICS an emerging field

0.30 APPLICATIONS* humanities data and computing

0.55 SUMMARY ...

* interrupted by short digressions

Page 4: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

5/36

HUMANITIES & DATA

Page 5: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

6/36

– domain knowledge in history, language, literature &c combined with microscopic and(predominantly) qualitative analysis of human cultural manifestations

anti-thesis to data-intensive research– research that solely relies on very few data points, a “myopic” perspective andhuman computation

Page 6: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

8/36

– the data deluge is transforming knowledge discovery and understanding in everydomain of human inquiry

– knowledge discovery depends critically on advanced computing capabilities

a large part of these data are unstructured and fundamentally cultural

– to get additional value from these data, faculties of humanities must becomecomputationally and data literate

Page 7: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

9/36

– number of research publications alone makes computational literacy a necessity forthe humanities scholar

– publications related to Gospel of Marc (KJV) > 50K, ∼ 16,500 words in 16 chp. on 11 p.

– plus a massive increase in digitized cultural heritage databases (libraries, archieves,museums)

Page 8: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

10/36

Page 9: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

11/36

Page 10: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

12/36

Big Data or (just) data

– Depending on definition, most humanities data are not Big Data– they are however “big enough for us”

“Instead of focusing on a ‘big data revolution,’ perhaps it is time we were focusedon an ‘all data revolution,’ where we recognize that the critical change in theworld has been innovative analytics, using data from all traditional and newsources, and providing a deeper, clearer understanding of our world.”

(Lazer,Kennedy, King & Vespignani 2014)

Page 11: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

13/36

Archaeology|3D modeling

– humanistic domain experts (archaeologist) that use research technique (excavation)– digital technologies have increased the scale and changed the research area

Page 12: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

15/36

- archaeology and interaction studies currently use Big Data/HPC when thecomputational needs are present

– scale alone does not necessarily change methods or perspective– reduce ++data points to a few by relying on our myopic perspective for analysis

– we essentially lack a culture of analytics in the humanities

Page 13: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

16/36

CULTURE ANALYTICS

Page 14: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

17/36

In the humanities a culture of analytics is an analytics of culture

We have to study “the dynamics of culturally informed interactions between people,and the cultural expressive forms that result from these interaction ... at scaleshitherto unimaginable”

So we need to develop an “intellectually and ethically sound approach to the study ofcultures across time and across space, leveraging the enormous gains made in the pastdecade in computation and machine readable cultural archives, from libraries andmuseum collections to the born digital cultural expressions of billions of people on theinternet”1

1From the Culture Analytics White Papers’ Introduction

Page 15: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

18/36

– Culture Analytics seeks to understand cultural phenomena as inherently multi-scaleand multi-resolution– preference for micro to macro-movement (“scale from one object”)

Page 16: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

19/36

CULTURE ANALYTICS

In comparison to analytics proper- descriptive not predictive- neither side of the interdisciplinary divide is conceptualized as service- preference for micro-scale analysis- predominantly unstructured data- low-resource varieties/historical perspective (cultural heritage data)- reliance on qualitative assesment (e.g., hyper-parameters and validation

procedures)

... to similar trends (e.g., culturomics, cliodynamics)- multi-scale/multi-resolution- data-intensive ethos (scalability matters)

Page 17: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

20/36

APPLICATIONS

Page 18: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

21/36

Culture Analytics|Fractal Properties of Lexical Complexity

Page 19: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

22/36

Philosophy|Latent Semantic Variables

– philosphers and sinologists have been debating the existence of mind-body dualismin classical Chinese philosophy

– with domain experts, unsupervised learning was used to identify a multi-leveldualistic semantic space

– one model (LDA) was further utilized to predict class of origin for controversial textsslices

Page 20: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

23/36

History|Predictive Causality & Slow Decay

– historians and media researchers theorize about the causal dependencies betweenpublic discourse and advertisement

– time series analysis of keyword frequencies (from seedlists) indicated that for somecategories ‘ads shape society’, while other categories merely ‘reflect’

– advertisements show a faster decay (on-off intermittant behavior) than publicdiscourse (long-range dependencies)

Page 21: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

24/36

digression #1.1

Computational Literacy|Programming & Stats

– every knowledge intensive organization has to break the learning curve, but certainsectors are more challenged

– we out-sourced the task to an international non-profit organization w. years ofexperience in scientific computing

– promote a common language and import best practice from software development

– unix shell, python and version control

Page 22: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

25/36

digression #1.2

Computational Literacy|Programming & Stats

GUI → CLI

- novice-friendly visual approach to computer interaction w. a fast learning curve ERROR

- expert-friendly text-based approach to computer interaction w. ++freedom VALID

- CONFLICT break the learning curve through training intensive, non-intuitive, andspecialized tools

- locally, we try to solve this conflict with a mix of science and guerrilla warfare byestablishing small, semi-autonomous eScience units that intervene in humanitiesresearch

Page 23: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

26/36

Medieval History|Novelty Detection

– historians debate historical transitions

– Saxo’s Gesta Danorum c. 1200 AD.history of the Danish royal dynasty

– transition between book 8 or 9?

– transition point or gradual?

– traditional word-level representationis ambivalent

– latent semantic model was trainedover sentence windows

– change detection and recurrence plotused to identify phase transition focusdin book 9

Page 24: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

27/36

Media Studies|Novelty Detection

– change point detection in topicality space applies to “a change in the media tone”

– train model on 200 years of newspapers in a comparative study between DK and NL

– collaboration between historians, media studies and information science with apredictive scope

Page 25: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

28/36

digression #2

Copyright & Privacy|Data Access and Mobility

Challenges to computationally empowering humanities:

- technical competencies

- interdisciplinary respect and understanding

- epistemology differences

- data access and mobility

Data silos (the true punishment for the fall of man) often originate in“culturaldifferences”, not technical or legislative issues

copyright is a bigger challenge than data protection laws

Page 26: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

29/36

Literary History|Lexical Density

Literary scholars and creativity researchers argue for the “tortured artist”– “writers’ creative state is inversely related to their emotional state”

– “writers’ creative state depends on their emotional state”

– look for dependencies in lexical density and sentiment scores for highly profilicwriters to identify state incongruences

Page 27: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

30/36

digression #3

Historical Languages|Low-resource Varieties

– text analytics depends critically on existing tools and data (ex. sentimentdictionaries)

– orthographic variation in historical data represents a challenge, because NLP andTM resources “suffer from presentism”

– projects often try to adapt the tool (ex. modify dictionary to historical data set)

– this solution scales badly due to lack of standardization

For Scandinavian languages we use spelling correction (rule-based andprobabilistic) to normalize (or modernize) historical data increasing recallconsiderably

Page 28: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

31/36

Literary Studies|Sentiment Analysis

0 1000 2000 3000 4000 5000 6000−5

0

5

10

Time

Sentim

ent

Madame Bovary

(a) Original t = L/400 t = L/10

0 1000 2000 3000 4000 5000 6000−1

−0.5

0

0.5

1

Time

Sen

tim

ent

(b)

filtered (t = L/10)

filtered (t = 3L/8)

0 2 4 6 8 10 12−2

0

2

4

6

Hs=0.57

Hl=0.74

log2w

log

2F

(w)

– dictionary-based sentiment analysis can reconstruct narrative/plot vectors thatreflect human reading

– basic insights from structural linguistics and narratology can be captured by thisapproach

– a particular scaling-range, 0.6 < H ≤ 0.8, seems to indicate literary optimality

Page 29: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

32/36

Literary History|Sequence Alignment

– there is a new biographical trend is In literary history

– using lexical density and sequence alignment, we can compare creative trajectories ofauthors

Page 30: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

33/36

Anthropology|Language Modeling

– anthropologists discuss why rituals appear rigid, while they seem to maintainbehavioral variability

– manual annotation of ritual dance applied to ethnographic video archieves frommultiple generations

– very few behavioral units are transmitted between generations (compulsory),allowing for both flexibility and rigidity

Page 31: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

34/36

SUMMARY

Page 32: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

35/36

Summary

All knowledge-intensive organization are experiencing the data deluge

- demands new forms of expertise and (strange) bed-fellows

- unique situation where compute and data can empower humanities domainexperts and change our scale and perspective

- humanities are part of the solution

BUT,– scaling (Big Data) alone is not enough (e.g., archaeology, interaction studies)– we need a culture of analytics

CULTURE ANALYTICS– cultural behavior and products at scale– descriptive, transdisciplinary, historical, qualitative– challenged by lack of training, data access, low-resource varieties

Page 33: When Humanities Scale - Kristoffer L. Nielbo12/36 Big Data or (just) data { Depending on de nition, most humanities data are not Big Data { they are however \big enough for us" \Instead

36/36

THANK YOU

[email protected]

knielbo.github.io

& credits toMax R. Echardt and Katrine F. Baunvig, datakube, University of Southern Denmark, DK

Jianbo Gao and Bin Liu, Institute of Complexity Science and Big Data, Guangxi University, CHNCulture Analytics @ Institute of Pure and Applied Mathematics, UCLA, US