
Digital Humanities 2012 (DH 2012) conference report

Digital Humanities is the annual international conference of the Alliance of Digital Humanities Organizations (ADHO). ADHO is an umbrella organization whose goals are to promote and support digital research and teaching across arts and humanities disciplines, drawing together humanists engaged in digital and computer-assisted research, teaching, creation, dissemination, and beyond, in all areas reflected by its diverse membership. The 2012 conference was hosted by the University of Hamburg from 16 to 20 July; the conference website is at <http://www.dh2012.uni-hamburg.de/>.

Day 1

Opening plenary
“Dynamics and Diversity: Exploring European and Transnational Perspectives on Digital Humanities Research Infrastructures”, Claudine Moulin, Trier Centre for Digital Humanities, University of Trier, Germany

Drawing on her involvement in the European Science Foundation and the Trier Centre for Digital Humanities, the presenter offered her perspectives on digital humanities research infrastructures under four global headings: 1. setting a multi-faceted frame: DH and the diversity of research; 2. fostering the diversity of languages, methods and interdisciplinary approaches; 3. one more turn: changing research evaluation and publication cultures; 4. the interaction of digitality and DH in works of art.

Heading 1: The ESF has formed a Standing Committee for the Humanities (SCH) and has recently published an ESF science policy briefing on research infrastructures in the digital humanities. This publication was addressed to researchers as well as funding bodies, policy makers, and key stakeholders. The report identifies key needs of and challenges for practitioners in the field. A divergence from the natural sciences research environment is noted. Humanities researchers have long used research infrastructures (RIs), starting with the Museion in the 3rd century BC, the first known information centre. Since then, museums, libraries and archives have continued the task of providing such an infrastructure for researchers. RIs need to encompass both physical and intellectual networks in humanities research. There are four layers of RIs: physical infrastructure (collections etc.), digital data infrastructures (repositories), e-infrastructures (networks, computing facilities), and meta-infrastructures, which aggregate independent RIs with different data formats. On a macro level, access to data, services, expertise, and facilities is required. RIs therefore require a multifaceted and multidimensional approach: a set of concurrent criteria for defining them in the humanities has been drawn up, aligned along the axes of the nature of the objects, collections, and level of data processing. This ecosystem can be applied at both global and local levels. DARIAH, CLARIN, TextGrid, the TEI, and Bamboo can all be counted as important digital initiatives that constitute RIs. Multilinguality is a real challenge for this mainly English-speaking and -publishing community; the field needs to make sure that it can reflect and accommodate these linguistic challenges.

Heading 2: Europe’s cultural and linguistic diversity is an opportunity, not a defect, and should be recognized as such. Digital RIs were developed earlier in the sciences, where the objects of study are less culturally bound than in the humanities. Humanities researchers tend towards qualitative methodologies, and taxonomies and ontologies have abounded in the humanities because of these complexities. An example is the Trier European Linguistic Network, a collaborative effort that links important national historical dictionaries and is now expanding to European dictionaries. Funding agencies play an important role in ensuring the digitization of the European cultural heritage, particularly language resources across the board.

Heading 3: Digital research is still undervalued in research evaluation programmes. The dividing lines of the traditional humanities have broken down in DH, which has led to some insecurities in the evaluation of research outputs. Engagement in the community is a prerequisite for understanding the contributions made by the field. The same applies to the publication culture in this area; the ESF has published a report on this problem. A culture of recognition needs to be instilled that understands the process-oriented character of a DH project and appreciates new formats of publication such as databases, Web sites, etc. Development of appropriate instruments for the evaluation of these outputs, comprehensive clearing mechanisms (peer review), the fostering of interdisciplinary tools and teams, and credit and career perspectives for a new generation of young researchers are crucial.

Heading 4: Digitality, the condition of being digital, has become a focus of interest for artists, who are taking an active interest in the digital humanities and creating excellent visualizations, e.g. Ecke Bonk’s installation and visualization of the Grimm Dictionary, which displays the full richness and complexity of the work. Another example of digitality in practice is the iFormations art exhibition, originally from British Columbia, which has been brought to Hamburg on the occasion of DH2012.

Day 2

Session 1
“Code-Generation Techniques for XML Collections Interoperability”, Stephen Ramsay and Brian Pytlik-Zillig, University of Nebraska-Lincoln, USA

Code generation is a mode of software development that is designed to create adaptable and quick solutions to changing requirements based on varying source documents. The key problem is text collection interoperability. While TEI is a widely adopted standard, it cannot guarantee interoperability between collections. TEI succeeds quite well in allowing for interchange of encoded texts, but is far from solving the interoperability issues that arise when combining collections: complexity and interoperability pull in different directions. Even within the Text Creation Partnership corpora there are interoperability issues. Without interoperability, however, we end up with silos once again. Perl, sed, and the bash shell are often used to tweak things into shape. A more stable solution to the problem is code generation: the generated target schema usually contains everything needed to make collections interoperable. XSLT is one possible choice for code generation, as it is almost a meta-language that can also be useful for documentation purposes. The tool that has been developed for this purpose is called the Abbot Text Interoperability Tool (on github.com); it addresses many of the outlined issues and ensures interoperability among text collections. The language Clojure was used to wrap the XSLT code so that it performs well in an HPC environment. The system scales well and works with large numbers of XML documents.
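As an illustration of the general code-generation idea only (not Abbot's actual implementation; the element mappings below are invented), a minimal Python sketch that emits an XSLT identity transform plus one renaming template per mapping:

    # Hypothetical mapping of source elements to target elements.
    MAPPINGS = {"head": "title", "lg": "linegroup"}

    TEMPLATE = """<xsl:template match="{src}">
      <{dst}><xsl:apply-templates select="@*|node()"/></{dst}>
    </xsl:template>"""

    def generate_stylesheet(mappings):
        """Emit an XSLT stylesheet: identity transform plus renaming rules."""
        rules = "\n".join(TEMPLATE.format(src=s, dst=d) for s, d in mappings.items())
        return f"""<xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="@*|node()">
        <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
      </xsl:template>
    {rules}
    </xsl:stylesheet>"""

    print(generate_stylesheet(MAPPINGS))

The point of the technique is that the converter itself is generated from declarative mappings, so it can be regenerated whenever the source collections or the target schema change.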

“DiaView: Visualise Cultural Change in Diachronic Corpora”, David Beavan, UCL, UK

The presenter introduced DiaView, a tool to investigate and visualize word usage in diachronic corpora. Starting point: the Google Books ngram viewer is great when you know what you are looking for, but if you don’t, it is tricky to start out. Google Books OCR quality is poor, the corpus does not sample evenly across genres, the chronological placements are questionable, and it is a very large corpus. DiaView is intended to address these issues. For demonstration purposes, the tool has been used with the English One Million corpus on Google Books, dated 1850 to the present, which still contains over 98 billion tokens, so very infrequently used words were also filtered out. The statistical analysis compares each word’s frequency distribution across the entire corpus with its frequency by publication year, which highlights any distributions that are skewed towards or focussed on a particular chronological range. DiaView is meant to be easy to use, aggregate and summarize data, promote browsing and opportunistic discovery, help discover cultural trends, highlight interesting terms, provide links to more in-depth analysis, allow the corpus to be inspected by decade, and work with any corpora or datasets. DiaView does not rely on raw word frequency but on calculating salience; it applies visual styles and creates links back to the ngram viewer for in-depth analysis of discovered phenomena. Essentially, DiaView looks for interesting peaks in the data that help with the initial task of discovering noteworthy phenomena in the corpus.
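The salience measure itself is not spelled out in the talk; the following Python sketch shows one plausible way to flag words whose frequency in a given decade is skewed relative to their corpus-wide distribution (a simple smoothed log-ratio stands in for whatever statistic DiaView actually uses):

    import math
    from collections import Counter, defaultdict

    def salience_by_decade(docs, top=10):
        """docs: iterable of (year, list_of_tokens).
        Returns {decade: [(word, score), ...]} with the highest-scoring words per decade."""
        corpus = Counter()
        decade_counts = defaultdict(Counter)
        decade_totals = Counter()
        for year, tokens in docs:
            decade = (year // 10) * 10
            corpus.update(tokens)
            decade_counts[decade].update(tokens)
            decade_totals[decade] += len(tokens)
        total = sum(corpus.values())
        result = {}
        for decade, counts in decade_counts.items():
            scores = []
            for word, observed in counts.items():
                expected = corpus[word] / total * decade_totals[decade]
                scores.append((word, math.log((observed + 1) / (expected + 1))))
            result[decade] = sorted(scores, key=lambda x: x[1], reverse=True)[:top]
        return result

    docs = [(1861, "steam railway railway engine".split()),
            (1901, "motor car telephone".split())]
    print(salience_by_decade(docs))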

“The Programming Historian 2: A Participatory Textbook”, Adam H. Crymble, Network in Canadian History & Environment, Canada

The Programming Historian 2 (PH2) is an open-access methodology textbook targeted at ordinary working humanists interested in programming. It is intended to be instantly useful for the work they do. The first iteration of the textbook was targeted at historians alone and focussed on Python as a programming language. It was successful, but the second version is now much more widely targeted, and new lessons have been commissioned from members of the DH community. PH2 is not really targeted at academics; it helps with visualizations and with using simple but very powerful command-line tools. PH2 employs rigorous testing and peer review to ensure that users can use the resources with confidence. The presenter invited the audience to contribute their expertise to the textbook; authors get proper attribution and make a valuable contribution to digital scholarship. Ideas include: R, Ruby, Python, tools, metadata, XML, textual and image analysis. A good lesson is structured to offer success and a working example by the end of a 60-minute lesson. Lessons have an obvious goal: short time, quick success, for a novice but intelligent audience. Lessons can also be conducted in a classroom setting, with assignments and group discussions. Online at <http://programminghistorian.org/>.

“XML-Print: an Ergonomic Typesetting System for Complex Text Structures”, Martin Sievers, Trier Centre for Digital Humanities (Kompetenzzentrum), Germany

This DFG-funded project has developed its own XSL-FO engine to facilitate the typesetting of XML documents in a way that meets the requirements of professional typesetting conventions. The typesetting system takes XML as a source document and also works with semantically annotated data; it provides a modern graphical interface, rule-based formatting, etc. Its output is an XSL-FO stylesheet that is used to generate PDFs. The tool is currently a stand-alone application, but ultimately it will be a web service and will also be integrated into the TextGrid toolbox. With the style editor it is possible to apply layout information (or formats) to the XML source, with detailed formatting options, similar to the options in a word processing application. The style editor highlights the active transformation mappings, but also offers alternative formats, which can be explored very easily. Mappings can be based on the elements in the XML tree and can be narrowed by attributes and positions within the tree. The style editor has a modern user interface, and the style engine is based on standard XML technologies and Unicode. XML-Print addresses key requirements facing scholars publishing their data today.

“The potential of using crowd-sourced data to re-explore the demography of Victorian Britain”, Oliver William Duke-Williams, University of Leeds, UK

The presenter introduced the FreeCEN project, an effort to explore the potential of a set of crowd-sourced data based on the returns from the decennial censuses of nineteenth-century Britain, which began in 1801. The census questions asked then were very different from those asked today; the census of 1841 saw the transition to a household-based census, and from 1851 there were increasingly mature administrative structures in place. From 1891 onwards more focus was put on employment and tracking class structures. Census information was aggregated by statisticians from the enumerators’ books kept in the households, and reports to parliament were produced from these. From 1841 onwards the data is held at the National Archives. The FreeCEN project aims to open up this data from publicly available sources through a volunteer effort to transcribe it from the original sources; much of it is currently only available through for-fee commercial services. The sample data used was from Norfolk, comprising 40,000 samples, 4,500 occupations, and 199 different ‘relations to head of family’. Processing steps include automatic data clean-up, e.g. normalizations (sex, age, county of birth, parish), but the quality and comprehensiveness of the data will vary by area and transcriber, as the data is created by an existing volunteer-based effort. The sample also looks at lifetime migrations, which were used by Ravenstein for his 1885 Laws of Migration. A particular challenge has been the encoding of places, which will need more work. The interesting question is: can we produce new, interesting graphs in their historical contexts that address questions not previously asked? Results will be published via the ESRC census programme <http://www.census.ac.uk/> and will be made publicly available.
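A minimal sketch of the kind of clean-up step mentioned; the lookup tables and field names below are hypothetical, since FreeCEN's actual normalization rules are not described in the paper:

    import re

    # Hypothetical lookup tables for transcriber variants.
    SEX_MAP = {"m": "M", "male": "M", "f": "F", "fem": "F", "female": "F"}
    COUNTY_MAP = {"nfk": "Norfolk", "norfolk": "Norfolk", "sfk": "Suffolk"}

    def normalise_record(record):
        """Clean one transcribed census record (a dict) in place and return it."""
        record["sex"] = SEX_MAP.get(record.get("sex", "").strip().lower(), "unknown")
        county = record.get("county_of_birth", "").strip().lower()
        record["county_of_birth"] = COUNTY_MAP.get(county, county.title())
        # ages were sometimes transcribed as "3 mo" or "abt 40"; keep an integer where possible
        m = re.search(r"\d+", record.get("age", ""))
        record["age"] = int(m.group()) if m else None
        return record

    print(normalise_record({"sex": "Fem", "age": "abt 40", "county_of_birth": "NFK"}))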

Session 2
“Engaging the Museum Space: Mobilising Visitor Engagement with Digital Content Creation”, Claire Ross et al., UCL, UK

Public engagement is crucial for museums. It is a vital component of teaching, learning and research, the three key tasks that museums perform. Digital public engagement is a new aspect of this task. There has been a massive explosion of digital content in museums, and mobile applications in particular are an excellent way of involving the public digitally. Some key issues remain: how do digital resources improve the museum experience most? This paper reports on a case study of QRator, a collaborative project developing new content, co-curated by members of the public and academics. The collection used is “Dead Space”, a large natural history collection. QR tags are displayed prominently throughout the collection and visitors can contribute their own materials; iPads are available in the museum, but users can also download the mobile app and use it on their own smart phones and digital devices. The app is based on the Oxfam Shelflife app, which lets you attach a story to your donated items. QRator records responses to items and also to a number of provocative questions associated with the objects. Responses were categorized; they include comments on the museum, comments on the topic/question, and noise, which was not too bad. The system is unmoderated but filters out some profanity; the general impression has been that the public will respond well to engaging questions and context for the objects on display. Even one-word responses to the objects are sometimes illuminating. Further investigation reveals that the majority of the comments on topics express opinions, general comments and specific comments on the object. Museums are opening up their collections to the public more, and this is crucial for public engagement; the digital humanities as a discipline can learn a lot from this development. In future it will also be possible to take photos and add captions, and even short videos, to individual objects.

“Enriching Digital Libraries Contents with SemLib Semantic Annotation System”, Marco Grassi et al., Università Politecnica delle Marche, Italy

The digital evolution has made huge amounts of digital data available, and increasingly that data is semantically enriched and published on the Web for re-use and further annotation. Annotation of Web content is a very useful activity; what is missing are clear semantic annotations that allow us to tag items unambiguously, which can improve the digital library experience for users. Users should be empowered to create knowledge graphs that rely on controlled vocabularies and ontologies, and link out to the Web of Data for additional enrichment. Pundit is the name of the tool created for this purpose, a novel semantic annotation tool developed as part of the EU-funded SemLib project. Pundit relies on the Open Annotation Collaboration (OAC) ontology, which allows for wide interoperability of annotations. Named graphs are employed to capture the annotation content, which makes it possible to query just slices of information using SPARQL. Annotations are collected in so-called notebooks, which can be shared as URLs on the Web. They can also be shared on social media sites, and individual access restrictions can be applied as well. The authentication mechanism is based on OpenID. The hoped-for result is a huge knowledge base surrounding objects in a digital library environment. Named content is another feature of Pundit: specific markup can be added to identify atomic pieces of information, so that the same piece of information can appear in a variety of contexts. Pundit is a RESTful Web service based on CORS and JSON. It can annotate all sorts of content (text, images, audio, and video); SemTube is an experimental implementation for annotating YouTube videos. Pundit also allows for custom vocabularies/ontologies to incorporate content from wider areas and different domains. Pundit is available at <thepund.it>.
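A rough illustration of the named-graph idea using the third-party rdflib library; the URIs, the notebook identifier, and the annotation vocabulary here are placeholders rather than Pundit's own (Pundit itself builds on the OAC ontology):

    from rdflib import Dataset, Namespace, URIRef, Literal

    OA = Namespace("http://www.w3.org/ns/oa#")          # illustrative annotation vocabulary
    notebook = URIRef("http://example.org/notebooks/42")  # one "notebook" = one named graph

    ds = Dataset()
    g = ds.graph(notebook)
    anno = URIRef("http://example.org/annotations/1")
    g.add((anno, OA.hasTarget, URIRef("http://example.org/items/page7#para2")))
    g.add((anno, OA.hasBody, Literal("Mentions the 1848 uprising")))

    # SPARQL can then query just this notebook's slice of the knowledge base.
    q = """SELECT ?anno ?body WHERE {
             GRAPH <http://example.org/notebooks/42> {
               ?anno <http://www.w3.org/ns/oa#hasBody> ?body } }"""
    for row in ds.query(q):
        print(row.anno, row.body)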

“Digital Humanities in the Classroom: Introducing a New Editing Platform for Source Documents in Classics”, Marie-Claire Beaulieu, Tufts University, USA

The presenter introduced a new online teaching and research platform that enables students to collaboratively transcribe, edit, and translate Latin manuscripts and Greek inscriptions. The platform needs to allow for the editing and study of large numbers of source documents and make documents widely available. The pedagogical needs include hands-on learning, the inclusion of non-canonical texts, a collaborative learning environment, and the ability to form part of a student’s online portfolio (undergraduate and postgraduate students). The project was piloted in 2012, editing and translating the Tisch Miscellany, but the workflow with Word documents and PDFs was unsatisfactory. The new system has been tested with a set of uncatalogued medieval MSS and early printed books; other Creative Commons-licensed sources can be used as well. A special concern has been the visual representation: these texts need to be approached as physical objects as well as texts. The platform adopted the Son of Suda Online (SoSOL) software for editing, and the CITE services (‘Collections, Indexes, and Texts, with Extensions’) developed by the Homer Multitext Project for image citation. SoSOL was developed by the papyri.info team; it supports collaborative editing and the TEI-based EpiDoc XML standard. The CITE services link the resources in the platform and offer URNs to identify the citation as well as the association between the image to be transcribed and the XML transcription. All texts are archived in the Perseus Digital Library and the Tufts institutional repository, and the inscriptions will be available on Perseus as a new collection. A version history is also kept. Limitations include the plugin technology used, but this will be addressed by a move to jQuery in the future. Integration of SoSOL and CITE has been technically challenging and interoperability has not been easy. In conclusion, the project demonstrated an interesting integration of teaching and research, an introduction of DH methodologies into the classroom, and the creation of publicly available tools, and it leveraged expertise from a range of Tufts University units. It has been an interesting experiment in a virtual lab setting.

Session 3
“Myopia: A Visualization Tool in Support of Close Reading”, Laura Mandell, Texas A&M University, USA

This paper describes a collaboration between an interdisciplinary group of researchers. Myopia is a close-reading tool for analysing poetic structures and diction. The texts used as a basis are from the Poetess Archive. There have been visualizations of the Poetess Archive content before; the aim of this tool is to benefit from the time-consuming activity of close reading by feeding the resulting markup into a tool that seeks to amplify our understanding and facilitate the discovery of new knowledge. Myopia is a desktop tool built in Python that visualizes poetic structures and analyses metre and trope encoding. On mouseover, the software identifies each metrical foot and displays its characteristics as part of a line of verse or stanza. This has only been done in detail for one poem, Keats’ “Ode on a Grecian Urn”: metre encoding, trope encoding, sound encoding, and syntax encoding have been produced as TEI P5-conformant stand-off markup. The tool allows for a layered display of these various textual features, and also for the text to be hidden, which is useful for comparing poems. Stresses are also visualized through flashing text. Originally the encodings were all in one TEI document, but this turned out to be too complex; they are now separated out into four XML documents. Overlapping hierarchies are a major obstacle in many of these XML encodings. Close reading can be a very satisfying endeavour in its restraint, which allows for narrowing down meaning to significance (as Stanley Fish demanded in his criticism of the digital humanities), but the interesting part is that authorial intentionality is unknowable and sometimes is “made” by critics and readers. Finding the critical means to determine this intentionality is where computer-assisted close reading needs to go. Discovering patterns is clearly a more important task than focussing too narrowly on features that are pronounced unique but are unconvincing on closer inspection.
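The general shape of stand-off markup with switchable layers can be sketched as follows; the offsets, labels and layer names are illustrative and not taken from the project's TEI files:

    BASE_TEXT = "Thou still unravish'd bride of quietness,"

    # Separate stand-off layers, each pointing into the base text by character offsets.
    LAYERS = {
        "metre": [(0, 10, "opening feet"), (11, 21, "unravish'd: stressed 2nd syllable")],
        "trope": [(22, 41, "personification")],
    }

    def layered_view(text, layers, active):
        """Return (layer, text span, label) triples for the layers switched on."""
        out = []
        for name in active:
            for start, end, label in layers.get(name, []):
                out.append((name, text[start:end], label))
        return out

    for layer, span, label in layered_view(BASE_TEXT, LAYERS, ["metre", "trope"]):
        print(f"{layer:6} {label:35} {span!r}")

Keeping each layer in its own file, as the project does, sidesteps the overlapping-hierarchy problem, since no single XML tree has to contain all annotations at once.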

“Patchworks and Field-Boundaries: Visualizing the History of English”, Marc Alexander, University of Glasgow, UK

This paper uses the database of the Historical Thesaurus of English, developed over 44 years and published in 2009, to visualize change in the history of English, and in particular in the English lexicon. The thesaurus exists both in print and as a database at the University of Glasgow. The database is a massive computational resource for analyzing the recorded words of English with regard to both their meaning and their dates of use. By combining visualization techniques with the high-quality humanities data provided in the HT, it is possible to give scholars a long-range view of change in the history, culture and experiences of the English-speaking peoples as seen through their language. A tree-mapping algorithm was used to visualize the database. Colour is used to highlight Old English material (dark) and newer uses of a word (light yellows). The visualization of the coloured map of English words shows that there are patterns to be discovered within particular linguistic areas and key stages, such as Old English, Middle English, and Modern English. This approach can also be extended to mappings, e.g. of metaphors and how they develop over time, or of the growth of certain categories over a period of time. Categories such as electromagnetism, money, newspapers, geology, and chemistry grow in the 18th century, but others such as faith, moral, and courage decline. Salience measures applied to a separate corpus of Thomas Jefferson’s works against a reference corpus show surprising examples: categories such as conduct, greatness, order, and leisure are above average in his writings, but many things one would expect, such as farming and agriculture, are missing. It will be interesting to see whether the database will allow for similar investigations into etymology. It is hoped that the data will be licensed by the OED to allow free use for academic purposes.
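A toy version of the kind of aggregation that underlies such views, bucketing first-recorded dates by century per category; the sample entries are invented, not Historical Thesaurus data:

    from collections import Counter

    def category_growth(entries, period=100):
        """entries: iterable of (category, first_recorded_year).
        Returns {category: {period_start: count}} for colouring or sizing a treemap."""
        growth = {}
        for category, year in entries:
            bucket = (year // period) * period
            growth.setdefault(category, Counter())[bucket] += 1
        return growth

    sample = [("chemistry", 1710), ("chemistry", 1765), ("faith", 1250), ("money", 1780)]
    for cat, counts in category_growth(sample).items():
        print(cat, dict(counts))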

“The Differentiation of Genres in Eighteenth- and Nineteenth-Century English Literature”, Ted Underwood, University of Illinois, Urbana-Champaign, USA

This is a collaborative project from the University of Illinois. The general strategy is corpus comparison, a very simple approach, but phenomena are hard to interpret and their inherent significance is not easily identifiable. It is easier to interpret comparisons that are spread out over a time axis: this removes the argument about an “ahistorical” significance and lets us focus on specific phenomena in specific periods that can be more easily tried and tested. Prose and poetry, and then drama and fiction, were initially compared, e.g. on the yearly ratio of words that entered the English language between 1150 and 1699 in prose and poetry. We see that the curve rises steadily in poetry but flattens in prose. The text corpus of 4,275 volumes was drawn from ECCO-TCP, Black Women Writers, and the Internet Archive. OCR was corrected, notes were stripped, the focus was on the top 10,000 words in the collection, and stopwords were excluded. By narrowing the analysis, we see that the poetry corpus ranks highest by far in this comparison, followed by prose fiction, while the curve for non-fiction prose flattens dramatically. Drama follows the same pattern: it is much higher than non-fiction and at a similar level to prose fiction. We need to ask ourselves what we know about the genres involved. Genre was not as easily distinguishable in 1700 as it is now; we see that the differentiation of diction runs in parallel with the differentiation of literary genres. We also need to ask what we know about the etymological metric. For example, the use of Latin-origin words in the period between 1066 and 1250 is lower than that of Old English words, as English was still primarily a spoken language; a more learned vocabulary only later develops for written English. In our interpretation we need to be careful not to overestimate the metric chosen for this analysis; we cannot really know whether it is the right metric. We could try to mine other associations from the trend itself to gain further insights. Observing correlations of words over time is a good means of statistical exploration of a corpus; in poetry, for example, we can see correlations with personal experience and with the domestic, subjective, physical and natural. It also reveals a broad transformation of diction, which is not necessarily as easily verifiable, and this analysis offers a first significant means of exploring these trends.
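The metric can be paraphrased as the share of a text's dated vocabulary first attested within a given window; a minimal sketch, with a stand-in attestation dictionary:

    def new_word_ratio(tokens, attestation_dates, start=1150, end=1699):
        """Share of (datable) tokens first attested between `start` and `end`.
        `attestation_dates` maps word -> first recorded year; the values here are
        illustrative stand-ins, not the project's etymological data."""
        dated = [attestation_dates[t] for t in tokens if t in attestation_dates]
        if not dated:
            return 0.0
        return sum(start <= y <= end for y in dated) / len(dated)

    dates = {"heart": 900, "nature": 1275, "sublime": 1586, "sentiment": 1639}
    poetry_sample = ["heart", "nature", "sublime", "sentiment"]
    print(new_word_ratio(poetry_sample, dates))   # 0.75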

Day 3

Session 1
“Literary Wikis: Crowd-sourcing the Analysis and Annotation of Pynchon, Eco and Others”, Erik Ketzan, Institut für Deutsche Sprache, Germany

This presentation reported on the use of wikis for the purpose of annotating complex literary texts collaboratively. The frequent unattributed references in Umberto Eco’s Mysterious Flame of Queen Loana were the starting point for the annotation project, particularly the aim of annotating all the literary and cultural references to 1940s and 1950s Italy in the novel. This project gathered some 30-40 annotators who produced more than 400 entries. A second example is Thomas Pynchon's Against the Day (2006), which prompted the creation of <pynchonwiki.com>. It allowed for A-Z as well as page-by-page entries and had more than 400 contributors, who produced an annotation corpus of over a million words. Speed was the main advantage of the wiki over the book: within weeks, major long novels were thoroughly annotated and referenced. As in Wikipedia, a hard core of users does most of the work. It was important to establish the expected quality of the entries at the start; people then produced entries of a similar quality. Thus community guidance, while minimal, proved crucial. Placing question marks behind entries was a great trick to solicit annotations and prompt people to contribute. Contributors take things personally; they are emotionally engaged and need to be treated appropriately. People liked the page-by-page annotations, as there is a sense of noticeable progress and a prospect of completion. E-book readers now do some of what literary wikis are about, and literary wikis are a great way to remove the sometimes artificial divide between academia and the public. Wikis are, however, not easily extensible to all authors; they work best with authors who already have a vibrant online community. No lasting community really evolved around the wikis: people came, edited, and left.

“Social Network Analysis and Visualization in The Papers of Thomas Jefferson”, Lauren Frederica Klein, Georgia Institute of Technology, USA

The source materials of this project are The Papers of Thomas Jefferson, digital edition (Virginia). The project applies social network analysis to Jefferson’s papers in order to visualize the resources of a historical archive and to illuminate the relationships among the people mentioned in them. It is part of a larger project to apply natural language processing and topic modelling to the works of Thomas Jefferson. The sources are full of references to people, particularly in Jefferson's letters. Often these references are obscured by abbreviations, by the use of first names only, by terms of endearment, etc. In tracing these references, the project hopes to overcome the phenomenon of archival silence for many of the persons referenced in these writings, particularly enslaved men and women. Arc diagrams can be produced to visualize the relationships of people in the correspondence; these can then be grouped into letters to family, political correspondents, friends in Virginia, enslaved staff, international correspondents, etc. These visualizations let us view the monolithic corpus of Jefferson’s letters with new eyes, highlighting the main components of the correspondence but also its omissions and silences. Digital humanities techniques can really help us to address some of the issues associated with the discovery of connections that are otherwise difficult to trace and establish.
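A simplified illustration of the first step, resolving variant references to canonical names before tallying relationship weights; the alias table and sample letters are hypothetical:

    from collections import Counter

    # Hypothetical alias table: the papers refer to people by abbreviations,
    # first names, and terms of endearment, which must be resolved before counting.
    ALIASES = {"patsy": "Martha Jefferson Randolph",
               "mr. madison": "James Madison",
               "j. madison": "James Madison"}

    def correspondence_network(letters):
        """letters: iterable of (recipient_as_written, year).
        Returns edge weights from Jefferson to each resolved correspondent."""
        edges = Counter()
        for name, year in letters:
            resolved = ALIASES.get(name.strip().lower(), name.strip())
            edges[("Thomas Jefferson", resolved)] += 1
        return edges

    sample = [("Patsy", 1790), ("J. Madison", 1801), ("Mr. Madison", 1802)]
    for (src, dst), weight in correspondence_network(sample).most_common():
        print(f"{src} -> {dst}: {weight}")

The resulting weighted edges are the kind of data an arc diagram can then group by family, political correspondents, enslaved staff, and so on.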

“Texts in Motion - Rethinking Reader Annotations in Online Literary Texts”, Kurt E. Fendt and Ayse Gursay, MIT, USA

This project is a collaboration within the multidisciplinary HyperStudio unit at MIT <hyperstudio.mit.edu>, a research and teaching unit on digital humanities and comparative media studies. The key principles of the unit are based on educational and research needs and usually involve co-design with faculty, students, etc., in an agile development process with an integrated feedback loop. Students in the unit are considered novice scholars; engagement of learners in the teaching and research processes is crucial. The need for this project arose out of the question of how literary texts can be made much more accessible, thus supporting students in understanding original sources, with a focus on the process of analysing literary texts. The resulting product that addresses this need is Annotation Studio, a flexible graphical annotation and visualization tool. It is based on reader-response theory, identifying different types of readers and considering readers as collaborators and meaning as a process, not a product. The tool tries to make the private process of interaction with a text visible. Engaging students to become editors, and hence “writerly readers”, is key to the process. Annotation is a powerful mechanism for engaging with texts as writers and readers, focussing the engagement at a close-reading level. The technologies used keep text and annotations separate and support flexible annotation formats. The tool is built on open-source technologies and implemented in JavaScript and Ruby on Rails. It will be open-sourced at <hyperstudio.github.com>. Annotation Studio was funded by an NEH digital humanities start-up grant.

“Developing Transcultural Competence in the Study of World Literatures: Golden Age Literature Glossary Online (GALGO)”, Nuria Alonso Garcia and Alison Caplan, Providence College, USA

The focus of the presenters' engagement with the digital humanities is its pedagogical implications, particularly in foreign language classrooms. The main aim is to increase the engagement of students with foreign language texts and remove some of the obstacles to this engagement. The presenters introduced GALGO, an online searchable glossary of key words from the literature of the Spanish Siglo de Oro. Based on keyword theory, the process involves defining the keyword, tracing the keyword in context (identifying keywords in close proximity), and clustering the keyword (grouping multiple keywords into categories focused on culture and society). Digital classroom applications need to be very focussed to achieve a real benefit in a teaching situation. GALGO is a work in progress; its pedagogical potential will evolve and be improved over time. The glossary offers contextualized definitions as well as entries for all the meanings of a word independently of context. All texts will be hyperlinked to entries, so users can look up words in the original contexts from which the definitions have been taken. The glossary therefore has an intratextual as well as an intertextual dimension; learners engage with classic texts and test their hypotheses regarding the polysemic value of Golden Age concepts. Learners engage with meanings in a contextual manner, which is more helpful than basic word definitions alone. The tool has been conceived as a supplement to the printed texts used in the classroom; it is intended as a reference tool to support students in the process of paper writing and similar tasks in a classroom context.
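Tracing a keyword in context amounts to a concordance view; a minimal keyword-in-context sketch (the sample line is the opening of Don Quixote; GALGO's own implementation is not described in the paper):

    import re

    def kwic(text, keyword, window=5):
        """Keyword-in-context lines: `window` words either side of each hit."""
        tokens = re.findall(r"\w+", text.lower())
        lines = []
        for i, tok in enumerate(tokens):
            if tok == keyword.lower():
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                lines.append(f"{left:>40} [{tok}] {right}")
        return lines

    sample = "En un lugar de la Mancha de cuyo nombre no quiero acordarme"
    print("\n".join(kwic(sample, "nombre", window=4)))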

Session 2
“Aiding the Interpretation of Ancient Documents”, Henriette Roued-Cunliffe, Centre for the Study of Ancient Documents, University of Oxford, UK

Supported by the eSAD project and an AHRC-funded doctoral studentship, the presenter demonstrated how Decision Support Systems (DSS) can aid the interpretation of ancient documents. The research centres on Greek and Latin documents, particularly the Vindolanda tablets, and is intended to support papyrologists, epigraphers, and palaeographers; it will potentially be useful to readers of old texts from other cultures as well. Decisions in the humanities are often interpretations: frequently subjective, often not well supported or quantifiable, and difficult to map as a structure. Computers can aid in this process, but they do not make the decisions. This is where the DSS comes in: it supports, but does not replace, decision-makers. Formalized decisions are possible in expert systems, but this is not what is aimed for here; a DSS sits somewhere in the middle between expert systems and individual readings. A DSS allows for the recording of initial findings and provides an opportunity to annotate those readings and preliminary insights. The APPELLO word search web service was developed to search through XML-encoded texts, matching a pattern of characters and returning possible matches to that pattern. The DSS prototype is mainly a proof of concept, explored through a case study: the DSS offers a structure that enables scholars to remember their decisions and thus aids them in their further investigation. The DSS application was built around the tasks that readers spend most time on, such as identifying characters, looking for word patterns, and finding contextual information on passages. The DSS lets scholars record their arguments rather than forcing them to make decisions; it can aid the reading of ancient documents by helping to record the complex reasoning behind each interpretation. A future DSS will most likely be layered, with individual layers that can be turned on and off, e.g. original image, enhanced image, transcription, structure, meaning, interpretation. Developing the DSS as a component of a larger collaborative VRE would be an ideal approach.
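The word-pattern search can be sketched as a wildcard match against a word list, with '.' standing for a single illegible character and '*' for a damaged stretch; the word list and the exact pattern syntax here are illustrative, not APPELLO's:

    import re

    # Toy word list -- APPELLO searches XML-encoded texts; a plain list stands in here.
    WORD_LIST = ["frumenti", "frumentum", "fortuna", "fabrica"]

    def candidate_readings(pattern):
        """Return words compatible with a partially legible pattern, e.g. 'fr.ment*'."""
        regex = re.compile("^" + pattern.replace("*", ".*") + "$")
        return [w for w in WORD_LIST if regex.match(w)]

    print(candidate_readings("fr.ment*"))   # ['frumenti', 'frumentum']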


“Reasoning about Genesis or The Mechanical Philologist”, Moritz Wissenbach et al., University of Würzburg, Germany

This presentation examines the decision-making processes involved in establishing the genesis of a literary work, the result of which is a genetic edition. The text in question is Goethe's Faust. There is only one critical edition of the work in existence, which dates from 1887; a new edition is currently being prepared by a group of scholars. There are c. 500 archival units/MSS extant, but only very few of them are dated, so dating the witnesses is a major task. It is key to establishing links and relations between the MSS, as is taking into consideration a century of published Faust scholarship. Dating MSS requires some methodological reflection: if no absolute chronology can be established, we must rely on a relative chronology. Evidence for dating can be explicit dates, material properties (paper, ink), external cues (mentions), and the “logic” or dynamic of genesis inherent to the text. Computer-aided dating is a tool for editors that can also give users the same tools and data, so that they can follow the editor's argument. The main task is to formalize the practice of dating using formal logic. Central notions and rules for dating are “syntagmatic precedence”, where we hypothesize about the precedence of one version of a text over another; “paradigmatic containment”, where, if a text is contained in another text, we assume that the contained text is earlier than the containing text; and “exclusive containment”, which says that if a text is exclusively contained by another, then the containing text is earlier than the included text. To these rule sets we need to add the knowledge from research that has already been done. New research will also be added, and all the resulting graphs of relations can then be combined; any inferred relations can be overridden by actually established ones. Inference rules are formalized hypotheses about an author's writing habits; the measures for them are coverage, recall, and accuracy. Goethe's writing habits are far from linear, and a formal logic approach to rule prioritization seems to offer a promising qualitative analytical framework.
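A minimal sketch of how such rules might be combined mechanically, covering only syntagmatic precedence and paradigmatic containment plus a naive transitive closure; exclusive containment is omitted, established relations are simply merged in rather than given priority, and the witness names are invented:

    from itertools import product

    def infer_earlier(precedence, containment, established=()):
        """precedence: pairs (a, b) meaning version a syntagmatically precedes b.
        containment: pairs (part, whole) -- the contained text is assumed earlier.
        established: relations already known from scholarship, merged in as-is.
        Returns the transitive closure of all 'a earlier than b' pairs."""
        earlier = set(precedence) | set(containment) | set(established)
        changed = True
        while changed:                      # naive transitive closure
            changed = False
            for (a, b), (c, d) in product(list(earlier), repeat=2):
                if b == c and (a, d) not in earlier:
                    earlier.add((a, d))
                    changed = True
        return earlier

    print(sorted(infer_earlier(precedence=[("H1", "H2")], containment=[("H2", "H3")])))
    # [('H1', 'H2'), ('H1', 'H3'), ('H2', 'H3')]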

“On the dual nature of written texts and its implications for the encoding of genetic manuscripts”, Gerrit Brüning, Freies Deutsches Hochstift, Germany, and Katrin Henzel, Klassik Stiftung Weimar, Germany

Textuality is at the heart of many of the questions surrounding genetic encoding, and the notion of written text is at the centre of these deliberations. The TEI was one of the earliest initiatives to capture textual features, but it covered genetic processes only marginally; these have only recently been incorporated into the latest version of the TEI P5 Guidelines. These new elements, however, conflict with the older text-oriented markup, and a clarification of their relations needs to be reached in order to marry the different layers of textual encoding. One way of approaching this is to focus on the concept of “written text”, the materialized version of a text that is the result of the writing process. Written text has to be considered as a linguistic object that is not easily separated from its materiality; the physical object is an inscription on a material surface. The two dimensions of materiality and textuality are not easily integrated, and each must be given its proper place. Documentary versus textual encoding is the main issue: we need what is now in chapter 20 of the TEI Guidelines, i.e. non-hierarchical markup. But genetic markup goes further than that, and it appears that some of the clarifications need to be based on a revisiting of the basic notions of textuality and “written text”. Intermingling a documentary and a textual perspective complicates both the recording of information on the writing process and the subsequent processing needed to generate, for example, diplomatic and reading-text versions. For the purposes of genetic encoding, we need to differentiate between text positions and textual items, which might help to spot, clarify, and resolve these common conflicts.

Session 3
“Violence and the Digital Humanities Text as Pharmakon”, Adam James Bradley, The University of Waterloo, Canada

This paper theorizes the process of visualization as both a destructive and a creative task: destructive, as it replaces the studied object; creative, as it shifts the aesthetic of the text for re-interpretation. The presenter has experimented with visualizations of eighteenth-century thought, particularly Diderot’s philosophy, which is based on the concept of nature: all matter acts and reacts, and when relationships are created, the notions of structure, form and function are also created. The accuracy of these suppositions is to be confirmed experimentally. In Diderot, enthusiasm is key to the process of perception that results in inspiration. According to Diderot, the process of visualization goes through three steps: a pure visualization that entices inspiration, a defamiliarization of the supposedly known, and a creation process evolving from this. The artist in this way helps to abstract from the text to a three-dimensional model that allows for exploration along different paths. Using a mathematical approach, the presenter introduced a new way of visualization that maps every word of a text into a three-dimensional space. The three-dimensional space is created from a base-26 number line which, following Cantor, has an exact representation in three-dimensional space. This location information means that any text can be rendered three-dimensionally; very large corpora can thus be visualized and explored in new and interesting ways. This type of visualization is non-destructive, as it creates a one-to-one relationship between the visualization and the original text, which allows for an easier transition between the original text and the visualization.
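One plausible reading of the mapping (the talk does not give the actual construction) is to read each word's letters as base-26 digits and interleave them across three coordinates, in the spirit of Cantor's digit-interleaving argument that the line and three-dimensional space have the same cardinality:

    def word_to_point(word):
        """Map a word to a 3D point by treating its letters as base-26 digits and
        distributing them round-robin across three coordinates. An illustrative
        interpretation only, not the presenter's own code."""
        digits = [ord(c) - ord("a") for c in word.lower() if c.isalpha()]
        coords = [0, 0, 0]
        for i, d in enumerate(digits):
            coords[i % 3] = coords[i % 3] * 26 + d
        return tuple(coords)

    for w in ["urn", "grecian", "ode"]:
        print(w, word_to_point(w))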

“Recovering the Recovered Text: Diversity, Canon Building, and Digital Studies”, Amy Earhart, Texas A&M University, USA

Diversity of the literary canon in America has been a long-standing issue in literary criticism. Digitization and the Web were hailed as a democratizing medium that would widen the canon and incorporate women's writing to a far higher degree than was previously possible. A lot of early, unfunded, simple projects sprang up in the late nineties, producing a wide range of “recovery web pages” that highlighted recovered literature by women and minorities. Many of these projects are already disappearing, and this poses a real issue, as the resources they presented are not easily retrievable or accessible. Many of them were conducted by single researchers outside of a digital infrastructure and outside the digital humanities community. The disappearance of the projects is also due to funding infrastructures that are not conducive to small-scale work on lesser-known writers and texts; there is little impact factor in such endeavours. We need some basic preservation infrastructures to preserve these projects. Incorporating such smaller projects into larger, currently funded infrastructures would be one good possibility for preservation. Existing structures of support need to be supplemented by a social movement to raise awareness about these projects and exploit their resources, in order to maintain a wider canon in the digital humanities world than we have today.

“Code sprints and Infrastructure”, Doug Reside, New York Public Library, USA

Small projects have sometimes achieved in a small way what large infrastructure projects have not yet achieved, namely establishing communities that develop commonly needed tools collaboratively. Methods such as rapid prototyping, gathering “scholar-programmers”, and hackathons are designed to foster such communities. These working meetings, or “code sprints”, are designed to address concrete requirements of a particular community or group of scholars. But code sprints also have problems, such as a lack of code-sharing mechanisms, differing coding languages and dialects, lack of focus and documentation, and differing assumptions about goals. Big infrastructure programmes follow waterfall development (Plan|Do): requirements, design, implementation, verification, maintenance. Agile development, by contrast, is circular, iterative and quick. Interedition is an example of a successful small project that was driven very much by the community that evolved around it. Organized in the form of boot camps, these meetings focused on transcription, annotation, and collation as primary scholarly tasks in producing (digital) scholarly editions that could effectively be supported by common models and tools. The Interedition boot camps have resulted in various new tools in the form of web services – of which CollateX is probably the best known – and considerable progress in the development of existing tools (Juxta, eLaborate, etc.). Small agile developments could be very useful in the digital humanities, a loosely organized community characterized by flexibility and enthusiasm. Microservices seem to be a good means of development in this community; they can be developed from the bottom up and involve contributors from a variety of backgrounds, e.g. humanistic, technical, and design/visualization experts. It is possible that large infrastructure projects such as Bamboo, DARIAH, and TextGrid could provide the organizational and administrative infrastructure needed to make code sprints more effective and their work more sustainable.


Session 4
“Developing the spatial humanities: Geo-spatial technologies as a platform for cross-disciplinary scholarship”, David Bodenhamer, The Polis Center at IUPUI, USA, et al.

Key themes of this panel were the effort to reconcile the epistemological frameworks of the humanities and GIS to locate common ground for cooperation, designing and framing spatial narratives about individual and collective human experience, and building increasingly complex maps of the visible and invisible aspects of place. What is being aimed for are “Deep Maps”: a dynamic virtual environment for analysing and experiencing the reciprocal influence of real and conceptual space on human culture and society; spatially and temporally scaled; semantically and visually rich; open-ended, but not wide open; problem-focused; curated and integrative; immersive, contingent, discoverable; supporting spatial narratives and spatial arguments through reflexive pathways; subversive.

“GIS and spatial history: Railways in space and time, 1850-1930”, Robert Schwartz, Mount Holyoke College, USA

GIS can contribute to the spatial humanities, as demonstrated by this work on the expansion of railways in France in the 19th century. GIS is very good at developing spatial and temporal patterns that can be queried within a specific cultural context and background. Contributions of GIS to the humanities include large-scale comparisons across national borders, tracking multi-scalar change, combining multiple research domains and multiple kinds of sources and evidence, and revealing spatio-temporal patterns. Its limitations include being a reductionist technology if left uncontextualized, thus narrowing research questions, and a tendency to facilitate simplicity and restrict complexity. In the context of the railways project, the questions to be asked are not so much how far apart cities were, but what the human experience of travelling those distances was, what the journey was like, and whether it was common or unusual. The expansion of the railway system during the 19th century meant a transformation of both spatial and temporal proximity, not only between places but also from people’s homes to the nearest railway station. What did this accessibility mean and why does it matter? From a humanistic point of view, the changes in people’s perceptions of space and time are more important than all the socio-economic implications the expansion of the railway system brought about. GIS identifies patterns of spatial and temporal interconnectivity and provides geo-historical contexts for teasing out meanings and explanations. GIS is most useful as an auxiliary methodology in support of humanistic enquiry.

“Spatial Humanities: Texts, GIS, Places”, Ian Gregory, Lancaster University, UK

GIS is an integrating technology that can accommodate different methodological approaches from a number of disciplines. GIS has been used to trace infant mortality rates in Great Britain in the 19th century and to map the areas that saw the biggest decline in infant mortality as well as those that made the least progress in reducing it. This does not offer any explanation of the reasons, but that is not a problem of GIS itself; it is a problem of the source data available for the purpose. Instead we need to look at the literature of the time to find out what the reasons for certain phenomena were. Corpus linguistic techniques can help here: collocation analysis can find associations between certain places and topics of investigation during certain periods. Mapping the Lakes is another example that will be integrated into this research, a comparison of the tours of the Lakes made by Gray and Coleridge in 1769 and 1802 respectively. The ERC-funded Spatial Humanities project wants to bridge the quantitative/qualitative divide; it also wants to build up the skills base, establishing PhD studentships, etc. GIS provides us with a new way of looking at texts, identifying patterns and offering relations between these observations and the patterns discovered.
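The collocation step can be sketched as a simple windowed co-occurrence count around a place name; a real corpus study would add a significance statistic such as log-likelihood, and the sample sentence below is invented:

    import re
    from collections import Counter

    def collocates(text, node, window=5, top=10):
        """Count words co-occurring within `window` tokens of `node` (e.g. a place name)."""
        tokens = re.findall(r"\w+", text.lower())
        counts = Counter()
        for i, tok in enumerate(tokens):
            if tok == node.lower():
                counts.update(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
        return counts.most_common(top)

    sample = "Infant mortality in Liverpool remained high while sanitation in Liverpool lagged"
    print(collocates(sample, "liverpool", window=3))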

“Mapping the City in Film: a geo-historical analysis”, Julia Hallam, University of Liverpool, UK

The year 1897 marked the beginning of film-making in Liverpool, and a previous project traced the development of the city in film. This work involved the use of maps from an early stage, particularly as a means of comparing urban developments in various parts of the city. It has led to the “Mapping the City in Film” project, which aims to develop a GIS-based resource for archival research, critical spatial analysis, and interactive public engagement. The aim is to map the relationship between space in film and urban geographic space. GIS allows us to navigate the spatial histories attached to various landscapes in film, to develop dialogic and interactive forms of spatio-cultural engagement with the local film heritage, and to facilitate public engagement in psycho-geographic narratives of memory and identity around film, space and place. The result of the project is an installation of this integrated map in the Museum of Liverpool. It offers anchors for attaching films and for recording experiences of space in film. GIS has several advantages: layering the cine-spatial frame-by-frame data, and informing multi-disciplinary perspectives and practices that contextualize historical geographies of film by mapping a virtual multi-layered space of representation: filmic, architectural, socio-economic, cultural, etc.

Day 4

Session 1
“Intertextuality and Influence in the Age of Enlightenment: Sequence Alignment Applications for Humanities Research”, Glenn H. Roe, University of Oxford, UK

The presenter reported on the findings of a project conducted in the context of the ARTFL project at Chicago and the Oxford e-Research Centre. Textual influence and intertextuality are important areas of literary study. Relations between texts are complicated and multi-faceted, and tracing these links, from direct quotations to “influences” and allusions, is a key element of humanistic endeavour. To examine the genetics of intertextuality, the project borrows an idea from microbiology, namely sequence alignment: a general technique for identifying regions of similarity shared by two sequences, akin to the longest common substring problem. The methodology has applications in many domains. Key advantages of the approach are that it respects text order, does not require pre-identified segments, aligns similar passages directly rather than as blocks of text, and tolerates variation within similar passages (insertions, deletions, spelling, OCR and other errors). The result of this work is a software package called PhiloLine/PAIR, now an open-source module of the PhiloLogic software. The software is based on n-grams, currently tri- and quad-grams, combined with filtering and stemming. Configurable parameters include span and gap, i.e. the minimum number of matching n-grams considered a match, and the maximum number of unmatched n-grams allowed within a match, respectively. Results are stored in individual files sorted chronologically by year of document creation, and each link carries bibliographic information. The planned output format is XML, possibly with TEI reference indicators. As a use case, the Encyclopedie has been analysed for Voltaire's presence in it. This has long been a problematic area: there are many quotations and references, but most of them are unattributed, and Voltaire has been described as “absently present” in the Encyclopedie. This has led critics to believe that Voltaire was sceptical of the endeavour and of the agenda of its editor, contributing only 45 articles. However, the tool discovered well over 10,000 matching sequences between Voltaire's complete works and the Encyclopedie, demonstrating Voltaire's textual presence as an authority over and against his more restrained role as an encyclopedic author. Such intertextual analysis avoids the problems of distant non-reading: it lets researchers focus on the relevant passages and identify those that allow for fruitful engagement with the influences and intertextualities of a period.
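The following sketch illustrates the general n-gram matching idea described above (shared tri-grams grouped into longer matched spans, with a bounded number of unmatched n-grams tolerated inside a span); it is a simplified stand-in rather than the PhiloLine/PAIR implementation, and the parameter values and sample texts are invented.

```python
N = 3      # n-gram size (tri-grams), as in the talk
GAP = 2    # max unmatched n-grams tolerated inside a match (example value)
SPAN = 3   # min matching n-grams needed to report a match (example value)

def ngrams(text, n=N):
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def aligned_passages(source, target):
    """Find runs of n-grams in `target` that also occur in `source`,
    allowing up to GAP unmatched n-grams inside a run."""
    source_grams = set(ngrams(source))
    hits = [i for i, gram in enumerate(ngrams(target)) if gram in source_grams]

    matches, run = [], []
    for i in hits:
        if run and i - run[-1] - 1 > GAP:   # gap too large: close current run
            if len(run) >= SPAN:
                matches.append((run[0], run[-1]))
            run = []
        run.append(i)
    if len(run) >= SPAN:
        matches.append((run[0], run[-1]))
    return matches   # (start, end) n-gram offsets of matched passages in target

src = "it is a truth universally acknowledged that a single man in possession of a good fortune"
tgt = "we hold it a truth universally acknowledged that a single man must be in want of a wife"
print(aligned_passages(src, tgt))   # -> [(3, 8)]
```

A real system would add stemming, stop-word filtering, and bookkeeping of source positions so that each reported match can carry bibliographic information.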

“Trees of Texts - Models and methods for an updated theory of medieval text stemmatology”, Tara Lee Andrews, Katholieke Universiteit Leuven, Belgium

Stemmatology is the discipline of deducing the copying order of texts on the basis of relations between the surviving manuscript witnesses. There are different approaches, the most popular of which is the “method of Lachmann”, a nineteenth-century methodology with significant drawbacks that has since been refined into a neo-Lachmannian approach. The latest approaches employ phylogenetic methods, essentially the sequencing of variations, but the significance of any observed phenomena remains a major problem: too much a-priori judgement and no judgement at all are the two extremes. What is needed is an empirical model of text variation. The goal of this research is to arrive at such an empirical model of medieval text transmission, a formalized means of deducing copying relationships between texts. To this end we must be able both to evaluate a stemma hypothesis against the variation in a text tradition and to evaluate the variants in a tradition against a given stemma hypothesis. When text variation is modelled programmatically, the result is a set of sequence graphs for the various witnesses of a text, which can then be used to map relationships. Witness relationships also need to be modelled in the form of stemma graphs, but handling witness corrections becomes complex as soon as several witnesses are involved, since all possible relations of unclear or unverified corrections need to be encoded. Different types of variants, such as coincidental and reverted variants, all need to be modelled in stemma graphs. The project has developed a tool that evaluates sets of variants against a stemma hypothesis of arbitrary complexity, testing whether any particular set of variants aligns with the hypothesis and measuring the relative stability of particular readings. This method offers a means of introducing statistical probability by taking all the available evidence of a text into account, thus removing the limitations of relying on scholarly instinct alone. The Stemmatology software is available for re-use at <https://github.com/tla/stemmatology>.
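As a much simplified illustration of the kind of evaluation described above, the sketch below checks whether the witnesses sharing a reading form a connected subgraph of a hypothetical stemma, i.e. whether the reading could have arisen only once; the stemma and witness sets are invented, and the actual tool's analysis is considerably more sophisticated.

```python
import networkx as nx  # third-party graph library

# Hypothetical stemma: archetype A copies to B and C; B copies to D and E.
stemma = nx.DiGraph([("A", "B"), ("A", "C"), ("B", "D"), ("B", "E")])

def is_genealogical(stemma, witnesses_with_reading):
    """A reading is treated here as genealogically consistent if the witnesses
    attesting it form a connected subgraph of the (undirected) stemma,
    i.e. it need only have arisen once."""
    sub = stemma.to_undirected().subgraph(witnesses_with_reading)
    return nx.is_connected(sub)

print(is_genealogical(stemma, {"B", "D", "E"}))  # True: one origin suffices
print(is_genealogical(stemma, {"C", "D"}))       # False: coincidental or contaminated
```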

“Contextual factors in literary quality judgments: A quantitative analysis of an online writing community”, Peter Boot, Huygens Institute for the History of the Netherlands, The Netherlands

Online writing communities, in which members submit and discuss their stories, offer interesting datasets for digital humanities investigation. This paper focusses on a Dutch-language online writing community that is unfortunately no longer active. The different types of texts and the comments on those texts are at the centre of the investigation. The available data (acquired by a Web crawl) consists of 60,000 texts, 2,435 authors, 350,000 comments, 450,000 responses, and 150,000 ratings. There are caveats: it is not always clear whether works were changed, works may have been removed, some communication is missing, and people can hold multiple accounts. The research has focussed on the comments: number of words, word categories, and relations. The average comment length was 44 words; 4,400 comments had more than 300 words, some more than 2,000 words. Most people who wrote stories also wrote comments, while some people only commented. Seven major categories appear in comments: greetings, praise, critical vocabulary, negative comments, site-related vocabulary, names of members, and emoticons. Members of the writing community can then be characterized according to their behaviour with regard to these categories. Analysis of commenter-author pairs is interesting; the relationship depends on the networking activity of the author and their agreement on style and interests. Writing communities have long been ignored in the humanities; audience studies is a field that could be fruitfully explored in the digital humanities, but it is currently left to psychologists, sociologists, and marketeers. Research potentials include analysis of the impact of reading and reader response, growth of the canon, style in relation to appreciation, stylistics (development, imitation), style as a function of text genre, critical influences from published criticism, and the dynamics of reading and writing groups. Research limitations, despite the plentiful data, include: no controlled experimental data, questions of completeness, data manipulation, ethical and rights issues, the challenge of text analysis tools, and the fact that data is no substitute for theory. Beyond online writing communities, it would be interesting to investigate online book discussions and other online communities around other forms of art.
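A minimal sketch of the comment categorization described above: comments are assigned to categories by matching against hand-built keyword lists. The keyword lists and example comments are invented and far smaller than anything the study would have used.

```python
from collections import Counter

# Toy keyword lists for a few of the comment categories mentioned in the talk.
CATEGORIES = {
    "greetings": {"hi", "hello", "welcome"},
    "praise": {"beautiful", "wonderful", "great"},
    "critical_vocabulary": {"plot", "style", "pacing", "character"},
    "emoticons": {":)", ":-)", ";)"},
}

def categorize(comment):
    """Return the set of categories whose keywords occur in the comment."""
    tokens = set(comment.lower().split())
    return {cat for cat, words in CATEGORIES.items() if tokens & words}

comments = [
    "hi there, wonderful story :)",
    "the pacing drags in the middle but the character work is great",
]

totals = Counter(cat for c in comments for cat in categorize(c))
print(totals)
```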

Session 2
“A flexible model for the collaborative annotation of digitized literary works”, Cesar Ruiz and Amelia Sanz-Cabrerizo, Universidad Complutense de Madrid, Spain

The presenters introduced @note, a collaborative annotation tool for literary texts and a joint effort between humanists and computer scientists. The context for this project is the university's collaboration with the Google digitization effort in 2000. The resulting texts are not useful to researchers as they are: reliable texts are needed, with a full set of bibliographic information, corrected OCR, and tools that provide easier access to the digitized texts. Without these enhancements there is a risk that Spain's cultural heritage is ignored in the Google universe and by the Google generation. The project adopted a bottom-up approach, focussing on user needs. The basic principle is to use the digitized image as the starting point and to make it annotatable. The project ran as part of the Google Digital Humanities start-up grant programme. @note promotes the collaborative annotation of texts by both teachers and students: a flexible and adaptable model that allows for varying interests and varying levels of expertise. The annotation model is user-centric, with roles including annotator and annotation manager. The system has been implemented in HTML5. The administration area offers a number of admin functions; the annotation manager can create annotation taxonomies, groups, tags, etc. Books are retrieved via a search function, and annotations are then listed alongside the work, with anchors on the annotated image shown on mouseover. There can also be comments on annotations, which appear indented, and annotations can be filtered logically according to user-defined rule sets. The project is still work in progress: integration of annotations into an e-learning environment and moving academic annotations into a research support environment are high on the list of desirable additions. Retrieval of texts from other libraries is being investigated, and a suggestion system for concepts and tags is in development. Students' e-writing skills are an important area of development, and a project like this can aid in that endeavour.
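To make the annotation model more concrete, here is a minimal sketch of how an image-anchored annotation with taxonomy terms and threaded comments might be represented; the field names and values are hypothetical and do not reflect @note's actual data format.

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    """An annotation anchored to a rectangular region of a digitized page image."""
    annotator: str                      # user who created the annotation
    book_id: str                        # identifier of the digitized work
    page: int                           # page (image) number
    region: tuple                       # (x, y, width, height) anchor on the image
    body: str                           # the annotation text itself
    taxonomy_terms: list = field(default_factory=list)  # terms from the manager's taxonomy
    comments: list = field(default_factory=list)        # threaded replies

note = Annotation(
    annotator="student01",
    book_id="ucm-000123",               # hypothetical identifier
    page=42,
    region=(120, 340, 400, 60),
    body="Allusion to Don Quixote, Part I.",
    taxonomy_terms=["intertextuality", "allusion"],
)
note.comments.append("Agreed - compare chapter VIII.")
print(note)
```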

“Digital editions with eLaborate: from practice to theory”, Karina van Dalen-Oskam, Huygens Institute for the History of the Netherlands – Royal Dutch Academy of Arts and Sciences, The Netherlands

The subtitle of the presentation, “From practice to theory”, is intentional: users influence the definition of the underlying principles of the eLaborate tool. The software has been developed by Huygens ING, a research organization with a strong focus on textual scholarship; the institute offers consultancy and produces scholarly editions. Most texts chosen for the editing platform relate to Huygens ING's own research topics, and text analysis for digital textual studies is high on the research agenda. The aim is to produce a corpus of high-quality electronic texts. The starting point is a set of basic marked-up texts, which can be enhanced at any point, and crowd-sourcing is used for transcription projects. Editions are often still produced in Word, and the transformation into online editions is lengthy, but the use of a common platform is making this task easier: features added to the platform as a whole filter through to all editions in the same environment, and text analysis tools are being developed as part of this process. eLaborate itself is a flexible online environment that offers an editorial workspace, with side-by-side display of manuscript image, transcriptions, translation, annotations, and metadata. The system has been tested with various groups of users, including volunteers with an academic background and university students in textual scholarship courses. Users can hide annotations by other users or by the editor, but the editor cannot: he needs to make a scholarly decision about what to do with these annotations and how to categorize and use them for a particular edition. Diplomatic and critical transcriptions are currently produced separately for each edition; ideally both would be generated from the same source. There is a need for several layers and types of annotation, including personalized, authorized, and editable annotations. The editor's role changes in this environment: he becomes a moderator and teacher, and tasks can be divided among the crowd based on expertise. The eLaborate software is available to external projects as long as there is a shared interest in the texts, but it is not (yet) intended to be open-sourced, for a variety of technical and institutional reasons.

“Wiki Technologies for Semantic Publication of Old Russian Charters”, Aleksey Varfolomeyev, Petrozavodsk State University, Russian Federation, and Aleksandrs Ivanovs, Daugavpils University, Latvia

This project investigates the application of semantic publication principles to historical documents. Semantic description operates at several levels: the palaeographic level, the internal structure, and semantically interconnected documentary evidence. Semantic publications, which are provided with additional information layers, represent knowledge about documents in a formalized way. Based on Semantic Web technologies (semantic networks, triples, and ontologies), semantically enhanced historical records can form the basis for a virtual research environment (VRE), and can represent both the physical form and the authentic texts (diplomatic transcriptions) of historical records, as well as their translations into modern languages. A Semantic MediaWiki has been set up and adapted for this purpose. The wiki markup language works well with semantic annotation, as it allows for the named annotation of textual features. The text corpus is a collection of Old Russian charters from the late 12th to the early 17th centuries held by the Latvian National Archives and the Latvian State Historical Archives; <histdocs.referata.com> is the home for these historical documents. MediaWiki extensions such as XML2WIKI and WebFonts help to overcome some of the issues encountered in the process, particularly working with encoded full text and Unicode. The pages in the Semantic MediaWiki can also be used to visualize the semantic network of relations between the charters' texts as well as links to external historical records and research publications. Within the semantic network, facts about the charters recorded with Semantic MediaWiki tools can be automatically transformed into RDF triples. Wiki systems can therefore be used for the production of semantic publications of charters and other written documents.
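As an illustration of the transformation into RDF described above, the following sketch uses the Python rdflib library to express a couple of invented facts about a charter as triples; the namespace, property names, and values are hypothetical examples rather than the project's actual vocabulary.

```python
from rdflib import Graph, Literal, Namespace, URIRef

# Hypothetical namespace for the charter publication.
HD = Namespace("http://histdocs.example.org/ontology#")

g = Graph()
charter = URIRef("http://histdocs.example.org/charter/12")  # hypothetical charter page

# Facts of the kind that Semantic MediaWiki property annotations
# (e.g. [[issuedIn::Polotsk]]) could be exported as RDF triples.
g.add((charter, HD.issuedIn, Literal("Polotsk")))
g.add((charter, HD.dateIssued, Literal("1264")))
g.add((charter, HD.relatedTo, URIRef("http://histdocs.example.org/charter/13")))

print(g.serialize(format="turtle"))
```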

Session 3
“Connecting European Women Writers. The Selma Lagerlöf Archive and Women Writers Database”, Jenny Bergenmar and Leif-Jöran Olsson, University of Gothenburg, Sweden

The Selma Lagerlöf Archive (SLA) is an attempt to make the collected writings of this important Swedish author available, including her literary and epistolary work. The SLA makes the Lagerlöf collection at the National Library of Sweden available to a wider audience and aims to create a wider cultural and historical context for these documents. Its aims include creating a digital scholarly edition of her works, digitizing her collections at the National Library of Sweden as well as contextual materials in other archives, and establishing a bibliographical and research database. A first demo version of the archive is currently under development. Linking the SLA to the Women Writers Database <www.databasewomenwriters.nl> is a means of achieving some of these aims: the links between Lagerlöf and the wider context of women writers in the late 19th century are only partially implemented, and such integration of resources into a European women writers database could be beneficial in tracing and visualizing these links. To achieve interoperability and data exchange between the archive and the database, a meeting was held to discuss a suitable data model. A minimal set of entities was agreed on, and the model now facilitates relations between people, between works, and between people and works, and also provides information on holding institutions. It is now possible to share XML records between the project and the WomenWriters database via a single API. Underlying the system is eXist-db, with Apache Tika for extracting and indexing data from binary file formats. A set of microservices has been created that can query, extract, manipulate, and re-use the data in the XML database, and there is the option to publish resources in a machine-processable way. Collation of texts and variants can be recorded relatively easily with tools like Juxta and Collate. Reception histories and the discourses surrounding individual texts would also be a fruitful area of enquiry, as would translations of the works into other languages. For small projects, collaboration with larger projects is extremely helpful, as is the use of existing tools and platforms for the standards employed.
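The sketch below shows how a client might retrieve an XML record from such a microservice and extract a few fields with lxml; the endpoint URL and the record's element names are hypothetical and stand in for whatever the project's API actually exposes.

```python
import requests                      # HTTP client
from lxml import etree               # XML parsing

# Hypothetical microservice endpoint exposing person/work records as XML;
# the URL and the record schema below are invented for this sketch.
ENDPOINT = "https://sla.example.org/api/persons"

def fetch_person(person_id):
    """Fetch one XML record and pull out a few fields."""
    response = requests.get(f"{ENDPOINT}/{person_id}", timeout=10)
    response.raise_for_status()
    record = etree.fromstring(response.content)
    return {
        "name": record.findtext("name"),
        "works": [w.text for w in record.findall("work")],
        "holding_institution": record.findtext("holdingInstitution"),
    }

# Example call (requires the hypothetical service to be running):
# print(fetch_person("lagerlof-selma"))
```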

“Modeling Gender: The ‘Rise and Rise’ of the Australian Woman Novelist”, Katherine Bode, Australian National University, Australia

This paper uses quantitative and computational methods to trace gender trends in contemporary Australian novels, drawing on the resources of the government-funded AustLit database. AustLit is the most comprehensive online bibliography of Australian literature, containing details of over 700,000 works and items of secondary literature. As AustLit is built on standard bibliographical practices, it provides an excellent source for computational analysis. Methodologically related to Moretti's distant reading paradigm, the approach used here seeks to avoid its methodological void. While AustLit is an impressive resource, it is by no means complete, and updates are intermittent; in addition, complexities of genre and nationality make a precise delineation impossible. There has been plenty of criticism of Moretti's paradigm, mainly because of the lack of documentation of the definition of basic terms and genre boundaries in the data used for his work, and criticism of literary statistics more generally centres on the imperfections of the representations of the literary field they produce. McCarty, by contrast, focuses more on practical applications in his book Humanities Computing (2005). The data modelling approach used by the presenter to explain gender trends responds to Moretti's assertion that gender trends in literature are complementary and based on different talents in men and women writers. The problem is that such statements do not take the whole picture into account, particularly non-canonical literature. The presenter demonstrated that, by using a combination of book-historical and digital humanities methodologies, it was possible to trace the rise of women authors between 1960 and 1980, a trend usually attributed to second-wave feminism and its deconstruction of the male canon. On closer inspection, though, the largest area of growth during this period was in the genre of romance fiction, a form of writing usually seen as inimical to second-wave feminism. Another observable trend is that while women now account for more than half of all Australian novels, critical discussion of their work in newspapers and scholarship has declined over the same period. The presenter concluded that the digital humanities can make a valuable contribution to the humanities not only by questioning scholarly assumptions, but also by challenging unqueried institutionalized procedures and systems.
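A minimal sketch of the kind of quantitative trend analysis described above: given bibliographic records with a publication year and author gender, it computes the proportion of novels by women per decade. The records are invented and stand in for queries against AustLit.

```python
from collections import defaultdict

# Invented toy records: (publication_year, author_gender, genre)
records = [
    (1962, "female", "romance"), (1965, "male", "literary"),
    (1971, "female", "romance"), (1974, "female", "literary"),
    (1978, "male", "crime"),     (1979, "female", "romance"),
]

def share_by_women_per_decade(records):
    """Return {decade: proportion of novels with a female author}."""
    totals, by_women = defaultdict(int), defaultdict(int)
    for year, gender, _genre in records:
        decade = (year // 10) * 10
        totals[decade] += 1
        if gender == "female":
            by_women[decade] += 1
    return {d: by_women[d] / totals[d] for d in sorted(totals)}

print(share_by_women_per_decade(records))  # e.g. {1960: 0.5, 1970: 0.75}
```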

Closing plenary
“Embracing a Distant View of the Digital Humanities”, Masahiro Shimoda, Department of Indian Philosophy and Buddhist Studies/ Center for Evolving Humanities, Graduate School of Humanities and Sociology, University of Tokyo, Japan

Humanities research, just like digital humanities research, needs to be aware of its roots and of the relations between its predecessors and successors, as well as the relations between languages and cultures. The relation between cultures in their succession is a key humanistic research interest, and it is equally applicable to the digital humanities. The humanities exhibit two types of diversity: in the object of study and in the method of research. Diversity in the digital humanities is amplified by the inclusion of technology and the multiplicity of associated computational methodologies. Stability is one of the major challenges for the digital humanities given the rapid pace of technological change. The discovery of the East in the 18th century as a subject for humanistic research coincided with the evolution of some of the principal and long-standing methodological principles of the humanities generally. Distant perceptions, both spatial and temporal, have allowed the humanities to establish a view of Eastern and ancient cultures that is detached from personal views and experiences. This is a paradigm that fits well with digital humanistic enquiry, as the digital humanities' technological foundations have an equally distancing effect on the objects of study. The full-text digitization of one of the major Chinese Buddhist corpora, the Taisho Shinshu Daizokyo, with well over a hundred million characters in 85 volumes, has transformed both access to and our understanding of these texts in their cultural contexts. The digital humanities have a huge opportunity in digitizing and analysing these vast resources and in revising some of the long-standing traditional scholarship that has dominated the discourse for the past hundred years. A bird's-eye view of the many interconnections between these resources is a fascinating way of approaching the Buddhist textual, cultural, and religious heritage. Buddhist scriptures also challenge some traditional perceptions of “written texts”: the scriptures not only usually have several titles but also multiple authorship, some of it purely legendary. We need to go beyond the boundaries of individual texts to corpus analysis in order to highlight the importance of connections across traditions of writing materialized in individual scriptures. In a broader perspective, the traditional humanities will not be superseded by the digital humanities; rather, the field will be enriched and will enrich our understanding of our cultural heritage.

Next year's Digital Humanities conference will be hosted by the University of Nebraska-Lincoln, USA; DH 2014 will be hosted by the University of Lausanne, Switzerland.

03/08/2012 – Alexander Huber, Digital Collections Development, Bodleian Libraries, University of Oxford
