Putting Historical Data in Context: How to Use DSpace-GLAM

Posted on 21 January 2018



We will talk about…

1. Theoretical and methodological foundations of the DSpace-GLAM project
2. Managing digital objects with DSpace
3. Extending the DSpace data model with DSpace-GLAM
4. Integrating DSpace and DSpace-GLAM entities
5. Accessing and sharing digital cultural resources with add-ons
6. Dataset analysis with CKAN
7. Conclusions

The BIG DATA age

• For several years the term "Big Data" has been bursting into the world of Information Technology

• It promises great potential, based on a new generation of technologies and architectures able to extract value from the enormous amount of data continuously produced in the most diverse fields

In science, "Big Data" is seen as an even bigger opportunity

The "data deluge" will make some of the fundamental concepts on which the scientific method has so far been based obsolete

A new scientific paradigm?

No more theories?

No more hypotheses?

No more models?

Numbers speak for themselves?

A new scientific paradigm?

Certainly new opportunities…

Source: http://bouache.com/blog/big-data/

• Being able to manipulate and analyze massive amounts of data represents an important progress for science

• It won’t abolish the need to build, refine and verify theories

• It will allow us to formulate hypotheses and test them far more rapidly, and on far larger samples, than in the past

…also for humanities

No data deluge, but… a growing amount of data

• Databases
• Electronic journals
• Digitization
• Tools for data extraction
• …

A variety of multidisciplinary data are related to Cultural Heritage and History

Different in: typology, format, structure, scale

More and more complexity

In the humanities most of the data are created or collected by people (not measured by instruments)

They are affected by individuals, place and time

They are fragmentary, partial, biased

Source: http://www.asianscientist.com/2016/07/print/body-as-a-source-of-big-data/

Putting data in context

Digital Cultural Data have to be analyzed together with all the contextual information, digital and non-digital, needed to answer research questions such as:

• (cultural, social, economic, technological…) production context of a document/monument

• formation processes of an archaeological record

• contextual associations at different levels and scales (according to the different dimensions of variation)

Source: https://ddd.uab.cat/pub/expbib/2006/terradefoc/10.pdf

A Digital Humanities approach is fundamental…

Such an approach, with its focus on relationships, can help in identifying the important dimensions of variation (the CONTEXT)

It can help in analyzing primary sources as evidence of a network of heterogeneous systems, which can be studied through those sources by means of a global (holistic) and multidimensional analysis

[Diagram: technological, environmental, social, cultural and economic systems]

Source: Hodder I. 2016, Studies in Human-Thing Entanglement, p. 28

…within a Digital Library Management System

To move such an approach from theory to practice we need infrastructures and tools for integration, analysis and storage of digital data and resources.

Today most cultural digital resources and data are held in Digital Libraries or Repositories

It is Digital Libraries and Repositories that must provide tools for:

• modeling, visualising and analysing information, both in a qualitative and quantitative way, as well as collaboratively working on it

• highlighting the relationships between data at different scales

• explaining interpretations about the important dimensions of variation and about the network of contextual relations in which historical sources are involved

Such tools must become part of the daily workflow of historians, archaeologists and humanities scholars.


Why DSpace?

To achieve the outlined goals and build a state-of-the-art Digital Library Management System, open source software is preferable.

Open source software offers an effective way to create Digital Library Management Systems with a small financial investment.

With sustainability in mind, we chose DSpace from among the most widely used open source Digital Library Management Systems.


Why DSpace?

Out of the box, DSpace allows you to:

• capture and describe digital material using a submission workflow module, or a variety of batch ingest options

• distribute digital assets over the web through a search and retrieval system

• preserve digital assets over the long term


Why DSpace?

The system is based on the specifications of the OAIS (Open Archival Information System) for Long Term Preservation and is able to manage the whole "life-cycle" of a digital object in terms of "Digital Curation", by means of:

• metadata creation according to different standards
• SIP (Submission Information Package) import and validation
• AIP (Archival Information Package) creation
• AIP export
• storage management
• digital resource dissemination (also by means of OAI-PMH)
• digital object history management and integrity checks

Why DSpace?

There are over 2200 digital repositories and libraries worldwide using the DSpace application for a variety of digital archiving and dissemination needs.

DSpace is often used as an institutional repository to provide access to research outputs, scholarly publications, library collections, educational material and more.

It is also used as a digital library to store, preserve and disseminate digital cultural heritage.

A fairly large part of the world cultural and scientific heritage is already managed, accessed and preserved using DSpace

It makes sense to enhance a system that is already widely used rather than ask institutions to migrate their data to new platforms

DSpace Data Model

Communities & Collections

• Communities and collections are entities used to aggregate DSpace items by:
  • Provenance and responsibility >>> Communities
  • Metadata, workflow, curation >>> Collections

• They both aggregate the items but they are conceptually different things!

Communities

Create your Community

Collections

Create your Collection


Workflow

Curating items

User management

E-People and Groups are the way DSpace identifies application users for the purpose of granting privileges

DSpace metadata

Out of the box, DSpace supports multiple flat metadata schemas

You can configure multiple schemas by means of the “Metadata Schema Registry” and select metadata fields from a mix of configured schemas to describe your items

Communities and collections have some simple descriptive metadata (a name, and some descriptive prose)

The submission process

Defining the submission form

Configure the submission form by means of the input-forms.xml file

You can configure different forms for different collections

You can create internal vocabularies for the fields

input-forms.xml

dc-schema (Required): Name of the metadata schema employed, e.g. dc for Dublin Core. This value must match the value of the schema element defined in the Metadata Schema Registry

dc-element (Required): Name of the element

dc-qualifier: Qualifier of the element entered, e.g. when the field is contributor.advisor the value of this element would be advisor. Leaving this out means the input is for an unqualified element.

repeatable: Value is true when multiple values of this field are allowed, false otherwise. When you mark a field repeatable, the UI servlet will add a control to let the user ask for more fields to enter additional values.

label (Required): Text to display as the label of this field, describing what to enter, e.g. "Your Advisor's Name".

input-type (Required): Defines the kind of interactive widget to put in the form to collect the Dublin Core value.

input-forms.xml

hint (Required): Content is the text that will appear as a "hint", or instructions, next to the input fields. Can be left empty, but it must be present.

required: When this element is included with any content, it marks the field as a required input. If the user tries to leave the page without entering a value for this field, that text is displayed as a warning message. For example, <required>You must enter a title.</required> Note that leaving the required element empty will not mark a field as required, e.g.: <required></required>
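The rules above can be checked mechanically before a form is deployed. The following is a minimal sketch in Python (standard library only), not part of DSpace: `check_field` and `REQUIRED_CHILDREN` are hypothetical names, and treating whitespace-only `<required>` content as empty is our assumption.

```python
from xml.etree import ElementTree as ET

# Children marked (Required) in the list above; <hint> may be empty but must exist.
REQUIRED_CHILDREN = ("dc-schema", "dc-element", "label", "input-type", "hint")


def check_field(field_xml):
    """Return (problems, is_mandatory) for one <field> definition (hypothetical helper)."""
    field = ET.fromstring(field_xml.strip())
    problems = [f"missing <{tag}>" for tag in REQUIRED_CHILDREN
                if field.find(tag) is None]
    # An empty <required></required> does NOT make the field mandatory.
    # Whitespace-only content is treated as empty here (assumption).
    required = field.find("required")
    is_mandatory = required is not None and bool((required.text or "").strip())
    return problems, is_mandatory


title_field = """
<field>
  <dc-schema>dc</dc-schema>
  <dc-element>title</dc-element>
  <repeatable>false</repeatable>
  <label>Title</label>
  <input-type>onebox</input-type>
  <hint>Enter the main title of the item.</hint>
  <required>You must enter a title.</required>
</field>
"""

print(check_field(title_field))  # ([], True)
```

A check like this can run over every `<field>` in a form before restarting the web application, so that a missing `<hint>` is caught early rather than at submission time.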

input-forms.xml – dropdown menus

To create an internal flat vocabulary you have to:

• use the «dropdown», «qualdrop» or «list» value within the <input-type> element

• populate the <value-pairs> element
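A small generator can keep a flat vocabulary in one place and emit both fragments. This sketch assumes the DSpace 5/6 input-forms.xml layout as we understand it: the field's `<input-type>` refers by name (`value-pairs-name`) to a `<value-pairs>` element in the `<form-value-pairs>` section, and each `<pair>` holds a `<displayed-value>` and a `<stored-value>`. The function name and the sample vocabulary `object_types` are hypothetical.

```python
from xml.etree import ElementTree as ET


def dropdown_definition(pairs_name, dc_term, pairs):
    """Build <input-type> and <value-pairs> fragments for a flat vocabulary.

    `pairs` is a list of (displayed_value, stored_value) tuples.
    """
    # Goes inside the <field> that should render as a dropdown.
    input_type = ET.Element("input-type", {"value-pairs-name": pairs_name})
    input_type.text = "dropdown"

    # Goes inside the <form-value-pairs> section of the same file.
    value_pairs = ET.Element("value-pairs",
                             {"value-pairs-name": pairs_name, "dc-term": dc_term})
    for displayed, stored in pairs:
        pair = ET.SubElement(value_pairs, "pair")
        ET.SubElement(pair, "displayed-value").text = displayed
        ET.SubElement(pair, "stored-value").text = stored

    return (ET.tostring(input_type, encoding="unicode"),
            ET.tostring(value_pairs, encoding="unicode"))


field_xml, pairs_xml = dropdown_definition(
    "object_types", "type",
    [("Manuscript", "manuscript"), ("Painting", "painting"),
     ("Archaeological find", "find")],
)
print(field_xml)  # <input-type value-pairs-name="object_types">dropdown</input-type>
print(pairs_xml)
```

Separating the displayed label from the stored value lets curators rename a label in the UI without rewriting metadata already stored on items.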

Hierarchical Taxonomies and Controlled Vocabularies

DSpace also offers a way to structure and manage more complex, hierarchical controlled vocabularies

Managed in a separate file

Taxonomies are described in XML

Vocabularies are invoked from input-forms.xml, using the <vocabulary> tag within the related <field>

Batch submission process

It requires the creation of a DSpace Simple Archive:

• A directory for each item to import, containing:

  • the files that make up the item

  • an XML file (dublin_core.xml) where each metadata element has its own entry within a <dcvalue> tag. Three attributes are currently available on the <dcvalue> tag:
    • element: the Dublin Core element
    • qualifier: the element's qualifier
    • language (optional): the ISO language code for the element

  • a “contents” file, listing the files

  • an (optional) collections file with the information about the collection(s) the item belongs to

<dublin_core>
  <dcvalue element="title" qualifier="none">A Tale of Two Cities</dcvalue>
  <dcvalue element="date" qualifier="issued">1990</dcvalue>
  <dcvalue element="title" qualifier="alternative" language="fr">J'aime les Printemps</dcvalue>
</dublin_core>
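The archive layout described above is easy to generate from a spreadsheet or database export. Below is a minimal Python sketch (standard library only) that writes one item directory with `dublin_core.xml`, the `contents` file and an optional `collections` file holding one collection handle per line. The helper `write_saf_item`, the item name and the handle `1234/5` are hypothetical.

```python
import tempfile
from pathlib import Path
from xml.etree import ElementTree as ET


def write_saf_item(archive_dir, item_name, metadata, files, collections=None):
    """Write one item directory in DSpace Simple Archive layout (sketch).

    metadata: tuples (element, qualifier, value) or (element, qualifier, value, language)
    files:    mapping of file name -> file content (bytes)
    """
    item_dir = Path(archive_dir) / item_name
    item_dir.mkdir(parents=True, exist_ok=True)

    # dublin_core.xml: one <dcvalue> per metadata value.
    root = ET.Element("dublin_core")
    for element, qualifier, value, *lang in metadata:
        attrs = {"element": element, "qualifier": qualifier}
        if lang:
            attrs["language"] = lang[0]
        ET.SubElement(root, "dcvalue", attrs).text = value
    ET.ElementTree(root).write(str(item_dir / "dublin_core.xml"),
                               encoding="utf-8", xml_declaration=True)

    # The bitstreams themselves, plus the "contents" file listing them.
    for name, data in files.items():
        (item_dir / name).write_bytes(data)
    (item_dir / "contents").write_text("\n".join(files) + "\n", encoding="utf-8")

    # Optional: target collection handles, one per line.
    if collections:
        (item_dir / "collections").write_text("\n".join(collections) + "\n",
                                              encoding="utf-8")
    return item_dir


with tempfile.TemporaryDirectory() as tmp:
    item = write_saf_item(
        tmp, "item_000",
        [("title", "none", "A Tale of Two Cities"),
         ("date", "issued", "1990"),
         ("title", "alternative", "J'aime les Printemps", "fr")],
        {"page1.jpg": b"\xff\xd8"},
        collections=["1234/5"],
    )
    print(sorted(p.name for p in item.iterdir()))
```

The resulting directory can then be loaded with DSpace's command-line importer (in DSpace 5/6, `dspace import --add` with the submitter e-person, target collection, source directory and map file); check the exact options for your version.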

UI Batch Import


You have to:

• Compress the item directories into a zip file

• Place the zip file at a publicly accessible URL, e.g. on Dropbox, Google Drive or wherever you can publish it

• Then log in as an administrator and fill in the form

Batch metadata editing

DSpace provides a batch metadata editing tool.

The batch editing tool allows the user to perform the following:

• Batch editing of metadata by means of a comma delimited file in CSV format

• Batch additions of metadata (e.g. add an abstract to a set of items)

• Batch find and replace of metadata values (e.g. correct misspelled surname across several records)

• Mass move items between collections

• Mass deletion, withdrawal, or re-instatement of items

• Batch addition of new items (without bitstreams) via a CSV file

• Re-order the values in a list (e.g. authors)
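The CSV used by the batch editor can be produced by any script. This sketch assumes the DSpace metadata CSV conventions as we understand them: an `id` column (`+` marks a new item), a `collection` column holding a handle, one column per metadata field such as `dc.title`, and `||` between multiple values of the same field. The helper `metadata_csv` and the sample handle `1234/5` are hypothetical.

```python
import csv
import io


def metadata_csv(items, fields):
    """Serialize items as CSV for DSpace batch metadata editing (sketch).

    Each item is a dict with an "id" ("+" for a new item), an optional
    "collection" handle, and a list of values for each metadata field.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "collection", *fields])
    for item in items:
        # Multiple values of the same field are joined with "||".
        values = ["||".join(item.get(f, [])) for f in fields]
        writer.writerow([item["id"], item.get("collection", ""), *values])
    return buf.getvalue()


out = metadata_csv(
    [{"id": "+", "collection": "1234/5",
      "dc.title": ["Pottery inventory"],
      "dc.contributor.author": ["Rossi, Mario", "Bianchi, Anna"]}],
    ["dc.title", "dc.contributor.author"],
)
print(out)
```

In practice the usual round trip is to export a collection with `dspace metadata-export`, edit the file, and load it back with `dspace metadata-import`, which shows the pending changes before applying them; check the options for your DSpace version.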


Extending DSpace

Cultural Institutions in the «Big Data Age» ask for:

• Complex and multidimensional metadata structures
• Complex data models
• Relationship management between different entities
• Tools for the visualization, analysis and interpretation of digital data and resources

Why not use an “extended” version of DSpace to meet these relevant needs?

DSpace-GLAM (Galleries, Libraries, Archives, Museums)

Built by 4Science on top of DSpace to meet the needs of Cultural Heritage institutions

Flexible and extensible data model inherited from DSpace-CRIS (our RIMS) to manage relevant metadata standards and specific conceptual models

With dedicated add-ons for digital object curation, access and sharing

Also an add-on for dataset visualization and analysis

DSpace-GLAM (Galleries, Libraries, Archives, Museums)

DSpace-GLAM is free, open source, compliant with open standards

Add-ons are mainly distributed following a new business model (crowdsourcing)

It provides institutions with a sustainable and effective tool to manage and analyze Cultural Heritage Information

Weaknesses of DSpace metadata management

• Flat metadata model
• Weak support for technical and structural metadata
• All information is stored as strings at the database level, with minimal support (and validation) for data entry in the UI

• DSpace-GLAM improves the metadata at the item level by providing:
  - Additional input types for data entry (number, year and regex validation)
  - Partial support for nested metadata
  - Support for technical and structural metadata

DSpace-GLAM: interoperability

• Connect to VIAF records and Getty Vocabularies for precise identification of persons, artists and places

• It has been reported to work well with «plain» DSpace through the authority implementation. The plan is to include it out of the box in DSpace 7

Extending the DSpace Data Model

DSpace-GLAM can manage all the entities important to contextualize digital cultural heritage:

• Persons
• Families
• Fonds
• Events
• Places
• Concepts
• …

Entities can be created to integrate different metadata standards and conceptual models

Extending the Data Model

• Persons

• Projects

• Organizations

are pre-defined entities inherited from DSpace-CRIS

… but you are not required to use (all of) them.

You can define additional entities

You can define your own relationships between entities, including the ones you have defined

Pre-defined entities

Defining other entities

Entity components

Tabs, boxes, fields

Entity components: tabs

Entity components: boxes

Entity components: fields

• Each DSpace-GLAM entity instance has a status flag:
  • Public: the details page is visible to anyone and it will be linked where appropriate. The record is included in public search results
  • Private: only administrators can access the details page. The entity is indexed only for use as an authority entry

• Each property/attribute value has an edit mode:
  • Editable
  • Visibility flag only
  • Only Administrators
  • Read only

• A field becomes visible when it is included in a publicly visible tab/box

Data model configuration: DSpace-GLAM visibility and security

• Visibility of a tab or box can be restricted to:
  • system administrators
  • only the RP owner
  • administrators and the RP owner
  • specific users and groups related to the entity instance

• To restrict the visibility of a box or tab to specific groups or users, one or more properties must be specified containing the users and/or groups that have access to the protected box / tab
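The visibility levels listed above amount to a small access-control decision. The sketch below is a hypothetical Python model of those rules, not DSpace-GLAM's implementation: the enum names, the `can_view` signature and the `allowed` collection (standing for the values of the entity properties that list authorized users and groups) are our assumptions, and whether administrators also see owner-only or group-restricted boxes is not stated in the slides, so the model follows the list literally. `PUBLIC` stands for an unrestricted tab/box.

```python
from enum import Enum


class BoxVisibility(Enum):
    PUBLIC = "public"                    # unrestricted (assumed default)
    ADMINS = "admins"                    # system administrators
    OWNER = "owner"                      # only the RP owner
    ADMINS_AND_OWNER = "admins_and_owner"
    USERS_AND_GROUPS = "users_and_groups"


def can_view(visibility, user, *, owner, is_admin, user_groups=(), allowed=()):
    """Decide whether `user` may see a tab/box (hypothetical model).

    `allowed` stands for the values of the entity properties listing the
    users and/or groups that have access to the protected tab/box.
    """
    if visibility is BoxVisibility.PUBLIC:
        return True
    if visibility is BoxVisibility.ADMINS:
        return is_admin
    if visibility is BoxVisibility.OWNER:
        return user == owner
    if visibility is BoxVisibility.ADMINS_AND_OWNER:
        return is_admin or user == owner
    # USERS_AND_GROUPS: match the user directly or through one of their groups.
    return user in allowed or any(group in allowed for group in user_groups)


print(can_view(BoxVisibility.USERS_AND_GROUPS, "luca", owner="anna",
               is_admin=False, user_groups=["curators"], allowed={"curators"}))
```

Reading the allowed users and groups from properties of the entity instance, rather than from a global setting, is what lets each record carry its own access list.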

Data model configuration: DSpace-GLAM visibility and security

• The configuration can be performed via the UI and exported to XLS

• It can be imported from XLS files

Data model configuration

Data model configuration: creating relationships between entities

Data model configuration: creating inverse relationships between entities

DSpace-CRIS can use the Solr indexes to reverse a relation:
• documents are linked to the person
• but you can also list the documents under a specific person

Relations are defined in the Spring configuration file cris-relationpreference.xml and are characterized by:
• a name
• the target entity (a CRIS entity or a DSpace Item)
• the Solr query, with {0}, {1} placeholders to be replaced with the CRIS-ID or the UUID of the source CRIS instance

Data model configuration: creating inverse relationships between entities

(cris-relationpreference.xml)

<bean id="relationINTERPRETATIONVSEVENTSConfiguration" class="org.dspace.app.cris.configuration.RelationConfiguration">
    <property name="relationName" value="crisinterpretation.events" />
    <property name="relationClass" value="org.dspace.app.cris.model.ResearchObject" />
    <property name="type" value="crisevents" />
    <property name="query">
        <value>crisevents.eventsrelatedinterpretation_authority:{0}</value>
    </property>
</bean>

In this bean, relationName is the name of the relation, relationClass the target entity, and query the Solr query.
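The placeholder substitution described above can be illustrated in a few lines. This Python sketch is not DSpace-CRIS code: `expand_relation_query`, the CRIS-ID `interpretation00001`, the all-zero UUID and the Solr URL are hypothetical. It fills `{0}` with the CRIS-ID and `{1}` with the UUID of the source instance, as the list above describes.

```python
import re
from urllib.parse import urlencode


def expand_relation_query(template, cris_id, uuid):
    """Fill {0} with the CRIS-ID and {1} with the UUID of the source instance.

    A regex is used instead of str.format so that other braces that may
    appear in a Solr query (e.g. local params such as {!lucene}) are kept.
    """
    values = {"0": cris_id, "1": uuid}
    return re.sub(r"\{([01])\}", lambda m: values[m.group(1)], template)


query = expand_relation_query(
    "crisevents.eventsrelatedinterpretation_authority:{0}",
    cris_id="interpretation00001",                # hypothetical CRIS-ID
    uuid="00000000-0000-0000-0000-000000000000",  # hypothetical UUID
)
print(query)  # crisevents.eventsrelatedinterpretation_authority:interpretation00001

# A select request against a (hypothetical) Solr core would then look like:
print("http://localhost:8983/solr/search/select?" + urlencode({"q": query, "rows": 10}))
```

Because the inverse relation is just a query over the index, no extra link needs to be stored on the target side: the events pointing at an interpretation are found by searching their authority field.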

Data model configuration: creating inverse relationships between entities

• Inverse relations can be:
  • visualized
  • used to show aggregated statistics

• To be visualized, relations are embedded in components (see cris-components.xml)

Data model configuration: creating inverse relationships between entities

(cris-components.xml)

<!-- Dynamic object component -->
<bean id="doComponentsService" class="org.dspace.app.cris.integration.CrisComponentsService">
    <property name="components">
        <map>
            <entry key="journalspublications" value-ref="publicationlistforjournals" />
            <entry key="eventsdocuments" value-ref="publicationlistforevents" />
            <entry key="placesevents" value-ref="eventlistforplaces" />
            <entry key="eventsperson" value-ref="personlistforevents" />
            <entry key="fondschild" value-ref="fondschildforfonds" />
            <entry key="fondspublications" value-ref="publicationlistforfonds" />
            <entry key="conceptdocuments" value-ref="publicationlistforconcept" />
            <entry key="conceptperson" value-ref="personlistforconcept" />
        </map>
    </property>
</bean>

Each entry key is the name of the related box used to visualize the data.

Data model configuration: creating inverse relationships between entities

(cris-components.xml)

<!-- Person list for Events dynamic entity -->
<bean id="personlistforevents" class="org.dspace.app.webui.cris.components.CRISRPConfigurerComponent">
    <property name="relationConfiguration" ref="relationEVENTSVSRPConfiguration" />
    <property name="commonFilter">
        <util:constant static-field="org.dspace.app.webui.cris.util.RelationPreferenceUtil.HIDDEN_FILTER" />
    </property>
    <property name="target" value="org.dspace.app.cris.model.ResearchObject" />
    <property name="facets" ref="facetsRPforComponentConfiguration" />
    <property name="types">
        <map>
            <entry key="all" value-ref="allObjectsComponent" />
        </map>
    </property>
</bean>

Data model configuration: creating inverse relationships

Data model configuration: integrating DSpace and DSpace-GLAM

(dspace.cfg)

• All GLAM entities can be linked with DSpace Items and used as authorities for the items’ metadata

• This is done by adding some configuration to the dspace.cfg file

##### Authority Control Settings #####
plugin.named.org.dspace.content.authority.ChoiceAuthority = \
    org.dspace.app.cris.integration.ORCIDAuthority = RPAuthority,\
    org.dspace.content.authority.ItemAuthority = PublicationAuthority,\
    org.dspace.content.authority.ItemAuthority = DataSetAuthority,\
    org.dspace.app.cris.integration.DOAuthority = EVENTAuthority,\
    org.dspace.app.cris.integration.DOAuthority = FONDSAuthority,\
    org.dspace.app.cris.integration.DOAuthority = CONCEPTAuthority,\
    org.dspace.app.cris.integration.DOAuthority = INTERPRETATIONAuthority,\

Data model configuration: integrating DSpace and DSpace-GLAM

(dspace.cfg)

choices.plugin.dc.relation.conference = EVENTAuthority
choices.presentation.dc.relation.conference = suggest
authority.controlled.dc.relation.conference = true
cris.DOAuthority.dc_relation_conference.filter = resourcetype_authority:events
cris.DOAuthority.dc.relation.conference.new-instances = events
ItemCrisRefDisplayStrategy.publicpath.dc.relation.conference = events

choices.plugin.dc.relation.concept = CONCEPTAuthority
choices.presentation.dc.relation.concept = suggest
authority.controlled.dc.relation.concept = true
cris.DOAuthority.dc_relation_concept.filter = resourcetype_authority:concept
cris.DOAuthority.dc.relation.concept.new-instances = concept
ItemCrisRefDisplayStrategy.publicpath.dc.relation.concept = concept

choices.plugin.dc.relation.fond = FONDSAuthority
choices.presentation.dc.relation.fond = suggest
authority.controlled.dc.relation.fond = true
cris.DOAuthority.dc_relation_fond.filter = resourcetype_authority:crisfonds AND crisfonds.fondsleaf:true
ItemCrisRefDisplayStrategy.publicpath.dc.relation.fond = fonds

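Key to the properties:
• choices.plugin: authority name
• choices.presentation: display mode for authority values
• cris.DOAuthority.<field>.filter: origin of the authority values
• cris.DOAuthority.<field>.new-instances: entity to populate with new values
• authority.controlled: the authority value has its own ID
• ItemCrisRefDisplayStrategy.publicpath: path used to link the entity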
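Since every linked field repeats the same group of properties, a small generator can reduce copy-and-paste mistakes. This hypothetical Python helper reproduces the pattern of the conference example above; the underscore form of the field name in the `.filter` key follows the conference and fond examples, and the helper itself is not part of DSpace-GLAM.

```python
def authority_block(field, authority, solr_filter, public_path, new_instances=None):
    """Emit the dspace.cfg lines linking a metadata field to a GLAM entity (sketch)."""
    underscored = field.replace(".", "_")  # e.g. dc.relation.conference -> dc_relation_conference
    lines = [
        f"choices.plugin.{field} = {authority}",
        f"choices.presentation.{field} = suggest",
        f"authority.controlled.{field} = true",
        f"cris.DOAuthority.{underscored}.filter = {solr_filter}",
    ]
    if new_instances:
        lines.append(f"cris.DOAuthority.{field}.new-instances = {new_instances}")
    lines.append(f"ItemCrisRefDisplayStrategy.publicpath.{field} = {public_path}")
    return "\n".join(lines)


print(authority_block("dc.relation.conference", "EVENTAuthority",
                      "resourcetype_authority:events", "events",
                      new_instances="events"))
```

Omitting `new_instances`, as in the fond example, produces a field that can only point at existing entities.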


Data model configuration: creating inverse relationships between items and entities

Data model configuration: clustering of related objects

Out of the box, component implementations are available that allow configurable rendering of inverse relations for each entity (DSpace items or DSpace-GLAM entities). It is possible:

• to configure which facets are shown in the component
• to apply filters to the relation
• to enable clustering using custom categories defined by facet queries

The components take into account the preferences expressed for the relationships

Managing hierarchical archival structures

Extending the data model makes the system able to manage the hierarchical metadata structure required by archival standards such as ISAD(G) and EAD

DSpace-GLAM can also manage the production and preservation context of the archive required by ISAAR-CPF, EAC-CPF and ISDIAH

Creating and managing Archival Fonds at different levels

Relating an Archival Unit (Item) to a Fonds

Visualizing hierarchical archival structures
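The hierarchy behind these screens can be modelled as a simple tree. The Python sketch below is hypothetical: the class `Fonds`, the helper `attach_item` and the sample titles are ours. The rule that items attach only to leaf fonds is an assumption inferred from the `crisfonds.fondsleaf:true` filter in the dspace.cfg example, not a documented constraint.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Fonds:
    """One node of an archival hierarchy (fonds, series, file...)."""
    title: str
    level: str  # ISAD(G) level of description, e.g. "fonds", "series", "file"
    # Excluded from repr/eq so the parent <-> children cycle does not recurse.
    parent: Optional["Fonds"] = field(default=None, repr=False, compare=False)
    children: List["Fonds"] = field(default_factory=list, repr=False, compare=False)

    def add(self, title, level):
        child = Fonds(title, level, parent=self)
        self.children.append(child)
        return child

    @property
    def is_leaf(self):
        return not self.children

    def path(self):
        """Breadcrumb from the top-level fonds down to this node."""
        parts, node = [], self
        while node is not None:
            parts.append(node.title)
            node = node.parent
        return " > ".join(reversed(parts))


def attach_item(item_title, fonds):
    """Attach an archival unit (a DSpace item) to a leaf fonds only (assumption)."""
    if not fonds.is_leaf:
        raise ValueError(f"{fonds.title!r} is not a leaf fonds")
    return f"{fonds.path()} > {item_title}"


root = Fonds("Family archive", "fonds")
series = root.add("Correspondence", "series")
letters = series.add("Letters", "file")
print(attach_item("Letter no. 1", letters))  # Family archive > Correspondence > Letters > Letter no. 1
```

Keeping a parent pointer on each node makes the breadcrumb cheap to compute, which is what a hierarchical browse view needs.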

Overview of the DSpace-GLAM data model


Pointing out Social Networks

The system is able to draw graphs based on relationships between Persons using data from the different entities and from the DSpace Items

In particular it draws relationships between persons who:

• Co-authored the same items
• Participated in the same event(s)
• Participated in event(s) in the same place(s)
• Are related to the same concept(s)
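The four criteria above share one idea: two persons are connected when they are linked to the same object. This can be sketched as a weighted co-occurrence graph. This is a hypothetical illustration, not DSpace-GLAM's network plugin; object identifiers are made-up strings prefixed with their type (item, event, place, concept).

```python
from collections import defaultdict
from itertools import combinations


def person_network(links):
    """Build weighted person-person edges from (person, object_id) links.

    Two persons get an edge for every object (item, event, place, concept)
    they are both linked to; the weight counts the shared objects.
    """
    by_object = defaultdict(set)
    for person, obj in links:
        by_object[obj].add(person)

    edges = defaultdict(int)
    for people in by_object.values():
        # sorted() gives each pair a stable (a, b) orientation.
        for a, b in combinations(sorted(people), 2):
            edges[(a, b)] += 1
    return dict(edges)


links = [
    ("A", "item:1"), ("B", "item:1"),                      # co-authors
    ("A", "event:1"), ("B", "event:1"), ("C", "event:1"),  # same event
]
print(person_network(links))
```

Keeping the object type in its identifier makes it easy to draw one network per criterion, or to weight co-authorship differently from shared concepts.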

Visualizing relationships between historical figures

Network configuration (network.cfg)

Networks are implemented by plugins

You can write your own implementation typically starting from the default ones

You can configure the network layout (colors, number of nodes, levels)

Formalizing and analysing interpretations

Interpretations are logical processes that start from data and/or assumptions and, through logical reasoning and by connecting persons, events, documents, etc., arrive at one or more conclusions

Often, in the humanities, such processes are merged and hidden within natural language narratives

To make such processes explicit, we have to decompose them into different components and atomic propositions, and display these elements


Linking interpretations to entities

With DSpace-GLAM you can link an interpretation to the items, events and persons it is related to

Moreover, you can link different interpretations to the same entity
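Decomposing an interpretation into atomic propositions and linking it to entities can be modelled with two small records. The Python sketch below is a hypothetical data structure, not DSpace-GLAM's schema: the class names, the `role` vocabulary, the entity ids and the placeholder propositions are all ours. It shows how several competing interpretations can point at the same entity.

```python
from dataclasses import dataclass, field


@dataclass
class Proposition:
    text: str
    role: str  # "datum", "assumption", "inference" or "conclusion" (hypothetical vocabulary)


@dataclass
class Interpretation:
    author: str
    propositions: list = field(default_factory=list)
    linked_entities: set = field(default_factory=set)  # ids of items, events, persons...

    def conclusions(self):
        return [p.text for p in self.propositions if p.role == "conclusion"]


def interpretations_of(entity_id, interpretations):
    """All interpretations linked to one entity (several may compete)."""
    return [i for i in interpretations if entity_id in i.linked_entities]


a = Interpretation(
    "Scholar A",
    [Proposition("Datum taken from document X", "datum"),
     Proposition("Conclusion A", "conclusion")],
    {"painting:flagellation", "event:council-of-ferrara"},
)
b = Interpretation("Scholar B", [Proposition("Conclusion B", "conclusion")],
                   {"painting:flagellation"})
print([i.author for i in interpretations_of("painting:flagellation", [a, b])])
```

Storing the role of each proposition separately from its text is what lets a viewer show which conclusions rest on data and which on assumptions.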

Contextualizing historical data

Example: the painting The Flagellation (painter: Piero della Francesca) in context:
• Events: Council of Ferrara (AD 1438), Council of Mantua (AD 1459)
• Places: Ferrara, Mantua
• Person: Emperor John VIII Palaiologos
• Concepts: Renaissance, Humanism, Neoplatonism
• Interpretation: Ronchey

Ready for Linked Open Data


By linking and relating the created entities with other authorities, the institution is ready to become part of the Linked Data graph

We are now working to include the additional entities in the DSpace RDF management features as well

GLAM

Navigation

Global search across the whole Digital Library

Infographics

Top objects using several criteria

Faceted search

Facets

Customizable Browse indexes


DSpace-GLAM use cases

Cultural Heritage image files (digitized manuscripts, paintings, monuments, archaeological finds, rare books, etc.) need to be consulted online, discussed and commented on / annotated

IIIF protocols and formats allow you to meet these requirements in a standard way that is understandable by both humans and machines

DSpace-GLAM use cases

High-quality scanned books typically have images of over 100 MB per page

The structure of image sequences is complex and meaningful (sequences of pages, of the phases of a historical event, of a cycle of frescoes, etc.)

DSpace-GLAM use cases

The same requirements apply to audio and video content

-Streaming

-Internal structure

-Annotation / commenting / transcript

Adopt an open standard: the MPEG-DASH format allows adaptive streaming to a simple HTML client, with full support for multiple tracks, tables of contents (ToC) and subtitles

4Science IIIF Image Viewer Addon

IIIF Compliant

1. Presentation API

2. Image API

3. Search API

4. Authentication API (soon)

DSpace item with a “see online” option

Offering an integrated Universal Viewer player

IIIF Image API allows a smooth interaction with the image files

IIIF Presentation API generated on the fly using the metadata of the item and the bitstreams

Bitstreams metadata

Hierarchical ToC

An example from a PDF document offered as a complex package of page images

Link images with their textual transcription / OCR

Indexing a standard format (hOCR) in a web annotation server to support the IIIF Search API

Side by side – image vs text using an additional OCR panel

An example in Arabic script: https://dspace-glam.4science.it/handle/1234/24

IIIF Image Viewer: share and reuse

Share images with other scholars/users without waiving proper attribution, e.g. using the «manifest» JSON file:

https://dspace-glam.4science.it/json/iiif/1234/11/30/manifest

in another IIIF image viewer, such as Mirador: http://projectmirador.org/demo/
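Because the manifest is plain JSON, any client can reuse it, not only a full viewer. The Python sketch below assumes the manifest follows the IIIF Presentation API 2.x structure (sequences → canvases → images, each image resource carrying an Image API `service`); Presentation 3.0 uses `items` instead. The helper names and the sample manifest are hypothetical; `load_manifest` needs network access and is not called in the example.

```python
import json
from urllib.request import urlopen


def load_manifest(url):
    """Fetch and decode a IIIF manifest (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)


def canvas_images(manifest, width=300):
    """Yield (canvas label, Image API URL) pairs from a Presentation 2.x manifest."""
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                service = image.get("resource", {}).get("service", {})
                if "@id" in service:
                    base = service["@id"].rstrip("/")
                    # Image API syntax: {region}/{size}/{rotation}/{quality}.{format}
                    yield canvas.get("label"), f"{base}/full/{width},/0/default.jpg"


# Offline sample shaped like a Presentation 2.x manifest (hypothetical ids).
sample = {"sequences": [{"canvases": [{
    "label": "p. 1",
    "images": [{"resource": {"service": {"@id": "https://example.org/iiif/page1/"}}}],
}]}]}
print(list(canvas_images(sample)))

# Against the manifest above (network required):
# manifest = load_manifest("https://dspace-glam.4science.it/json/iiif/1234/11/30/manifest")
```

Requesting a scaled rendition through the Image API is what avoids downloading the 100 MB master files mentioned earlier.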

Audio/Video streaming

Full open source stack:
1. Transcoding
2. Adaptive streaming
3. MPEG-DASH standard

Audio/Video streaming

https://dspace-glam.4science.it/explore?bitstream_id=1841&handle=1234/7&provider=video-streaming

It transcodes audio/video files into a format and encoding appropriate to the adopted media server (adaptive video streaming)

Using the DASH standard protocol allows sharing video with other scholars/users without waiving proper attribution, e.g. using the «manifest» XML file:

https://dspace-glam.4science.it/av-stream/1841/ch/0/29/94/83/manifest.mpd

in another DASH client, such as the DASH-IF reference player: http://dashif.org/reference/players/javascript/v2.4.1/samples/dash-if-reference-player/index.html
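The `.mpd` manifest is an XML document in the `urn:mpeg:dash:schema:mpd:2011` namespace. It lists, for each adaptation set (for example video or audio), the alternative representations a client can switch between. This Python sketch lists them sorted by bandwidth. The sample MPD is a trimmed, hypothetical illustration (a real one carries profiles, timing and segment information), and the helper is not part of DSpace-GLAM.

```python
from xml.etree import ElementTree as ET

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}


def representations(mpd_xml):
    """List the representations of an MPD manifest, lowest bandwidth first."""
    root = ET.fromstring(mpd_xml)
    reps = []
    for aset in root.iterfind("mpd:Period/mpd:AdaptationSet", NS):
        mime = aset.get("mimeType")
        for rep in aset.iterfind("mpd:Representation", NS):
            reps.append({
                "id": rep.get("id"),
                "mime": rep.get("mimeType", mime),  # may be set on either level
                "bandwidth": int(rep.get("bandwidth", 0)),
            })
    return sorted(reps, key=lambda r: r["bandwidth"])


SAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Representation id="v720" bandwidth="2500000" width="1280" height="720"/>
      <Representation id="v360" bandwidth="800000" width="640" height="360"/>
    </AdaptationSet>
    <AdaptationSet mimeType="audio/mp4" lang="it">
      <Representation id="a128" bandwidth="128000"/>
    </AdaptationSet>
  </Period>
</MPD>"""

for r in representations(SAMPLE_MPD):
    print(r["id"], r["mime"], r["bandwidth"])
```

This bitrate ladder is what makes the streaming adaptive: the player measures its throughput and picks the highest representation it can sustain.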

Visualizing and analysing datasets

4Science has released a free and open source integration with CKAN, the world's leading open-source data management platform

Using an extensible viewer framework you can now offer data discovery, exploration, preview, sampling and visualization from your DSpace repository

CKAN provides open web services for tabular data: https://ckan.org/
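CKAN's DataStore exposes tabular resources through its Action API. The `datastore_search` action accepts parameters such as `resource_id`, `limit` and `filters`, and returns a JSON envelope with `success` and `result.records`. The Python sketch below builds such a request and unpacks the answer. The base URL `https://ckan.example.org` and the resource id `abc123` are hypothetical, and the internals of the DSpace-CKAN integration are not shown here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def datastore_search_url(base_url, resource_id, limit=100, filters=None):
    """Build a CKAN Action API URL for datastore_search."""
    params = {"resource_id": resource_id, "limit": limit}
    if filters:
        # CKAN expects `filters` as a JSON object of field -> value.
        params["filters"] = json.dumps(filters)
    return f"{base_url.rstrip('/')}/api/3/action/datastore_search?{urlencode(params)}"


def extract_records(payload):
    """Return the rows of a datastore_search response, or raise on API errors."""
    if not payload.get("success"):
        raise RuntimeError(f"CKAN error: {payload.get('error')}")
    return payload["result"]["records"]


def fetch_records(url):
    """Run the request (network access required)."""
    with urlopen(url) as response:
        return extract_records(json.load(response))


url = datastore_search_url("https://ckan.example.org/", "abc123",
                           limit=5, filters={"site": "Ferrara"})
print(url)
```

This only works for resources that have been loaded into the DataStore; files that were merely uploaded are not queryable this way.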

Visualizing and analysing datasets

We see DSpace-GLAM not only as a tool for management and preservation, but also as a tool for analysis

Our integration with CKAN allows the visualization and analysis of repertoires and inventories by means of grids, graphs or maps

Datasets can also be related to items and other entities

https://dspace-glam.4science.it/handle/1234/15

Geolocation of archaeological finds

Visualizing and analysing datasets

https://dspace-glam.4science.it/explore?bitstream_id=1971&handle=1234/22&provider=ckan-recline

Pottery distribution

Why do I need DSpace-GLAM?

• DSpace-GLAM is a powerful extension of DSpace created by 4Science to meet the needs of Galleries, Libraries, Archives and Museums

• to be able to manage, analyze and preserve digital objects

• together with historical, archaeological or other cultural datasets,

• relating them with other entities such as persons, events, places, concepts, etc.

• to describe the context of cultural objects and data, according to different granularity levels and different interpretations

• using widely adopted, cutting-edge, open-source software and open standards

How do I get DSpace-GLAM?

• Any institution can install DSpace-GLAM or upgrade its DSpace installation to DSpace-GLAM, extending document management by creating new entities

• Your publications will be safely managed as before, adding the advantage of linking them to relevant information such as authors, datasets, events, concepts, networks and much more

When can I move to DSpace-GLAM?

• Now: every moment is appropriate to enhance your Digital Library, to better support research activities and make your service more relevant

• Upgrading from DSpace to DSpace-GLAM, or installing a brand-new “extended” DLMS, does not take much extra effort, and the effort is well rewarded by the results you can get

• As an extra safeguard, if you already have a DSpace repository, DSpace-GLAM does not alter the structure of the objects currently managed by DSpace, so you can go back from DSpace-GLAM to DSpace at any time just by dropping (a lot of) extra tables… but we are confident that you will not want to do that

• Our goal is to provide an environment for integrating the traditional hermeneutic and interpretative work of historical sciences with data visualization and analysis

• In this way, we hope, there may be a fundamental change in the way digital cultural heritage is experienced, analyzed and contributed to by the whole scientific community

Data Science in a Digital Humanities Framework

Thanks for your attention

Andrea Bollini

<andrea.bollini@4science.it>

mobile: +39 333 934 1808

skype: a.bollini

orcid: 0000-0002-9029-1854

Claudio Cortese

<claudio.cortese@4science.it>

mobile: +39 333 9340846

skype: claudio.cortese74

orcid: 0000-0003-4572-9711
