Semantic Digital Humanities Workshop 2015 @ Oxford
TRANSCRIPT
Open, Connected & Smart Heritage: Towards New Cultural Commons
Lora Aroyo
Semantic Digital Humanities 2015, Oxford
massive amount of digital content to explore …
http://lora-aroyo.org · http://slideshare.net/laroyo · @laroyo
but at some point it all looks the same …
audiences feel disconnected & lost …
SMART · CONNECTED · OPEN
We need more of this.

Johan Oomen, Lora Aroyo (2011). Crowdsourcing in the Cultural Heritage Domain: Opportunities and Challenges. http://www.iisi.de/fileadmin/IISI/upload/2011/p138_oomen.pdf

• Smart: new technologies for indexing, retrieval & linking; links to the workflows of the creative industries; distribution over various devices & platforms
• Connected: between collections, to distributed content
• Open: to users, to stimulate collaboration & creativity

“For content to be truly accessible, it needs to be where the users are, embedded in their daily networked lives.” (Wabel, 2009)
“Enabling anything like seamless access to the cultural record will require developing tools to navigate among vast catalogs of born-digital and digitized materials […] The return on this investment will be a humanities and social science cyberinfrastructure that will allow new questions to be asked, new patterns and relations to be discerned, and deep structures in language, society, and culture to be exposed and explored.”
… Digital Humanities researchers
… they often don’t find what they were searching for
“an event is the exemplification of a property by a substance at a given time” (Jaegwon Kim, 1966)
“events are changes that physical objects undergo” (Lawrence Lombard, 1981)
“events are properties of spatiotemporal regions” (David Lewis, 1986)
L. Aroyo, C. Welty: Harnessing Disagreement in Crowdsourcing Events. DeRIVE 2011 @ISWC2011.
typically collections are described by experts ...
“A planned public or social get together or occasion.”
“an event is an incident that's very important or monumental”
“An event is something occurring at a specific time and/or date to celebrate or recognize a particular occurrence.”
“a location where something like a function is held. you could tell if something is an event if there people gathering for a purpose.”
“Event can refer to many things such as: An observable occurrence, phenomenon or an extraordinary occurrence.”
but the crowd talks about things in a different way ...
… and they all search & browse with some implicit relevance in mind
we need … support of multiple perspectives
How to bridge the GAP between Expert & Crowd Semantics?
a novel approach: gather a diversity of perspectives & opinions from the crowd, expand expert vocabularies with them, and gather a new type of gold standard for machines
L. Aroyo, C. Welty: Crowd Truth: Harnessing disagreement in crowdsourcing a relation extraction gold standard. ACM WebSci 2013.
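The core CrowdTruth idea, harnessing disagreement rather than discarding it, can be illustrated with a toy sketch: each unit's crowd answers form a vector over answer options, and a worker is scored by cosine similarity against the rest of the crowd. The data and three-option setup below are illustrative, not the published CrowdTruth metrics:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two annotation vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# toy data: per worker, a one-hot vector over 3 candidate answers
annotations = {
    "w1": [1, 0, 0],
    "w2": [1, 0, 0],
    "w3": [0, 1, 0],
}

def rest_vector(exclude):
    """Sum of all other workers' vectors for this unit."""
    total = [0, 0, 0]
    for worker, vec in annotations.items():
        if worker != exclude:
            total = [a + b for a, b in zip(total, vec)]
    return total

# worker-unit agreement: low similarity flags a dissenting answer,
# which is a signal (ambiguity) rather than noise
for worker, vec in annotations.items():
    print(worker, round(cosine(vec, rest_vector(worker)), 2))
```

Here w1 and w2 score 0.71 against the rest of the crowd while w3 scores 0.0; instead of majority-voting w3 away, the disagreement itself becomes evidence about the unit's ambiguity.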
CrowdTruth.org
Peter Singer
we have … altruism-driven crowds
Q: Why did you tag? (survey responses, Public vs. MMA taggers, scale 0–100%)
• don't remember
• to connect with others
• so that I could find works again later
• other (please specify)
• to learn about art
• to improve search for other users
• for fun
• to help museums document art work
“the wise crowd” (James Surowiecki): diversity of opinion, independent, decentralized, aggregated
3 of our Crowdsourcing Use Cases
http://www.prestoprime.org/
Use Case 1: Crowdsourcing Video Tags @Sound and Vision
@waisda http://waisda.nl
Two Pilots

Results of First Pilot (the first 6 months):
• 44,362 pageviews
• 12,279 visits (3+ min online)
• 555 registered players (thousands of anonymous players!)
• 340,551 tags added to 602 items
• 137,421 matches
First two years (2006–2008):
• 11 participating museums
• 1,782 works of art in the research
• 36,981 tags collected
• 2,017 users who tagged
Tags by Documentalists:
• tags describe mainly short segments
• tags are often not very specific
• tags do not describe programmes as a whole
• user tags were useful & specific → domain dependent
User vocabulary: 8% in the professional vocabulary, 23% in the Dutch lexicon, 89% found on Google
Tag types: objects (57%), persons (31%), locations (7%), e.g. “engeland”
Riste Gligorov et al. On the Role of User-Generated Metadata in A/V Collections. K-CAP Int. Conference on Knowledge Capture, 2011.
Crowd vs. Professionals
Waisda?: Tags vs. Rest (MAP per system)
• All user tags: 0.219
• Consensus user tags only: 0.143
• NCRV tags: 0.138
• NCRV catalog: 0.077
• Captions: 0.157
• Captions + User tags: 0.247
• Captions + NCRV catalog: 0.183
• Captions + NCRV tags: 0.201
• NCRV tags + User tags: 0.263
• NCRV tags + NCRV catalog: 0.150
• All − User tags: 0.208
• All: 0.276

All tags better than consensus only:
• improvement of 53%
• consensus tags have higher precision (0.59 vs. 0.49) but lower recall (0.28 vs. 0.42)
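MAP in the table above is mean average precision, the standard ranked-retrieval metric. A minimal sketch of how it is computed, on toy relevance data rather than the Waisda? evaluation itself:

```python
def average_precision(ranked, relevant):
    """AP for one query: mean of precision@k at each relevant hit."""
    hits, precisions = 0, []
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP: average of AP over all queries; runs = [(ranking, relevant_set), ...]."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# toy example with two queries
runs = [
    (["d1", "d2", "d3"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2 = 0.833...
    (["d2", "d1"], {"d1"}),              # AP = (1/2) / 1 = 0.5
]
print(round(mean_average_precision(runs), 3))  # → 0.667
```

A system that ranks relevant items higher (like the combined tag sets in the table) gets more of its precision@k terms early, hence a higher MAP.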
All tags better than the rest:
• individually: beat NCRV tags by 69%, beat captions by 39%
• combined: improvement of 5%
All data performs best:
• largely due to the contribution of user tags (33%)
Current Pilot
Accurator: ask the right crowd, enrich your collection
http://annotate.accurator.nl
Use Case 2: Crowdsourcing & Nichesourcing
@Rijksmuseum
The Rijksmuseum Amsterdam collection holds over 1 million artworks; only a small fraction, about 8,000 items, is currently on display. The online collection grows: 125,000 artworks are already available, and another 40,000 are added every year.
The expertise of museum professionals is in describing & annotating the collection with art-historical information, e.g. when a work was created, by whom, etc. Detailed information about depicted objects, e.g. which species an animal or plant belongs to, is in most cases not available.
A work may be annotated only with “bird with blue head near branch with red leaf”; the species of the bird and the plant are missing.
By involving people from outside the museum in the annotation process, we support museum professionals in their annotation task:
• use crowdsourcing to get more annotations
• use nichesourcing, i.e. niches of people with the right expertise, to add more specific information
• use sources like Twitter to find experts or groups of experts in certain areas, e.g. bird lovers, ornithologists, or people who enjoy bird-watching in their spare time
Platform where users enter tags: (1) structured vocabulary terms or (2) free text
http://annotate.accurator.nl
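The two tag-entry paths can be sketched as follows; the vocabulary, concept URIs, and function name are hypothetical placeholders, not the actual Accurator implementation:

```python
# (1) a tag that matches a structured vocabulary is stored with its concept URI,
# (2) anything else is kept as free text for later review.
# Toy vocabulary: preferred label -> concept URI (illustrative data).
VOCAB = {
    "common kingfisher": "http://example.org/concepts/alcedo-atthis",
    "grey heron": "http://example.org/concepts/ardea-cinerea",
}

def store_tag(raw: str) -> dict:
    """Normalize the entered tag and route it to one of the two paths."""
    label = raw.strip().lower()
    if label in VOCAB:
        return {"type": "concept", "label": label, "uri": VOCAB[label]}
    return {"type": "free-text", "label": raw.strip()}

print(store_tag("Common Kingfisher"))  # matched to a vocabulary concept
print(store_tag("pretty blue bird"))   # kept as free text
```

Structured terms keep annotations machine-readable and linkable, while the free-text fallback preserves contributions the vocabulary does not yet cover.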
For tasks that are too difficult: a game in which players can carry out an expert annotation task with some assistance.
To evaluate the correctness of annotations: they are reviewed & rated by other experts.
BIRDWATCHING RIJKSMUSEUM
Sunday October 4, 10.00 - 14.00, Cuypers Library, Rijksmuseum
On World Animal Day, the Rijksmuseum will host a birdwatching day in collaboration with Naturalis Biodiversity Center, Wikimedia Netherlands and the COMMIT/ SEALINCMedia project.
We are looking for bird watchers to join an expedition through the digital collections and help the museums identify bird species in works of art.
dive.beeldengeluid.nl
Use Case 3: Event-centric Exploration in Digital Hermeneutics
Sound & Vision and Royal Library
dive.beeldengeluid.nl
3rd Prize at the Semantic Web Challenge 2014
OPENIMAGES.EU:
• 3,000 videos
• NL Institute for Sound & Vision
• mostly news broadcasts

DELPHER.NL:
• 1.5 million scans of radio bulletins (hand annotated)
• 1937–1984
Simple Event Model (SEM), Open Annotation (OA) & SKOS:
• dive:MediaObject
• sem:Event, sem:Place, sem:Time, sem:Actor
• skos:Concept
• oa:Annotation
• links to Europeana (multilingual)
• links to DBpedia
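How these vocabularies tie a media object, an event, and an external concept together can be sketched with plain Python triples; the namespaces are the real SEM, OA, and SKOS URIs, but the instance identifiers are made-up placeholders, not actual DIVE data:

```python
# RDF-style triples without an RDF library, to keep the sketch self-contained.
SEM = "http://semanticweb.cs.vu.nl/2009/11/sem/"
OA = "http://www.w3.org/ns/oa#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/dive/"  # placeholder namespace for instances

event = EX + "event/1"
video = EX + "video/1"
ann = EX + "annotation/1"

triples = {
    # SEM: an event with its place and actor
    (event, RDF_TYPE, SEM + "Event"),
    (event, SEM + "hasPlace", EX + "place/amsterdam"),
    (event, SEM + "hasActor", EX + "actor/reporter"),
    # OA: an annotation linking the event (body) to the media object (target)
    (ann, RDF_TYPE, OA + "Annotation"),
    (ann, OA + "hasBody", event),
    (ann, OA + "hasTarget", video),
    # outward link to DBpedia via SKOS
    (event, SKOS + "closeMatch", "http://dbpedia.org/resource/Amsterdam"),
}

# query: which media objects does this event reach through annotations?
targets = {t for s, p, t in triples
           if p == OA + "hasTarget" and (s, OA + "hasBody", event) in triples}
print(targets)
```

The annotation acts as the join between the event layer and the media layer, which is what lets DIVE browse from events to the objects that depict them.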
Digital Submarine UI:
• infinity of exploration
• events linking objects
• crowd bringing the human perspectives
• Linked (Open) Data
• entity & event extraction with CrowdTruth.org
1. entity extraction
2. events crowdsourcing and linking to concepts through CrowdTruth.org
3. segmentation & keyframes
4. linking events and concepts to keyframes
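The four enrichment stages above can be sketched as a stub pipeline; every function and all data here are illustrative stand-ins, not the actual DIVE code:

```python
def extract_entities(text):
    # stand-in for an NER step over catalogue text / subtitles:
    # naively keep capitalized words as "entities"
    return [w for w in text.split() if w.istitle()]

def crowdsource_events(entities):
    # stand-in for CrowdTruth-style crowd linking of entities to events
    return [{"event": f"event-about-{e}", "entity": e} for e in entities]

def segment_keyframes(n_segments=3):
    # stand-in for segmenting a video into keyframes
    return [f"keyframe-{i}" for i in range(n_segments)]

def link_to_keyframes(events, keyframes):
    # naive linking: pair each event with a keyframe round-robin
    return [(ev["event"], keyframes[i % len(keyframes)])
            for i, ev in enumerate(events)]

entities = extract_entities("Queen Wilhelmina visits Amsterdam")
events = crowdsource_events(entities)
links = link_to_keyframes(events, segment_keyframes())
print(links)
```

The point of the sketch is the data flow: text yields entities, the crowd turns entities into events, and events are anchored back onto segments of the media itself.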
Erp, M. van; Oomen, J.; Segers, R.; Akker, C. van den; Aroyo, L.; Jacobs, G.; Legêne, S.; Meij, L. van der; Ossenbruggen, J.R. van; Schreiber, G. Automatic Heritage Metadata Enrichment with Historic Events. Museums and the Web 2011. http://www.museumsandtheweb.com/mw2011/papers/automatic_heritage_metadata_enrichment_with_hi
engaging users through event narratives
DIVE implements Digital Hermeneutics:
• a theory of interpretation of information
• bringing people and technology together to explore:
  – how to model and represent information
  – how to provide engaging interaction
  – how to support interpretation
“Digital Hermeneutics: Agora and the online understanding of cultural heritage.” In Proceedings of the Web Science Conference (ACM: New York, 2011).
Chiel van den Akker, Marieke van Erp, Lora Aroyo, Ardjan van Nuland, Lourens van der Meij, Susan Legêne, and Guus Schreiber (2013). Evaluating Cultural Heritage Access on the Web: From Information Delivery to Interpretation Support. WebSci'13.
Information: Museums & Archives as Inventories of the World
André Malraux, The Imaginary Museum of World Sculpture, 1953
Interpretation: Museums & Archives as a Place to Engage with the World
Acknowledgements
PrestoPrime Team: Lora Aroyo, Riste Gligorov, Lotte Belice Baltussen, Maarten Brinkerink, Johan Oomen, Jacco van Ossenbruggen, Michiel Hildebrand
http://prestoprime.eu
SealincMedia Team: Alessandro Bozzon, Geert-Jan Houben, Lora Aroyo, Lizzy Jongma, Guus Schreiber, Chris Dijkshoorn, Jasper Oosterman, Jacco van Ossenbruggen, Archana Nottamkandath, Myriam Traub
http://sealinc.ops.few.vu.nl/invenit/
DIVE Team: Victor de Boer, Oana Inel, Lora Aroyo, Johan Oomen, Elco Van Staveren, Werner Helmich & Dennis De Beurs
dive.beeldengeluid.nl
Agora Team: Lora Aroyo, Guus Schreiber, Lourens van der Meij, Marieke van Erp, Chiel van den Akker, Susan Legêne, Geertje Jacobs, Johan Oomen
agora.cs.vu.nl
CrowdTruth Team: Lora Aroyo, Chris Welty, Robert-Jan Sips, Carlos Martinez Ortiz, Anca Dumitrache, Oana Inel, Benjamin Timmermans, Susanna van de Ven, Merel van Empel, Jelle v.d. Ploeg, Tatiana Cristea, Khalid Khamkham, Harriëtte Smook, Rens van Honschooten, Arne Rutjes
CrowdTruth.org github.com/CrowdTruth
Links
On the Web: http://waisda.nl • http://prestoprime.org • http://agora.cs.vu.nl • http://sealincmedia.wordpress.com • http://dive.beeldengeluid.nl • http://crowdtruth.org • http://game.crowdtruth.org • http://wm.cs.vu.nl
On Twitter: @waisda @agora-project @sealincmedia @prestocenter @vistatv #CrowdTruth
THANK YOU!