Semantic Digital Humanities Workshop 2015 @ Oxford
TRANSCRIPT
Open, Connected & Smart Heritage: Towards New Cultural Commons
Lora Aroyo
Semantic Digital Humanities 2015, Oxford
massive amount of digital content to explore …
http://lora-aroyo.org · http://slideshare.net/laroyo · @laroyo
but at some point it all looks the same …
audiences feel disconnected & lost …
SMART · CONNECTED · OPEN
We need more of this.

Johan Oomen, Lora Aroyo (2011). Crowdsourcing in the Cultural Heritage Domain: Opportunities and Challenges. http://www.iisi.de/fileadmin/IISI/upload/2011/p138_oomen.pdf

• Smart: new technologies for indexing, retrieval & linking; links to the workflows of the creative industries; distribution over various devices & platforms
• Connected: between collections, to distributed content
• Open: to users, to stimulate collaboration & creativity

“For content to be truly accessible, it needs to be where the users are, embedded in their daily networked lives.” (Wabel, 2009)
“Enabling anything like seamless access to the cultural record will require developing tools to navigate among vast catalogs of born-digital and digitized materials […] The return on this investment will be a humanities and social science cyberinfrastructure that will allow new questions to be asked, new patterns and relations to be discerned, and deep structures in language, society, and culture to be exposed and explored.”
… Digital Humanities researchers
… they often don’t find what they were searching for
“an event is the exemplification of a property by a substance at a given time” (Jaegwon Kim, 1966)
“events are changes that physical objects undergo” (Lawrence Lombard, 1981)
“events are properties of spatiotemporal regions” (David Lewis, 1986)
L. Aroyo, C. Welty: Harnessing Disagreement in Crowdsourcing Events. DeRIVE 2011 @ISWC2011.
typically collections are described by experts ...
“A planned public or social get together or occasion.”
“an event is an incident that's very important or monumental”
“An event is something occurring at a specific time and/or date to celebrate or recognize a particular occurrence.”
“a location where something like a function is held. you could tell if something is an event if there people gathering for a purpose.”
“Event can refer to many things such as: An observable occurrence, phenomenon or an extraordinary occurrence.”
but the crowd talks about things in a different way ...
… and they all search & browse with some implicit relevance in mind
we need … support of multiple perspectives
How to bridge the GAP between Expert & Crowd Semantics?
a novel approach: gather a diversity of perspectives & opinions from the crowd, expand expert vocabularies with them, and gather a new type of gold standard for machines
L. Aroyo, C. Welty: Crowd Truth: Harnessing disagreement in crowdsourcing a relation extraction gold standard. ACM WebSci 2013.
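The core CrowdTruth idea, harnessing disagreement rather than discarding it, can be illustrated with a toy sketch: each unit's crowd answers form a vector over answer options, and a worker is scored by cosine similarity against the rest of the crowd. The data and three-option setup below are illustrative, not the published CrowdTruth metrics:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two annotation vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# toy data: per worker, a one-hot vector over 3 candidate answers
annotations = {
    "w1": [1, 0, 0],
    "w2": [1, 0, 0],
    "w3": [0, 1, 0],
}

def rest_vector(exclude):
    """Sum of all other workers' vectors for this unit."""
    total = [0, 0, 0]
    for worker, vec in annotations.items():
        if worker != exclude:
            total = [a + b for a, b in zip(total, vec)]
    return total

# worker-unit agreement: low similarity flags a dissenting answer,
# which is a signal (ambiguity) rather than noise
for worker, vec in annotations.items():
    print(worker, round(cosine(vec, rest_vector(worker)), 2))
```

Here w1 and w2 score 0.71 against the rest of the crowd while w3 scores 0.0; instead of majority-voting w3 away, the disagreement itself becomes evidence about the unit's ambiguity.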
CrowdTruth.org
Peter Singer
we have … altruism-driven crowds
Q: Why did you tag? (survey responses, Public vs. MMA taggers, scale 0–100%)
• don't remember
• to connect with others
• so that I could find works again later
• other (please specify)
• to learn about art
• to improve search for other users
• for fun
• to help museums document art work
“the wise crowd” (James Surowiecki): diversity of opinion, independent, decentralized, aggregated
3 of our Crowdsourcing Use Cases
http://www.prestoprime.org/
Use Case 1: Crowdsourcing Video Tags @Sound and Vision
@waisda http://waisda.nl
Two Pilots

Results of First Pilot (the first 6 months):
• 44,362 pageviews
• 12,279 visits (3+ min online)
• 555 registered players (thousands of anonymous players!)
• 340,551 tags added to 602 items
• 137,421 matches
First two years (2006–2008):
• 11 participating museums
• 1,782 works of art in the research
• 36,981 tags collected
• 2,017 users who tagged
Tags by Documentalists:
• tags describe mainly short segments
• tags are often not very specific
• tags do not describe programmes as a whole
• user tags were useful & specific → domain dependent
User vocabulary: 8% in the professional vocabulary, 23% in the Dutch lexicon, 89% found on Google
Tag types: objects (57%), persons (31%), locations (7%), e.g. “engeland”
Riste Gligorov et al. On the Role of User-Generated Metadata in A/V Collections. K-CAP Int. Conference on Knowledge Capture, 2011.
Crowd vs. Professionals
Waisda?: Tags vs. Rest (MAP per system)
• All user tags: 0.219
• Consensus user tags only: 0.143
• NCRV tags: 0.138
• NCRV catalog: 0.077
• Captions: 0.157
• Captions + User tags: 0.247
• Captions + NCRV catalog: 0.183
• Captions + NCRV tags: 0.201
• NCRV tags + User tags: 0.263
• NCRV tags + NCRV catalog: 0.150
• All − User tags: 0.208
• All: 0.276

All tags better than consensus only:
• improvement of 53%
• consensus tags have higher precision (0.59 vs. 0.49) but lower recall (0.28 vs. 0.42)
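MAP in the table above is mean average precision, the standard ranked-retrieval metric. A minimal sketch of how it is computed, on toy relevance data rather than the Waisda? evaluation itself:

```python
def average_precision(ranked, relevant):
    """AP for one query: mean of precision@k at each relevant hit."""
    hits, precisions = 0, []
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP: average of AP over all queries; runs = [(ranking, relevant_set), ...]."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# toy example with two queries
runs = [
    (["d1", "d2", "d3"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2 = 0.833...
    (["d2", "d1"], {"d1"}),              # AP = (1/2) / 1 = 0.5
]
print(round(mean_average_precision(runs), 3))  # → 0.667
```

A system that ranks relevant items higher (like the combined tag sets in the table) gets more of its precision@k terms early, hence a higher MAP.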
All tags better than the rest:
• individually: beat NCRV tags by 69%, beat captions by 39%
• combined: improvement of 5%
All data performs best:
• largely due to the contribution of user tags (33%)
Current Pilot
Accurator: ask the right crowd, enrich your collection
http://annotate.accurator.nl
Use Case 2: Crowdsourcing & Nichesourcing
@Rijksmuseum
The Rijksmuseum Amsterdam collection holds over 1 million artworks; only a small fraction, about 8,000 items, is currently on display. The online collection grows: 125,000 artworks are already available, and another 40,000 are added every year.
The expertise of museum professionals is in describing & annotating the collection with art-historical information, e.g. when a work was created, by whom, etc. Detailed information about depicted objects, e.g. which species an animal or plant belongs to, is in most cases not available.
A work may be annotated only with “bird with blue head near branch with red leaf”; the species of the bird and the plant are missing.
By involving people from outside the museum in the annotation process, we support museum professionals in their annotation task:
• use crowdsourcing to get more annotations
• use nichesourcing, i.e. niches of people with the right expertise, to add more specific information
• use sources like Twitter to find experts or groups of experts in certain areas, e.g. bird lovers, ornithologists, or people who enjoy bird-watching in their spare time
Platform where users enter tags: (1) structured vocabulary terms or (2) free text
http://annotate.accurator.nl
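The two tag-entry paths can be sketched as follows; the vocabulary, concept URIs, and function name are hypothetical placeholders, not the actual Accurator implementation:

```python
# (1) a tag that matches a structured vocabulary is stored with its concept URI,
# (2) anything else is kept as free text for later review.
# Toy vocabulary: preferred label -> concept URI (illustrative data).
VOCAB = {
    "common kingfisher": "http://example.org/concepts/alcedo-atthis",
    "grey heron": "http://example.org/concepts/ardea-cinerea",
}

def store_tag(raw: str) -> dict:
    """Normalize the entered tag and route it to one of the two paths."""
    label = raw.strip().lower()
    if label in VOCAB:
        return {"type": "concept", "label": label, "uri": VOCAB[label]}
    return {"type": "free-text", "label": raw.strip()}

print(store_tag("Common Kingfisher"))  # matched to a vocabulary concept
print(store_tag("pretty blue bird"))   # kept as free text
```

Structured terms keep annotations machine-readable and linkable, while the free-text fallback preserves contributions the vocabulary does not yet cover.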
For tasks that are too difficult: a game in which players can carry out an expert annotation task with some assistance.
To evaluate the correctness of annotations: they are reviewed & rated by other experts.
BIRDWATCHING RIJKSMUSEUM
Sunday October 4, 10.00 - 14.00, Cuypers Library, Rijksmuseum
On World Animal Day, the Rijksmuseum will host a birdwatching day in collaboration with Naturalis Biodiversity Center, Wikimedia Netherlands and the COMMIT/ SEALINCMedia project.
We are looking for bird watchers to join an expedition through the digital collections and help the museums identify bird species in works of art.
dive.beeldengeluid.nl
Use Case 3: Event-centric Exploration in Digital Hermeneutics
Sound & Vision and Royal Library
dive.beeldengeluid.nl
3rd Prize at the Semantic Web Challenge 2014
OPENIMAGES.EU:
• 3,000 videos
• NL Institute for Sound & Vision
• mostly news broadcasts

DELPHER.NL:
• 1.5 million scans of radio bulletins (hand annotated)
• 1937–1984
Simple Event Model (SEM), Open Annotation (OA) & SKOS:
• dive:MediaObject
• sem:Event, sem:Place, sem:Time, sem:Actor
• skos:Concept
• oa:Annotation
• links to Europeana (multilingual)
• links to DBpedia
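How these vocabularies tie a media object, an event, and an external concept together can be sketched with plain Python triples; the namespaces are the real SEM, OA, and SKOS URIs, but the instance identifiers are made-up placeholders, not actual DIVE data:

```python
# RDF-style triples without an RDF library, to keep the sketch self-contained.
SEM = "http://semanticweb.cs.vu.nl/2009/11/sem/"
OA = "http://www.w3.org/ns/oa#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/dive/"  # placeholder namespace for instances

event = EX + "event/1"
video = EX + "video/1"
ann = EX + "annotation/1"

triples = {
    # SEM: an event with its place and actor
    (event, RDF_TYPE, SEM + "Event"),
    (event, SEM + "hasPlace", EX + "place/amsterdam"),
    (event, SEM + "hasActor", EX + "actor/reporter"),
    # OA: an annotation linking the event (body) to the media object (target)
    (ann, RDF_TYPE, OA + "Annotation"),
    (ann, OA + "hasBody", event),
    (ann, OA + "hasTarget", video),
    # outward link to DBpedia via SKOS
    (event, SKOS + "closeMatch", "http://dbpedia.org/resource/Amsterdam"),
}

# query: which media objects does this event reach through annotations?
targets = {t for s, p, t in triples
           if p == OA + "hasTarget" and (s, OA + "hasBody", event) in triples}
print(targets)
```

The annotation acts as the join between the event layer and the media layer, which is what lets DIVE browse from events to the objects that depict them.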
Digital Submarine UI:
• infinity of exploration
• events linking objects
• crowd bringing the human perspectives
• Linked (Open) Data
• entity & event extraction with CrowdTruth.org
1. entity extraction
2. events crowdsourcing and linking to concepts through CrowdTruth.org
3. segmentation & keyframes
4. linking events and concepts to keyframes
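The four enrichment stages above can be sketched as a stub pipeline; every function and all data here are illustrative stand-ins, not the actual DIVE code:

```python
def extract_entities(text):
    # stand-in for an NER step over catalogue text / subtitles:
    # naively keep capitalized words as "entities"
    return [w for w in text.split() if w.istitle()]

def crowdsource_events(entities):
    # stand-in for CrowdTruth-style crowd linking of entities to events
    return [{"event": f"event-about-{e}", "entity": e} for e in entities]

def segment_keyframes(n_segments=3):
    # stand-in for segmenting a video into keyframes
    return [f"keyframe-{i}" for i in range(n_segments)]

def link_to_keyframes(events, keyframes):
    # naive linking: pair each event with a keyframe round-robin
    return [(ev["event"], keyframes[i % len(keyframes)])
            for i, ev in enumerate(events)]

entities = extract_entities("Queen Wilhelmina visits Amsterdam")
events = crowdsource_events(entities)
links = link_to_keyframes(events, segment_keyframes())
print(links)
```

The point of the sketch is the data flow: text yields entities, the crowd turns entities into events, and events are anchored back onto segments of the media itself.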
Erp, M. van; Oomen, J.; Segers, R.; Akker, C. van den; Aroyo, L.; Jacobs, G.; Legêne, S.; Meij, L. van der; Ossenbruggen, J.R. van; Schreiber, G. Automatic Heritage Metadata Enrichment with Historic Events. Museums and the Web 2011. http://www.museumsandtheweb.com/mw2011/papers/automatic_heritage_metadata_enrichment_with_hi
engaging users through event narratives
DIVE implements Digital Hermeneutics:
• a theory of interpretation of information
• bringing people and technology together to explore:
  – how to model and represent information
  – how to provide engaging interaction
  – how to support interpretation
“Digital Hermeneutics: Agora and the online understanding of cultural heritage.” In Proceedings of the Web Science Conference (ACM: New York, 2011).
Chiel van den Akker, Marieke van Erp, Lora Aroyo, Ardjan van Nuland, Lourens van der Meij, Susan Legêne, and Guus Schreiber (2013). Evaluating Cultural Heritage Access on the Web: From Information Delivery to Interpretation Support. WebSci'13.
Information: Museums & Archives as Inventories of the World
André Malraux, The Imaginary Museum of World Sculpture, 1953
Interpretation: Museums & Archives as a Place to Engage with the World
Acknowledgements
PrestoPrime Team: Lora Aroyo, Riste Gligorov, Lotte Belice Baltussen, Maarten Brinkerink, Johan Oomen, Jacco van Ossenbruggen, Michiel Hildebrand
http://prestoprime.eu
SealincMedia Team: Alessandro Bozzon, Geert-Jan Houben, Lora Aroyo, Lizzy Jongma, Guus Schreiber, Chris Dijkshoorn, Jasper Oosterman, Jacco van Ossenbruggen, Archana Nottamkandath, Myriam Traub
http://sealinc.ops.few.vu.nl/invenit/
DIVE Team: Victor de Boer, Oana Inel, Lora Aroyo, Johan Oomen, Elco Van Staveren, Werner Helmich & Dennis De Beurs
dive.beeldengeluid.nl
Agora Team: Lora Aroyo, Guus Schreiber, Lourens van der Meij, Marieke van Erp, Chiel van den Akker, Susan Legêne, Geertje Jacobs, Johan Oomen
agora.cs.vu.nl
CrowdTruth Team: Lora Aroyo, Chris Welty, Robert-Jan Sips, Carlos Martinez Ortiz, Anca Dumitrache, Oana Inel, Benjamin Timmermans, Susanna van de Ven, Merel van Empel, Jelle v.d. Ploeg, Tatiana Cristea, Khalid Khamkham, Harriëtte Smook, Rens van Honschooten, Arne Rutjes
CrowdTruth.org github.com/CrowdTruth
Links
On the Web: http://waisda.nl • http://prestoprime.org • http://agora.cs.vu.nl • http://sealincmedia.wordpress.com • http://dive.beeldengeluid.nl • http://crowdtruth.org • http://game.crowdtruth.org • http://wm.cs.vu.nl
On Twitter: @waisda @agora-project @sealincmedia @prestocenter @vistatv #CrowdTruth
THANK YOU!