data science with human in the loop @faculty of science #leiden university
TRANSCRIPT
Cognitive Computing with Human in the Loop
http://lora-aroyo.org • http://slideshare.net/laroyo • @laroyo
Lora Aroyo Web & Media Group, VU
IBM Center for Advanced Studies (CAS)
Harnessing User Semantics at Scale
Who am I …
Vrije Universiteit Amsterdam
computer science professor heading web & media group
Amsterdam Data Science
IBM Center for Advanced Studies, Amsterdam
research associate leading cognitive computing & crowdsourcing team
Columbia University, NY
visiting scholar computer science, NLP, Computer Vision
Columbia Data Science
Tagasauris Inc, NY
Chief of Science
VU Web & Media Group …
Tobias Kuhn
Davide Ceolin
Victor de Boer
Jan Wielemaker
10 PhD Students
Lora Aroyo
Intelligent & Interactive Information Systems
enriching metadata & content of digital collections: content analysis for entity extraction; modeling provenance in digital collections; tracking changes over time
augmenting online multimedia: text & video summarization; interactive product placement, hotspots
assessing quality of web data: bias, controversy, opinions, perspectives; uncertainty, ambiguity; trust, privacy
software systems are becoming ever more intelligent
… but they don’t actually understand people
not all human knowledge can yet be captured by machines for wide ranges of real-world contexts
Knowledge Representation aims at human knowledge in machine-readable form
all the information machines have is all the information there is
there is always something else …
key scientific challenge: capturing human knowledge
at scale and adequate to real-world needs
Human Computation: how human intelligence at scale can be used to
improve machine-based knowledge
understanding human computation: improving how machine-based systems
acquire, capture & harness human knowledge
… understanding the data
variety of meanings
multitude of perspectives
abundance of sources
endless applications
… understanding the crowds
volunteers
enthusiasts
visitors on-site
visitors online
paid crowds
in-house experts
understand who the different crowds are and what they can do for your collection
http://crowdtruth.org/
framework that facilitates data collection, processing & analytics
of human computation knowledge
“best collective decisions are result of disagreement,
not consensus or compromise” James Surowiecki
disagreement = signal
http://crowdtruth.org/
disagreement is signal for the natural ambiguity of language and
diversity & perspectives of human interpretation
http://controcurator.org/
Interactive Exploration & Discovery in Context: building automatic storylines (narratives)
DIVE+
Aggregated views over the collection: collecting perspectives from crowds & niches
http://diveproject.beeldengeluid.nl/
VOTE for DIVE: https://summit2017.lodlam.net/2017/04/12/dive-explorative-search-for-digital-humanities/
VU – IBM CAS Team
Victor de Boer
Lora Aroyo
Oana Inel
Chiel van den Akker
Susan Legêne
Carlos Martinez Ortiz
Werner Helmich
Berber Hagedoorn
Sabrina Sauer
Liliana Melgar
Johan Oomen
Jaap Blom
Crowds for Co-creation Data
https://www.rijksmuseum.nl/en/rijksstudio
… by user-driven augmentations of existing online collections
Nichesourcing with Experts
http://annotate.accurator.nl
niches of people with the right expertise to contribute specific information
Train Lay Crowds to be Experts
training the general crowd to be a niche: a game in which players can carry out
expert annotation tasks with some assistance
http://spotvogel.vroegevogels.vara.nl
Volunteer crowds for continuous gaming
Paid Crowds for Video Analysis
CrowdTruth.org
Paid Crowds for Text Analysis
CrowdTruth.org
Paid Crowds for Image Analysis
CrowdTruth.org
Crowdsourcing Challenges
Challenge 1: Typically undertaken in isolation
Challenge 2: Difficult to estimate & control the time to complete
Challenge 3: Difficult to assess & compare quality
Challenge 4: Demands continuous promotional effort
Challenge 5: Active learning (human-in-the-loop) needs different expertise
Challenge 6: Challenging for institutions to incorporate crowdsourcing results into their existing content infrastructure
measure & assess: ensure impact
• be aware of the channel, e.g. Wikipedia, Wikimedia, Facebook
Riste Gligorov, Michiel Hildebrand, Jacco van Ossenbruggen, Guus Schreiber, Lora Aroyo (2011). On the role of user-generated metadata in audio visual collections. International Conference on Knowledge Capture (K-CAP '11), pp. 145-152.
measure & assess: monitor progress
6 months: 340,551 tags, 137,421 matches, 602 items, 555 registered players
2 years: 36,981 tags, 1,782 items, 2,017 users (taggers)
thousands of anonymous players, 12,279 visits (3+ min online), 44,362 pageviews
user vocabulary: 8% in professional vocabulary, 23% in Dutch lexicon, 89% found on Google
objects (57%)
persons (31%)
locations (7%), e.g. engeland
measure & assess: evaluate content, compare crowds
88% of the tags useful for specific genres
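A minimal sketch of how vocabulary-overlap percentages like the ones above could be computed. The function name `coverage`, the sample tags and the sample vocabularies are hypothetical; the slides do not say whether the percentages are taken over distinct tags or over tag occurrences, so this sketch assumes distinct tags and case-insensitive matching.

```python
def coverage(tags, vocabulary):
    """Fraction of distinct tags that also appear in the vocabulary.

    Assumption: matching is case-insensitive and done on distinct tags,
    not on tag occurrences.
    """
    distinct = {t.strip().lower() for t in tags if t.strip()}
    if not distinct:
        return 0.0
    vocab = {v.strip().lower() for v in vocabulary}
    return len(distinct & vocab) / len(distinct)


# Hypothetical sample data, for illustration only.
user_tags = ["engeland", "Engeland", "fiets", "koningin", "boot"]
professional_vocabulary = ["koningin"]
dutch_lexicon = ["fiets", "koningin", "boot"]

professional_share = coverage(user_tags, professional_vocabulary)
lexicon_share = coverage(user_tags, dutch_lexicon)
print(f"in professional vocabulary: {professional_share:.0%}")
print(f"in Dutch lexicon: {lexicon_share:.0%}")
```

Comparing the same tag set against several vocabularies in this way shows how much of the crowd's language the professional vocabulary does not cover.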
http://crowdtruth.org/
disagreement signals ambiguity: if people disagree, it will be more difficult for a
machine to classify that example
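The idea of treating disagreement as a signal can be illustrated with a small sketch. This is not the CrowdTruth metrics themselves: it uses normalized Shannon entropy over the labels each example received as a simple stand-in, and the function name `disagreement`, the example identifiers and the labels are all hypothetical.

```python
from collections import Counter
from math import log2


def disagreement(labels):
    """Normalized Shannon entropy of the labels given to one example.

    Returns 0.0 when all annotators agree and 1.0 when the votes are split
    evenly over the labels that were observed. This is a simple proxy, not
    the CrowdTruth measure.
    """
    counts = Counter(labels)
    if len(counts) < 2:
        return 0.0
    total = len(labels)
    entropy = -sum((c / total) * log2(c / total) for c in counts.values())
    return entropy / log2(len(counts))


# Hypothetical crowd annotations: example id -> labels from four annotators.
annotations = {
    "sentence-1": ["treats", "treats", "treats", "treats"],
    "sentence-2": ["treats", "causes", "treats", "causes"],
    "sentence-3": ["treats", "treats", "treats", "causes"],
}

# Rank examples from most to least ambiguous.
ranked = sorted(annotations, key=lambda k: disagreement(annotations[k]), reverse=True)
for example_id in ranked:
    print(example_id, round(disagreement(annotations[example_id]), 3))
```

Examples at the top of such a ranking are the ones where annotators disagree most, and so the ones a classifier is likely to find hardest.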
http://mediasuite.clariah.nl/
from DVDs to data science
1998
2006: 1 million dollar prize for best algorithm
2007: Netflix switches to streaming
2009: Team BellKor wins Netflix Prize
From Jeopardy to real-world problems
2011 2017
data is at the centre of every process
data is essential to evolve with users