A boring lecture
AV-Infrastructure
Infrastructure for AV-data
It’s there, but will it be used?
Arjan van Hessen
Research in Man-Machine Interaction and disclosure of spoken documents with Language and Speech Technology
Self-service via the telephone, disclosure of spoken documents with Language and Speech Technology
Establishing a sustainable infrastructure for tools and data for the humanities (and social sciences)
What is an infrastructure?
• Infrastructure is the basic physical and organizational structure needed for the operation of a (scientific) organisation.
• It can be generally defined as the set of interconnected structural elements that provide a framework supporting an entire structure of development.
• It is an important term for judging a scientific organisation’s development.
• (Wikipedia)
Infrastructure: Astronomy
INFRASTRUCTURE FOR THE HUMANITIES
Infrastructure: Humanities
sustainable
easily accessible
tools
data
user friendly
uploading your own data
Support & expertise
VRE
max. interoperability
standards
Humanities
Culture
Multi-linguality
Interpretation
Noisy data
Different countries -> Different legal systems
IPR
Privacy
INFRASTRUCTURE FOR AV
AV-material
More and more AV becomes available
2013: 72 hours/min on YouTube
• Digitization and preservation of existing material
• Easy and cheap to produce
• Easy and cheap to store (YouTube, Cloud)
• More professional productions
• Extreme need for self-expression?
• It’s there because it can be
There is no data like more data
What do we do with all this data?
Do we need to set up criteria for selection?
Do we need a special infrastructure for AV?
[Figure: growth of data volume (Giga, Tera, Peta, Exa) from the 1990’s to the 2020’s by data type (structured data, text, image, audio, video). Axis labels: Computational Needs, Sophistication of Analysis, Expressiveness (Low, Med, High). Callouts: Digital Marketing; Wide Area Imagery; News / Media; Safety / Security; Healthcare; Food; Social Media / Video; 12% of video views; 100’s TB per day; 72 video hrs/minute; 1B camera phones; 1B medical images/yr; 10s millions cameras; used by 1/3 of enterprises. Source: IBM Market Insights, based on composite sources / GTO 2013.]
A tsunami of data is coming towards us ...
Focused on Text (2009 – 2015)
Audiovisual, Text, Structured Data (2015 – 2021)
Text to Audiovisual
Linguists, Media Studies, Social-Economic History
Infrastructure for AV-material
• There are AV-infrastructures but
– They are often primitive
– Not interoperable
• But will it be used?
PROBLEMS SEEN IN THE LAST YEARS
Doing an assessment to set a baseline (2010)
Telling the story at conferences and meetings
ACCESSIBILITY
How to find the right material?
• AV-archives are not connected
• Finding material requires metadata, which is lacking
• Not all archives want their (meta)data to be open for everyone
• Large part of the metadata may be restricted and/or not accessible
• Searching capability is limited (no semantic search or ontologies)
Problems with access
• (Federated) login is not always available
• Source data is mostly not accessible
– Restriction
– Technological problem (no streaming)
Federated login: linking a person's electronic identity and attributes, stored across multiple distinct identity management systems, ideally resulting in a one-time login
Not enough context
• The metadata is often created with the purpose of the original recordings in mind. As a result, much necessary context information is missing
• Data without the necessary context for my research (are they family, why was it recorded, where was the material used before)
• It is difficult (not technically but organisationally) to add new metadata to existing collections
Not digitized
• Not all material is, or will be digitized
• Especially in big archives there is too much material, and digitizing and digital storage are too costly
Virtual Research Environment
• Scholars need an environment for their work
• If possible, it should be on the web, with access from anywhere, a login, storage, and an option for sharing.
ELABORATION
Traditional: make your own data
• There is still a tradition to make your own data
• In order to interchange the metadata, standards are necessary
• CMDI (or any other official MD standard) is often too complicated to use without additional training -> Spreadsheet Science.
Opinion storing
• Hard metadata <-> Soft metadata
• Soft MD
– Opinions (more than one)
– Need to be added in a sustainable way
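One possible reading of the hard/soft distinction, as a minimal sketch: hard metadata is a fixed set of facts, while soft metadata is an append-only list of attributed opinions, so several scholars can disagree without overwriting each other. The class names, field names, and example values below are hypothetical and not taken from any existing infrastructure.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Opinion:
    """One piece of soft metadata: who said what, and when."""
    author: str
    statement: str
    created: datetime


@dataclass
class Record:
    """A recording with fixed (hard) and accumulating (soft) metadata."""
    identifier: str
    hard: dict[str, str]
    soft: list[Opinion] = field(default_factory=list)

    def add_opinion(self, author: str, statement: str) -> Opinion:
        # Append only: earlier opinions are never overwritten or removed.
        opinion = Opinion(author, statement, datetime.now(timezone.utc))
        self.soft.append(opinion)
        return opinion


# Hypothetical example data.
record = Record("interview-001", {"duration": "00:42:10", "format": "wav"})
record.add_opinion("scholar_a", "The speaker sounds reluctant here.")
record.add_opinion("scholar_b", "I read this passage as ironic.")

for opinion in record.soft:
    print(opinion.author, "->", opinion.statement)
```

Keeping author and timestamp with every opinion is one way to make soft metadata traceable over time; a real system would also need persistent storage, which is left out here.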
MENTALITY
Sharing
• People do not want to share THEIR data.
• Is this a bigger problem with AV-material than with text-data?
• People are not used to looking for existing data (attitude)
• Many more privacy issues than with text-only data
SOLUTIONS
Go out and:
• Ask
• Demonstrate
• Organize workshops
• Make demos and prototypes
User involvement (technicians)
User involvement (scholars)
Oral History
Art History
Narrative psychology
Facial expression
Psychotrauma
Medical anthropology
How-To material
• Make manuals and screencasts (how-to)
Co-develop
• Organize workshops where scholars and technicians co-develop.
Trigger scholars to participate
Dissemination
• We created 9 videos to show other scholars the wonderful things a good infrastructure can do
Technical
• Technology must WORK (and not just most of the time)
• Technology must be as understandable as possible
Standards
• CMDI, TEI
• Distributed metadata, so different files of metadata can be used.
Fragment Level
IPR & Privacy
Collection Level
File Level
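A minimal sketch of how distributed metadata at collection, file, and fragment level might be combined into one view. The slides do not say how the levels interact; the rule used here (the most specific level wins, for example for IPR and privacy restrictions) is an assumption, and all keys and values are hypothetical.

```python
from collections import ChainMap

# Hypothetical metadata kept in separate files, one per level.
collection_md = {"collection": "Oral history interviews", "access": "open"}
file_md = {"file": "interview-001.wav", "language": "nl"}
fragment_md = {"start": "00:12:30", "end": "00:14:05", "access": "restricted"}


def effective_metadata(fragment, file, collection):
    """Merge the three levels; on conflict the most specific level wins."""
    # ChainMap looks keys up left to right, so fragment overrides file,
    # and file overrides collection.
    return dict(ChainMap(fragment, file, collection))


merged = effective_metadata(fragment_md, file_md, collection_md)
print(merged["access"])      # restricted (set at fragment level)
print(merged["collection"])  # Oral history interviews (inherited)
```

With this rule a single sensitive fragment can be restricted without closing the whole collection, while everything not stated at fragment level is inherited from the file and the collection.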
CONCLUSION
AV-infrastructure
• They do exist, but ...
• We need to invest in a dialogue between ICT & Humanities: what do you need?
• We need to invest in the use of standards
QUESTIONS?