
Post on 26-Jun-2020


A boring lecture

AV-Infrastructure

Infrastructure for AV-data

It’s there, but will it be used?

Arjan van Hessen

Research in Man-Machine Interaction and disclosure of spoken documents with Language and Speech Technology

Self-service via the telephone, disclosure of spoken documents with Language and Speech Technology

Establishing a sustainable infrastructure for tools and data for the humanities (and social sciences)

What is an infrastructure?

• Infrastructure is a basic physical and organizational structure needed for the operation of a (scientific) organisation.

• It can be generally defined as the set of interconnected structural elements that provide a framework supporting an entire structure of development.

• It is an important term for judging a scientific organisation’s development.

• (Wikipedia)

Infrastructure: Astronomy

INFRASTRUCTURE FOR THE HUMANITIES

Infrastructure: Humanities

sustainable

easily accessible

tools

data

user friendly

uploading your own data

Support & expertise

VRE

max. interoperability

standards

Humanities

Culture

Multi-linguality

Interpretation

Noisy data

Different countries -> Different legal systems

IPR

Privacy

INFRASTRUCTURE FOR AV

AV-material

More and more AV becomes available

2013: 72 hours/min on YouTube

• Digitization and preservation of existing material

• Easy and cheap to produce

• Easy and cheap to store (YouTube, Cloud)

• More professional productions

• Extreme need for self-expression?

• It’s there because it can be
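To get a feel for the upload rate quoted above, a back-of-the-envelope sketch may help. The only input taken from the slide is the 2013 figure of 72 hours per minute; the function and variable names are illustrative.

```python
# Upload rate quoted on the slide (YouTube, 2013): 72 hours of video per minute.
UPLOAD_HOURS_PER_MINUTE = 72


def uploaded_hours(minutes: int) -> int:
    """Hours of video uploaded during the given number of minutes."""
    return UPLOAD_HOURS_PER_MINUTE * minutes


# Hours of new video arriving in one day (24 * 60 minutes).
hours_per_day = uploaded_hours(24 * 60)

# Each minute brings 72 * 60 minutes of video, so this many people would
# have to watch around the clock just to keep up with the uploads.
viewers_to_keep_up = UPLOAD_HOURS_PER_MINUTE * 60

print(hours_per_day, viewers_to_keep_up)
```

At this rate nobody can view the material by hand, which is why selection criteria and automatic disclosure become pressing questions.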

There is no data like more data

What do we do with all this data?

Do we need to set up criteria for selection?

Do we need a special infrastructure for AV?

[Figure: data volume (Giga, Tera, Peta, Exa) by decade (1990’s, 2000’s, 2010’s, 2020’s) for structured data, text, image, audio and video. Further scales: computational needs, sophistication of analysis, expressiveness (low, med, high).

Application areas shown: Digital Marketing, Wide Area Imagery, News / Media, Safety / Security, Healthcare, Food, Social Media / Video.

Figures quoted: 12% of video views; 100’s TB per day; 72 video hrs/minute; 1B camera phones; 1B medical images/yr; 10s millions cameras; used by 1/3 of enterprises.

Source: IBM Market Insights, based on composite sources / GTO 2013]

... A tsunami of data is coming towards us ...

Focused on Text (2009 – 2015)

Audiovisual, Text, Structured Data (2015 – 2021)

Text to Audiovisual

Linguists Media Studies Social-Economic History

Infrastructure for AV-material

• There are AV-infrastructures but

– They are often primitive

– Not interoperable

• But will it be used?

PROBLEMS SEEN IN THE LAST YEARS

Doing an assessment to set a baseline (2010)

Telling the story at conferences and meetings

ACCESSIBILITY

How to find the right material?

• AV-archives are not connected

• Finding material requires metadata, which is often lacking

• Not all archives want their (meta)data to be open for everyone

• Large part of the metadata may be restricted and/or not accessible

• Searching capability is limited (no semantic search or ontologies)

Problems with access

• (Federated) login is not always available

• Source data is mostly not accessible

– Restriction

– Technological problem (no streaming)

Federated login: linking a person's electronic identity and attributes, stored across multiple distinct identity management systems, ideally resulting in a one-time login

Not enough context

• The metadata is often created with the purpose of the original recordings in mind. As a result, much necessary context information is missing

• Data comes without the context necessary for my research (are they family, why was it recorded, where was the material used before?)

• It is difficult (organisationally, not technically) to add new metadata to existing collections

Not digitized

• Not all material is, or will be digitized

• Especially in big archives there is too much material, and digitizing and digital storage are too costly

Virtual Research Environment

• Scholars need an environment for their work

• If possible it will be on the web, with access from everywhere, a login, storage, and an option for sharing.

ELABORATION

Traditional: make your own data

• There is still a tradition to make your own data

• In order to interchange the metadata, standards are necessary

• CMDI (or any other official MD standard) is often too complicated to use without additional training -> Spreadsheet Science.

Opinion storing

• Hard metadata <-> Soft metadata

• Soft MD

– Opinions (more than one)

– Need to be added in a sustainable way
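The distinction between hard and soft metadata could be modelled along the following lines. This is a minimal sketch under stated assumptions: the class names, field names and example values are hypothetical and do not come from the slides or from any metadata standard.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Opinion:
    """One scholar's interpretation (hypothetical structure)."""
    author: str
    topic: str
    statement: str


@dataclass
class Recording:
    identifier: str
    hard: dict[str, str]  # factual metadata, e.g. recording date, duration
    soft: list[Opinion] = field(default_factory=list)  # interpretations

    def add_opinion(self, opinion: Opinion) -> None:
        # Append only: an existing opinion is never overwritten,
        # so several opinions on the same topic can coexist.
        self.soft.append(opinion)

    def opinions_on(self, topic: str) -> list[Opinion]:
        return [o for o in self.soft if o.topic == topic]


# Invented example data, for illustration only.
rec = Recording("interview-001", {"recorded": "1985-03-12", "duration": "00:42:10"})
rec.add_opinion(Opinion("scholar_a", "speaker_mood", "reluctant"))
rec.add_opinion(Opinion("scholar_b", "speaker_mood", "ironic"))
print(rec.opinions_on("speaker_mood"))
```

Keeping opinions in an append-only list, each tied to an author, is one way to store more than one opinion without touching the hard metadata.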

MENTALITY

Sharing

• People do not want to share THEIR data.

• Is this a bigger problem with AV-material than with text-data?

• Not used to looking for existing data (attitude)

• Many more privacy issues than with text-only data

SOLUTIONS

Go out and:

• Ask

• Demonstrate

• Organize workshops

• Make demos and prototypes

User involvement (technicians)

User involvement (scholars)

Oral History

Art History

Narrative psychology

Facial expression

Psychotrauma

Medical anthropology

How-To material

• Make manuals and screencasts (how-to)

Co-develop

• Organize workshops where scholars and technicians co-develop.

Trigger scholars to participate

Dissemination

• We created 9 videos to show other scholars the wonderful things a good infrastructure can do

Technical

• Technology must WORK (always, not just most of the time)

• Technology must be as understandable as possible

Standards

• CMDI, TEI

• Distributed metadata, so different files of metadata can be used.

Fragment Level

IPR & Privacy

Collection Level

File Level
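One possible reading of metadata at collection, file and fragment level is that each level lives in its own metadata file and the levels are combined when a fragment is requested. The sketch below assumes a "most specific level wins" rule, which is an assumption for illustration and not something the slides state; all keys and values are invented.

```python
def effective_metadata(collection: dict, file: dict, fragment: dict) -> dict:
    """Merge three metadata levels; on conflicts the most specific level wins.

    The precedence rule (fragment over file over collection) is an assumption.
    """
    return {**collection, **file, **fragment}


# Invented example: IPR is set once for the whole collection, while one
# privacy-sensitive fragment overrides the access setting.
collection_md = {"ipr": "archive-owned", "access": "open"}
file_md = {"speaker": "interviewee 12"}
fragment_md = {"access": "restricted"}

merged = effective_metadata(collection_md, file_md, fragment_md)
print(merged)
```

With this layout, IPR and privacy settings only need to be recorded at the level where they actually differ.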

CONCLUSION

AV-infrastructure

• They do exist, but…

• We need to invest in a dialogue between ICT & Humanities: what do you need?

• We need to invest in the use of standards

QUESTIONS?
