fiat 20080921 results pisa

medialab PISA – Proof of Concept: Production, Indexing and Search of Audiovisual Material

Upload: vrt-medialab

Post on 05-Dec-2014

DESCRIPTION

In the PISA research project we investigated how powerful search engines can be built, given a library of audiovisual material that has been analysed objectively and intelligently.

TRANSCRIPT

Page 1: Fiat 20080921 results PISA

medialab

PISA – Proof of Concept

Production, Indexing and Search of Audiovisual Material

Page 2: Fiat 20080921 results PISA

PISA - Positioning

• VRT-Medialab (medialab.vrt.be) – technical R&D

• IBBT (www.ibbt.be) – Interdisciplinary Research Institute

• PISA – Research Project on Production and Indexing of Audiovisual Media

• 21 man-years

• Computer Assisted Manufacturing

• Unsupervised Feature Extraction

• Search Engine Technology

Page 3: Fiat 20080921 results PISA

Context - Digital Media Production

[Diagram labels: Production Platform; Suprastructure – Metadata Mgnt; Infrastructure - Networks and Storage; Production and distribution; Ingest; Media Asset Mgnt; Editing; Playout; Mastering]

Page 4: Fiat 20080921 results PISA

Digital Asset Management, Content Management…

Production Platform

Suprastructure – Metadata Mgnt

Infrastructure - Networks and Storage

Production and distribution

Page 5: Fiat 20080921 results PISA

User Expectations

[Diagram labels: Production Platform; Data General (repeated six times); Meta Data (twice); Communication (Information); Suprastructure – Metadata Mgnt; Infrastructure - Networks and Storage; Production and distribution]

Assumptions:

• An item is relevant or it is not

• A “scene” is the logical unit of search

The ideal search engine

• retrieves all relevant items (recall 100%)

• without false positives (precision 100%)

• enables instant access to digital media

• while respecting intellectual property.
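The two percentages above can be made concrete with a small calculation. The following is a minimal sketch, assuming binary relevance and the scene as the unit of search, as the slide states; the scene identifiers are hypothetical, and treating an empty result set as precision 1.0 is a convention chosen here, not something the slides specify.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query under binary relevance."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # scenes that are both returned and relevant
    # Convention (assumption): an empty set counts as a perfect score.
    precision = len(hits) / len(retrieved) if retrieved else 1.0
    recall = len(hits) / len(relevant) if relevant else 1.0
    return precision, recall


# Hypothetical scene identifiers, for illustration only.
retrieved_scenes = ["scene-1", "scene-2", "scene-3", "scene-4"]
relevant_scenes = ["scene-2", "scene-4", "scene-7"]

precision, recall = precision_recall(retrieved_scenes, relevant_scenes)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The ideal search engine of this slide would return exactly the relevant scenes, giving 1.0 for both values.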

Page 6: Fiat 20080921 results PISA

Archiving – Disclosure, Annotation,…

archive number : ALG 20010813 1

fragment number : 1

series : 1000 ZONNEN EN GARNALEN

tape number : E03024404

format : DBCM

fragment title : 1000 ZONNEN & GARNALEN

picture : KL/PALPLUS

fragment duration : 18 20

text : 0'00" TOURIST REPORTAGE MAGAZINE, OVERVIEW OF TOPICS; OPENING TITLES OF TOURIST REPORTAGE MAGAZINE, OVERVIEW OF TOPICS

0'50" TODAY : ARTIST LUC HOFKENS DESIGNED AN OASIS ON HIS ROOF TERRACE IN BORGERHOUT THAT RECALLS THE GRAND CANYON; INTERVIEW WITH LUC AND HIS WIFE MARILOU; EXTERIOR SHOT OF ROOF WITH SURROUNDINGS, OUTSIDE OF WORKER'S HOUSE, PAN ACROSS ROCK FACES, CRATES WITH WATER, PLANTING, PHOTO ALBUM SHOWING PROGRESS OF THE WORKS

4'00" JUNIOR : KLAARTJE ALAERTS, AGE 13, WANTS TO BECOME AN ASTRONAUT; SHE VISITS THE EURO SPACE CENTER WITH SPACE SHUTTLES, ROCKETS; SIMULATION IN SPACE SHUTTLE; INTERVIEW; HAS SEEN A UFO; BUILDS A SMALL ROCKET HERSELF AND LAUNCHES IT

7'50" DE SCHEURKALENDER : ARCHIVE ADVERTISING FILM IBM; INTERVIEW MAURICE DE WILDE, FIRST PERSONAL COMPUTER

keywords : BELGIUM; BORGERHOUT; ARTIST; OASIS; ART; GRAND CANYON (NATURE AREA); ROOF; TERRACE; INTERVIEW; EURO SPACE CENTER; SPACE TRAVEL; PC; BOAT TRIP; WEALTH; PASSENGER; GASTRONOMY; RESTAURANT; STAFF; HOLIDAY; INTERIOR SHOT; SHIP; BECKERS LEEN; VRT; LOTTO; RADIO ANNOUNCER; SOUND STUDIO; INVENTION; BARBECUE; CONCRETE MIXER; IBM; COMMERCIAL

rights holder : VRT

Search screen FILM Set: 16 Count: 1

page 1 of 3

keywords: ibm and vrt

archive number: -

broadcast year: month: day:

fragment number: fragment duration:

series:

format: tape number:

episode: episode number:

programme: broadcast date:

fragment title:

text:

category:

recording date: recording number:

journalist: rights holder:

SETS

The strings required for the operation are not defined

F11 F12 F13 F14 F17 F18 F19 F20 Ent

Exit Sets Refset Show Previous Next/Clear Thesaurus Command Search
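The query `ibm and vrt` on this search screen is a boolean match against the keyword field of whole records. The sketch below illustrates that behaviour under stated assumptions: the record identifiers other than ALG 20010813 1 are hypothetical, the keyword sets are abbreviated, and the matching logic is a generic boolean AND, not the legacy system's actual implementation.

```python
# Keyword sets per archive record. Only the first identifier comes from the
# record shown on this slide; the other two are hypothetical examples.
records = {
    "ALG 20010813 1": {"IBM", "VRT", "LOTTO", "BARBECUE"},
    "HYPOTHETICAL 2": {"IBM", "PC"},
    "HYPOTHETICAL 3": {"VRT", "LOTTO"},
}


def search_and(records, *terms):
    """Return the ids of records whose keywords contain every query term."""
    wanted = {term.upper() for term in terms}
    return sorted(rec_id for rec_id, keywords in records.items()
                  if wanted <= keywords)


matches = search_and(records, "ibm", "vrt")
print(matches)
```

A hit identifies the whole record. Because the keywords carry no timecode, the user still has to locate the relevant scene within the record by hand.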

Page 7: Fiat 20080921 results PISA

Aha - The Search Engine!

Page 8: Fiat 20080921 results PISA

Issues – Catch-22

-> Automated processing of information is a key differentiator, but it requires correct and structured metadata

-> “Annotation” of rich media requires semantic awareness and interpretation, and thus it is at best an approximation

-> Product Engineering is the source of structured and meaningful information, but creative staff are not receptive to technology

Page 9: Fiat 20080921 results PISA

Objectives - Proof of Concept

• One Set of Numbers(!)

• Model Driven Development

• Computer Assisted Manufacturing

• Unsupervised Feature Extraction

• Efficient Search and Retrieval

Develop an extensible data-model and a consistent application framework, accessible via an intuitive user-interface

(Digitizing analogue and disintegrated information flows)

Page 10: Fiat 20080921 results PISA

Milestone 1 – Search Engine

Page 11: Fiat 20080921 results PISA

Milestone 1 – Search Engine

Media Asset Management System (Ardome)

Search Engine (Lucene/SOLR)

Search Client (Custom Development)

• Search federation by system integration

• Faceted search

• Integrated application of keywords

• Intuitive and structured presentation of results

• Direct access to audiovisual material

Legacy Video Library (Basisplus)

Current news items (Ardome)

Raw Material (EBU Superpop)

<NewsML-G2>
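Faceted search on a Lucene/SOLR engine is typically requested through query parameters. The following is a minimal sketch of building such a request URL, assuming a Solr core named `pisa` and index fields named `source` and `series`; these names are hypothetical, since the slides do not describe the actual schema. The parameters `q`, `rows`, `wt`, `facet`, `facet.field` and `facet.mincount` are standard Solr request parameters.

```python
from urllib.parse import urlencode


def build_faceted_query(base_url, text, facet_fields, rows=10):
    """Build a Solr select URL that asks for facet counts on the given fields."""
    params = [
        ("q", text),             # the user's search terms
        ("rows", rows),          # number of result documents to return
        ("wt", "json"),          # response format
        ("facet", "true"),       # switch faceting on
        ("facet.mincount", 1),   # hide facet values with zero hits
    ]
    # One facet.field parameter per field to be counted.
    params += [("facet.field", field) for field in facet_fields]
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical core and field names, for illustration only.
url = build_faceted_query("http://localhost:8983/solr/pisa", "ibm",
                          ["source", "series"])
print(url)
```

With a `source` facet, a search client could show how many hits come from each federated library (for example the legacy video library versus current news items) and let the user narrow the result set with one click.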

Page 12: Fiat 20080921 results PISA

Shot Segmentation and Scene Recognition

Page 13: Fiat 20080921 results PISA

Character Recognition

Page 14: Fiat 20080921 results PISA

Video copy detection

• Identify duplicates

• Generation tracking

• Grouping of search results

• Intellectual Property Protection
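The slides do not say which copy-detection algorithm PISA used. As an illustration of the general idea only, the sketch below fingerprints one tiny grayscale keyframe per clip with an average hash (a generic perceptual-hashing technique, swapped in here as an assumption) and groups clips whose fingerprints are within a small Hamming distance. The clip names and pixel values are hypothetical.

```python
def average_hash(pixels):
    """Fingerprint a grayscale frame: one bit per pixel, set if above the mean."""
    mean = sum(pixels) / len(pixels)
    return tuple(1 if p > mean else 0 for p in pixels)


def hamming(a, b):
    """Count the positions at which two fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))


def group_copies(frames, max_distance=1):
    """Greedily group clips whose fingerprints differ in at most max_distance bits."""
    groups = []  # list of (representative fingerprint, [clip ids])
    for clip_id, pixels in frames.items():
        fingerprint = average_hash(pixels)
        for representative, members in groups:
            if hamming(representative, fingerprint) <= max_distance:
                members.append(clip_id)
                break
        else:
            # No existing group is close enough: start a new one.
            groups.append((fingerprint, [clip_id]))
    return [members for _, members in groups]


# Hypothetical 8-pixel keyframes; a real system would use downscaled video frames.
frames = {
    "master": [10, 200, 30, 220, 15, 210, 25, 230],
    "re-encoded copy": [12, 198, 33, 215, 18, 205, 28, 226],
    "other clip": [200, 10, 220, 30, 210, 15, 230, 25],
}

groups = group_copies(frames)
print(groups)
```

Grouping of this kind supports the uses listed on the slide: near-identical clips collapse into one entry in the search results, and the members of a group are candidates for generation tracking.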

Page 15: Fiat 20080921 results PISA

Milestone 2 – Feature Extraction

Media Asset Management (Ardome)

Time-coded properties and indexing allow random access to material fragments:

• Shot segmentation and keyframe extraction

• Subtitle processing and speech recognition

• Taxonomy-driven topic detection

• Face recognition

• Scene recognition

• Copy detection

[Diagram labels: Media Production; Shot Segmentation; Speech Recognition; Face Detection; Topic Detection; Media Asset Management System (Ardome); Search Engine (Lucene/SOLR); Legacy Video Library (Basisplus); Current news items (Ardome); Raw Material (EBU Superpop); <NewsML-G2>]
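Time-coded indexing means that a search term resolves to a position in the material rather than to a whole record. The following is a minimal sketch of such an index built from subtitle cues, assuming each cue is a (start time in seconds, text) pair; the cue texts are invented for illustration and the structure is not taken from the PISA implementation.

```python
from collections import defaultdict


def build_timecode_index(subtitles):
    """Map each word to the start times (in seconds) of the cues containing it."""
    index = defaultdict(list)
    for start_seconds, text in subtitles:
        for word in set(text.lower().split()):
            index[word].append(start_seconds)
    return index


def to_timecode(seconds):
    """Format seconds as minutes'seconds", e.g. 470 -> 7'50"."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}'{secs:02d}\""


# Hypothetical subtitle cues: (start time in seconds, text).
subtitles = [
    (0, "overview of topics"),
    (50, "artist builds an oasis on his roof"),
    (240, "a visit to the space center"),
    (470, "archive commercial for the first personal computer"),
]

index = build_timecode_index(subtitles)
hits = [to_timecode(s) for s in index["computer"]]
print(hits)
```

A search client can use the returned timecode to start playback at the matching fragment, which is the random access the slide refers to.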

Page 16: Fiat 20080921 results PISA

Work in Progress (due Q4 2008)

• Multi-lingual

• Access control and Intellectual Property Protection

• Audio segmentation and classification

• Music transcription

• Fractal-based visual indexing

• …

Page 17: Fiat 20080921 results PISA

Conclusion

• Enterprise search – structured metadata, a limited number of libraries, a limited number of records per library, dependencies between objects

• Intelligent search federation is aware of the media production process: scripts, webpages, subtitles and formal annotation may represent the same editorial object

• Random access to audiovisual material requires an index based on timecode and not on « word position in a document »

• Ontology-driven application logic is essential to create semantic awareness, i.e. resolving synonyms and disambiguating homonyms

• The perfect search engine is not for sale yet and requires design and development from the ground up.
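Resolving synonyms and disambiguating homonyms can be illustrated with a toy query-expansion step. The sketch below is a simplification under stated assumptions: two small dictionaries stand in for an ontology, all entries are hypothetical, and a homonym's sense is chosen by overlap with context words supplied by the caller.

```python
# Toy stand-ins for an ontology; all entries are hypothetical.
SYNONYMS = {
    "pc": {"pc", "personal computer"},
    "personal computer": {"pc", "personal computer"},
}
SENSES = {
    "pisa": {
        "research project": {"pisa", "audiovisual", "indexing"},
        "city": {"pisa", "italy", "tower"},
    },
}


def expand_query(term, context=()):
    """Expand a term with its synonyms and, for homonyms, the best-matching sense."""
    term = term.lower()
    terms = set(SYNONYMS.get(term, {term}))
    senses = SENSES.get(term)
    if senses:
        context_words = {word.lower() for word in context}
        # Pick the sense sharing the most words with the context;
        # on a tie, the first sense listed wins.
        best = max(senses, key=lambda name: len(senses[name] & context_words))
        terms |= senses[best]
    return terms


print(sorted(expand_query("PC")))
print(sorted(expand_query("pisa", context=["tower"])))
```

Here a query for "PC" also matches material annotated with "personal computer", and "pisa" is expanded differently depending on whether the surrounding words point to the city or to the research project.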

Page 18: Fiat 20080921 results PISA

Future Work - From « Metadata » to CAD/CAM

?

Page 19: Fiat 20080921 results PISA

Future Work - From « Metadata » to CAD/CAM

?

Page 20: Fiat 20080921 results PISA

• http://medialab.vrt.be/pisa

• http://projects.ibbt.be/pisa

• [email protected]