transLectures / machine translation 4 education
Languages & the Media, Berlin, November 23rd 2012
Davor Orlic, davor.orlic@ijs.si
Knowledge for All Foundation Ltd


Page 1: Languages & the Media Berlin, November 23rd 2012 Davor Orlic davor.orlic@ijs.si Knowledge for All Foundation Ltd

transLectures / machine translation 4 education

Page 2:

AGENDA

VideoLectures.NET
Content, Statistics, Licenses, Partners
Education, MOOCs, OpenCourseWare Consortium, Opencast Matterhorn

The idea behind
Who, What, Why, When?

transLectures
Pillars, Current status, Results and Demo

Page 3:

VIDEOLECTURES.NET

WHAT IS IT?
VideoLectures.NET is the largest free and open-access OER digital library of academic talks. The lectures are given by distinguished scholars and scientists at conferences, summer schools and workshops.

WHAT IS THE CONTENT?
Content built up via European research projects in computer science fields.
Other content from OCW partners.

WHAT ARE THE STATS?
732 events, 10512 authors, 13726 lectures, 15965 videos
Visits: 9,626,639
Page views: 26,011,939
Signed-in users: 23560
Licenses: CC-NC-ND

Page 4:

VIDEOLECTURES.NET STATS

Page 5:

VIDEOLECTURES.NET

Page 6:

THINK OF MOOCs

I enrolled in the MOOC “Intro to Databases” (winter 2011) at Coursera:
108,000 accounts
475,000 assignment submissions
3,150,000 video views (heavy use of video)

Wouldn't it be awesome if all such content and future options were multilingual?
Language personalisation for millions of students
Video, audio, papers, coursework - all multilingual

Page 7:

transLectures THE IDEA BEHIND

WHAT WAS THE REASONING?
Huge set of HigherEd users (undergrads, MA, MSc, PhD)
Huge collection of videos
Videos are made of audio and video
Audio and video are data
Data can be harvested, changed and remixed

WHAT IF?
We capture the audio
Transform it into text

WHAT THEN?
We can have subtitles, transcriptions, translations, personalisation, contextualisation, descriptions, time alignment, fragmentation, recommendations, for 15965 academic talks
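To make the step from time-aligned text to subtitles concrete, here is a minimal Python sketch. It assumes the speech recogniser has already produced segments with start and end times; the `Segment` structure, the function names and the sample sentences are hypothetical illustrations, not the transLectures tools. The output follows the widely used SubRip (.srt) layout.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One time-aligned piece of recognised speech (hypothetical structure)."""
    start: float  # seconds from the start of the lecture
    end: float    # seconds from the start of the lecture
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in SubRip files."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SubRip cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text}\n"
        )
    return "\n".join(cues)


if __name__ == "__main__":
    # Made-up sample segments, for illustration only.
    demo = [
        Segment(0.0, 2.5, "Welcome to this lecture."),
        Segment(2.5, 5.0, "Today we talk about machine translation."),
    ]
    print(to_srt(demo))
```

A translated subtitle track would reuse the same timings and swap only the `text` of each segment, which is why time alignment appears in the list above alongside transcription and translation.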

Page 8:

STATE OF LANGUAGE TECHNOLOGY - MT

Same for: Speech Processing, Text Analysis, Speech and Text Resources

Most of Europe's languages are apparently unlikely to survive in the digital age. (META-NET white paper)

Page 9:

transLectures PRE-TEXT

LEARNERS PREFER VIDEO?
YouTube (78 hours of video uploaded per minute)
MOOCs (3 million accounts)

INITIATIVES AROUND VIDEO?
Open content: OCW (20,000 courses)
Massive lecture capture system: Opencast Matterhorn project (700 universities)
Massive portals specialized in video lectures: VLN, Polimedia (25,000 academic videos)

Page 10:

transLectures CV

SPECS?
Cost: 4.5 million EUR
Project ref no.: ICT-287755
Project acronym: transLectures
Project full title: Transcription and Translation of Video Lectures
Instrument: STREP
Thematic Priority: ICT-2011.4.2 Language Technologies
Start date / duration: 01 November 2011 / 36 months

WHO?
Universidad Politecnica de Valencia, Xerox, Knowledge 4 All Foundation Ltd., RWTH, European Media Laboratory GmbH, Deluxe Digital Studios Ltd
Opencast Matterhorn, VideoLectures.NET, Polimedia

Page 11:

transLectures IN A NUTSHELL

WHAT IS THE AIM?
To develop innovative, cost-effective solutions to produce accurate transcriptions and translations in VideoLectures.NET
To deploy those tools across other Matterhorn-related repositories
For translation, we consider the language pairs en⇆es, en⇆sl, en→fr and en→de

WHAT IS THE IMPACT?
A big step in making educational repositories truly accessible both to speakers of different languages and to people with disabilities.

ADDITIONAL VALUE?
Imagine having 16000 lectures in most of the world's languages.

Page 12:

transLectures WHAT, WHY

KEYWORDS?
language technologies, machine translation, automatic speech recognition, massive adaptation, intelligent interaction, education, video lectures, multilingualism, accessibility

WHY TRANSCRIPTION & TRANSLATION?
There are accessibility issues that can be solved by transcription
Non-native speakers understand better by reading than by hearing
At least 1,300 different languages with more than 100,000 native speakers
No language with more than 20% of the world population

Page 13:

transLectures STATUS

TRANSCRIPTION (EML)
The complete transcription of English lectures took 45000 hours (2 months running in parallel)

TRANSLATION (XRCE, UPV, RWTH)
Different segmentation strategies for transcription and translation being considered

INTELLIGENT INTERACTION WITH USERS
Experimental protocol to evaluate intelligent interactive approaches for users

INTEGRATION
First steps on integrating software into VL, Polimedia, Matterhorn

EVALUATION
Human evaluations for the second round of evaluation
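The transcription figure above implies a certain degree of parallelism, which a quick calculation makes visible. The sketch below rests on two assumptions that the slide does not state: that the 45000 hours are total compute hours, and that "2 months" means roughly 60 days of continuous running. The function name is hypothetical.

```python
HOURS_PER_DAY = 24


def implied_parallel_jobs(total_compute_hours: float, wall_clock_days: float) -> float:
    """Average number of jobs that must run at once to finish in the given time."""
    wall_clock_hours = wall_clock_days * HOURS_PER_DAY
    return total_compute_hours / wall_clock_hours


if __name__ == "__main__":
    # 45000 hours is the figure from the slide; 60 days is an assumed reading of "2 months".
    jobs = implied_parallel_jobs(45_000, 60)
    print(f"about {jobs:.0f} jobs running in parallel")
```

Under these assumptions the workload corresponds to roughly thirty jobs running side by side for the whole period; a different reading of "2 months" shifts the number only slightly.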

Page 15:

CONCLUSION and FUTURE

Technology is good enough for transcription & translation
We are going to develop open tools for transcription and translation
Deploy the tools in the Opencast Matterhorn system
Think of a business plan and ideas on a spin-off
Provide optimisations for existing languages
Ideally extend the language set to Chinese, Hindi and others
Is intelligent interaction a realistic concept?
More focus on English into Slovenian translations to improve them
Work on building a community of students for evaluation

Page 16:

Thank you.

WEBSITES:

http://www.translectures.eu/
http://videolectures.net/
http://polimedia.upv.es/catalogo/
http://www.k4all.org/


Page 17:

ADD. FEATURES

Accuracy estimation for each transcription and translation.
Adjustable computational behaviour.
Output constrained to user preferences and corrections.
Fast learning from user corrections.

Page 18:

KNOWLEDGE FOR ALL (K4ALL)

WHAT IS IT?
K4ALL is a foundation based in London (2010) whose goal is to sustain the legacy of the PASCAL2 Network of Excellence (machine learning). Part of this legacy is the VideoLectures.NET website, together with strong connections to the Opencast Foundation (which creates the Matterhorn software) and the OpenCourseWare Consortium.

WHAT DOES IT DO?
I4All: Provision and distribution of infrastructure that supports the K4A mission
S4All: Online science video journals and conference special issues
E4All: Organization of and access to educational material
R4All: Research that facilitates the mission of K4All
A4All: Ensuring accessibility for as wide an audience as possible