Upload: eric-thomas

Post on 26-Mar-2015

Page 1: Groupe des Ecoles des Télécommunications  LingTour

http://www.get-telecom.fr/

Groupe des Ecoles des Télécommunications

LingTour

Page 2: Groupe des Ecoles des Télécommunications  LingTour

Groupe des Ecoles des Télécommunications

The Lingtour Project

Outline

- Rationale for Lingtour
- Objectives
- Lingtour partners
- Technical developments
- Application architecture

Page 3: Groupe des Ecoles des Télécommunications  LingTour

Objectives: 3 scenarios

Accessing information: the Virtual Guide

Facilitating communication: the Communication Assistant

Finding local information: the Orientation Assistant

Page 4: Groupe des Ecoles des Télécommunications  LingTour

Rationale for Lingtour

- A more user-friendly assistant
- Multimedia (text, speech, image, video)
- Multimodal access (text, speech, pen, visual I/O)
- Initially targeted at tourist applications

Page 5: Groupe des Ecoles des Télécommunications  LingTour

Accessing information: the Virtual Guide

- Convenient and rapid way to access useful information, locally or from a remote server
  - Hotel / restaurant (location / style / pricing)
  - Travel (possibilities / hours / fares)
  - City transportation (routes / time / fares / traffic)
  - Places to go / visit (location / hours / fees / route)
- Multimodal
  - Combining speech, text, map / image browsing
- Interactive (dialogues, question refinement)
  - Zoomable User Interfaces (ZUIs) + 2D control menus
  - Tap and talk
  - Embodied Conversational Agents (ECAs)

Page 6: Groupe des Ecoles des Télécommunications  LingTour

Facilitating communication: the Communication Assistant

- Visual display to mediate the dialogue
- Translation assistant
  - Browsable sets of questions / answers focused on useful situations: taxi, hotel, haggling over…
  - Browsable lexicon to help communication, and to support speech training thanks to the included ASR and TTS
- Access to a remote server / operator for difficult tasks
- Multimodal
  - Speech + text + sketching
- Interactive
  - 2D control menus
  - Tap and talk
  - ECA + TTS for speech and gestural training

Page 7: Groupe des Ecoles des Télécommunications  LingTour

The Communication Assistant: modes of operation

- Tourist-to-local communication, or local-to-tourist communication
  - Speech / text / menu-selected input
  - Menus for refinement / correction of ASR
  - Translation
  - Display and speech synthesis of translation
- Pronunciation practice
  - From lexicon or virtual guide items
- Training modules
  - Downloaded from a server
  - Situation-specific (hotel, restaurant, taxi…)
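
The first mode of operation (recognizer output, a refinement menu, then translation) can be sketched as below. This is only an illustration under stated assumptions: the slides do not say how the refinement menu is built, so fuzzy string matching from Python's standard `difflib` module stands in for it, and `TAXI_MODULE`, `refinement_menu` and `translate` are hypothetical names holding sample phrases.

```python
import difflib

# Hypothetical situation-specific module (sample data for illustration only).
# Keys are phrases in the tourist's language, values their Chinese translation.
TAXI_MODULE = {
    "how much is it?": "多少钱?",
    "please take me to this address.": "请带我去这个地址。",
    "please stop here.": "请在这里停车。",
}


def refinement_menu(asr_hypothesis, module, size=3):
    """Return up to `size` known phrases close to the recognizer output."""
    return difflib.get_close_matches(
        asr_hypothesis.lower(), list(module), n=size, cutoff=0.5
    )


def translate(phrase, module):
    """Look up the stored translation of a phrase chosen from the menu."""
    return module[phrase]


# The recognizer output is imperfect; the user confirms the first menu entry.
menu = refinement_menu("How much is this", TAXI_MODULE)
chosen = menu[0]
translation = translate(chosen, TAXI_MODULE)
```

In a real device the chosen phrase would then be displayed and passed to speech synthesis; that step is omitted here.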

Page 8: Groupe des Ecoles des Télécommunications  LingTour

Finding local information: the Orientation Assistant

- Collecting input around the device to
  - help localize the user
  - interpret the environment
- “Intelligent camera”: ability to refine pictures
  - integrated (Chinese) character recognition
  - can also operate on characters sketched on the display
- Localization facilities based on triangulation and / or picture interpretation
  - possibility subject to the characteristics of the local network(s)

Page 9: Groupe des Ecoles des Télécommunications  LingTour

Lingtour partners

- TsingHua University
  - Pr. Mao Yuhang: translation from Chinese to French and English
  - Pr. Ding Xiaoqing: Chinese OCR, intelligent camera
  - Pr. Wang Zuo-yin: ASR
- CLIPS
  - Christian Boitet: translation
  - Mutsuko Tomokiyo: Multimedia-UNL
- Paris 8 University
  - Catherine Pélachaud: ECAs
- INT
  - Yang Ni: image refinement
  - Bernadette Dorizzi: HCI
- ENST-Paris
  - Gérard Chollet + Shiuan-Sung Lin: multilingual SR
  - Eric Lecolinet: ZUIs and 2-D control menus
  - Laurence Likforman: OCR
  - Jacques Prado + Alain Goyé: PDA-server communications
- ENST-Bretagne
  - Yannis Haralambous + Andre Thepaut: OCR

Page 10: Groupe des Ecoles des Télécommunications  LingTour

Technical developments

- Chinese character recognition
- “Intelligent” camera
- Text extraction
- Multilingual speech recognition
- Zoomable User Interfaces with 2-D control menus
- “Cultural” Embodied Conversational Agents

Page 11: Groupe des Ecoles des Télécommunications  LingTour

Chinese character recognition

Page 12: Groupe des Ecoles des Télécommunications  LingTour

Intelligent camera from TsingHua University

[Figure: stages labelled capture, reco, translation]

Page 13: Groupe des Ecoles des Télécommunications  LingTour

Extracting text from scene images

- Complex color images
- Uncontrolled illumination
- Variations: size, fonts, orientation, texture
- Complex backgrounds, shadows

Page 14: Groupe des Ecoles des Télécommunications  LingTour

Text extraction

- Searching for character regions (text has uniform color)
- Multi-channel decomposition
- Connected components analysis
- Grouping of components
- Alignment analysis (number of horizontally or vertically aligned components)
- Text identification (language-independent features: size, alignment,…)

Detection rate: 84 %
False alarm rate: 5.6 %
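
The connected-components and alignment steps listed on this slide can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes the multi-channel decomposition has already produced one binary mask, uses 4-connectivity, and treats "horizontally aligned" as "vertical extents overlap". The names `connected_components`, `horizontal_groups`, `min_components` and the sample `MASK` are all hypothetical.

```python
from collections import deque


def connected_components(mask):
    """Return bounding boxes (top, left, bottom, right) of 4-connected regions."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def horizontal_groups(boxes, min_components=3):
    """Group boxes whose vertical extents overlap; keep the larger groups."""
    groups = []
    for box in sorted(boxes, key=lambda b: b[1]):  # scan left to right
        for group in groups:
            last = group[-1]
            if box[0] <= last[2] and last[0] <= box[2]:  # rows overlap
                group.append(box)
                break
        else:
            groups.append([box])
    return [g for g in groups if len(g) >= min_components]


# Toy mask: three aligned "characters" on rows 1-2 and one isolated blob.
MASK = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 1, 1, 0, 0],
    [0, 1, 1, 0, 1, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
]
boxes = connected_components(MASK)
text_lines = horizontal_groups(boxes)
```

On the toy mask the three aligned components form one candidate text line, and the isolated blob is discarded because its group is too small.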

Page 15: Groupe des Ecoles des Télécommunications  LingTour

Automatic Speech Recognition in Multiple Languages

- Sharing of acoustic models between languages to simplify extensibility to other languages.
- Combination of phone models and adaptation from small amounts of data in new languages.
- Model adaptation to user and environmental situations.

[Figure: French and Chinese, with shared acoustic models and language-specific models]
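
As a rough illustration of the two ideas on this slide, the sketch below resolves a phone to a shared model when one exists and adapts a model mean from a handful of frames. The slides do not name an adaptation method; a MAP-style mean update is used here as an assumed stand-in, and the phone symbols, `SHARED_PHONES`, `model_key`, `map_adapt_mean` and `tau` are hypothetical.

```python
# Hypothetical set of phones whose models are shared across languages.
SHARED_PHONES = {"a", "i", "u", "m", "n", "s"}


def model_key(language, phone):
    """Return the acoustic-model key: shared when possible, else per language."""
    if phone in SHARED_PHONES:
        return ("shared", phone)
    return (language, phone)


def map_adapt_mean(prior_mean, frames, tau=10.0):
    """MAP-style update of a Gaussian mean from a few adaptation frames.

    With n frames the new mean is (tau * prior + sum(frames)) / (tau + n),
    so a small n keeps the result close to the prior model.
    """
    n = len(frames)
    dims = len(prior_mean)
    sums = [sum(frame[d] for frame in frames) for d in range(dims)]
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dims)]


french_a = model_key("fr", "a")      # resolves to the shared model
chinese_a = model_key("zh", "a")     # same shared model
french_only = model_key("fr", "R")   # falls back to a French-specific model
adapted = map_adapt_mean([0.0, 1.0], [[1.0, 1.0], [3.0, 1.0]], tau=2.0)
```

The weight `tau` controls how strongly the prior model resists the new data, which is what makes adaptation from small amounts of data feasible.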

Page 16: Groupe des Ecoles des Télécommunications  LingTour

Zoomable user interfaces with 2-D control menus

2-D control menus:

- combine the selection and the control of an operation
- integrate up to two scroll bars or spin-boxes
- let users keep their attention focused on the contents
- can have sub-menus
- retain novice and expert modes, as marking menus do

http://www.infres.enst.fr/net/zomit/cdi.html
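
One way to read "combine the selection and the control of an operation" is sketched below: the pointer first leaves a central dead zone in some direction, which selects an operation, and further movement then supplies two control values (standing in for the two scroll bars). This is an assumed interpretation rather than the authors' code; `ControlMenu`, `OPERATIONS`, the four 90° sectors and the `threshold` value are hypothetical.

```python
import math

# Hypothetical menu entries, one per 90-degree sector around the origin.
OPERATIONS = ["zoom", "pan", "rotate", "scroll"]


class ControlMenu:
    def __init__(self, origin, threshold=20.0):
        self.origin = origin        # where the menu was opened
        self.threshold = threshold  # radius of the dead zone, in pixels
        self.operation = None
        self.anchor = None          # pointer position at selection time

    def move(self, x, y):
        """Feed a pointer position; return (operation, dx, dy) once selected."""
        if self.operation is None:
            dx, dy = x - self.origin[0], y - self.origin[1]
            if math.hypot(dx, dy) < self.threshold:
                return None  # still inside the menu: nothing selected yet
            angle = math.degrees(math.atan2(dy, dx)) % 360
            self.operation = OPERATIONS[int(((angle + 45) % 360) // 90)]
            self.anchor = (x, y)
        # After selection, movement relative to the anchor controls the operation.
        return (self.operation, x - self.anchor[0], y - self.anchor[1])


menu = ControlMenu(origin=(100, 100))
first = menu.move(105, 100)    # inside the dead zone
second = menu.move(130, 100)   # crosses the threshold to the right
third = menu.move(150, 110)    # continued drag controls two parameters
```

Because selection and control happen in one continuous gesture, the pointer never has to travel to a separate widget, which fits the slide's point about keeping attention on the contents.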

Page 17: Groupe des Ecoles des Télécommunications  LingTour

Cultural Embodied Conversational Agents

- Behaviour adaptable to:
  - cultural and social context
  - user (tourist, journalist)
- Various forms / complexity (2D, 3D, vector…) depending on device (PDA, kiosk)
- Driven by a Representation Language based on the XML-XSD standard (UNL type)
- Embedding the influence of a given culture, for example on:
  - choice of communicative gesture (smile vs head nod)
  - the duration of gaze…
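
To make the idea of an XML-driven, culture-dependent behaviour concrete, here is a small sketch. The slides do not show the Representation Language itself, so the element and attribute names (`agent-behaviour`, `culture`, `acknowledge`, `gesture`, `gaze-ms`) are invented for illustration, and the gesture and gaze values are placeholders rather than findings about any culture.

```python
import xml.etree.ElementTree as ET

# Hypothetical behaviour description; names and values are illustrative only.
PROFILE_XML = """
<agent-behaviour>
  <culture id="A"><acknowledge gesture="smile" gaze-ms="800"/></culture>
  <culture id="B"><acknowledge gesture="head-nod" gaze-ms="1500"/></culture>
</agent-behaviour>
"""


def acknowledgement(xml_text, culture_id):
    """Return (gesture, gaze duration in ms) for the requested culture profile."""
    root = ET.fromstring(xml_text)
    for culture in root.findall("culture"):
        if culture.get("id") == culture_id:
            node = culture.find("acknowledge")
            return node.get("gesture"), int(node.get("gaze-ms"))
    raise KeyError(culture_id)


profile_a = acknowledgement(PROFILE_XML, "A")
profile_b = acknowledgement(PROFILE_XML, "B")
```

Keeping the cultural parameters in a declarative file means the same agent engine can be reused across devices and user groups, with only the profile changing.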

Page 18: Groupe des Ecoles des Télécommunications  LingTour

Application architecture

[Architecture diagram; recoverable labels: UMTS (?) server, speech synthesis, access information, translation, a word graph + a list of keywords]