groupe des ecoles des télécommunications lingtour
http://www.get-telecom.fr/
Groupe des Ecoles des Télécommunications
LingTour
Groupe des Ecoles des Télécommunications
The Lingtour Project
Outline
  Rationale for Lingtour
  Objectives
  Lingtour partners
  Technical developments
  Application architecture
Objectives: 3 scenarios
Accessing information: the Virtual Guide
Facilitating communication: the Communication Assistant
Finding local information: the Orientation Assistant
Rationale for Lingtour
  A more user-friendly assistant
  Multimedia (text, speech, image, video)
  Multimodal access (text, speech, pen, visual I/O)
  Initially targeted for tourist applications
Accessing information: the Virtual Guide
Convenient and rapid way to access useful information, locally or from a remote server
  Hotel / restaurant (location/style/pricing)
  Travel (possibilities/hours/fares)
  City transportation (routes/time/fares/traffic)
  Places to go / visit (location/hours/fees/route)
Multimodal
  Combining speech, text, map/image browsing
Interactive (dialogues, question refinement)
  Zoomable User Interfaces (ZUIs) + 2D Control menus
  Tap and talk
  Embodied Conversational Agents (ECAs)
Facilitating communication: the Communication Assistant
Visual display to mediate the dialogue
Translation assistant
  browsable sets of questions / answers focused on useful situations: taxi, hotel, haggling over…
  browsable lexicon to help communication
  speech training thanks to the included ASR and TTS
Access to a remote server / operator for difficult tasks
Multimodal
  Speech + text + sketching
Interactive
  2D Control menus
  Tap and talk
  ECA + TTS for speech and gestural training
The Communication Assistant: modes of operation
Tourist-to-local communication, or Local-to-tourist communication
  Speech / text / menu-selected input
  Menus for refinement / correction of ASR
  Translation
  Display and speech synthesis of translation
Pronunciation practice
  From lexicon or virtual guide items
Training modules
  Downloaded from a server
  Situation-specific (hotel, restaurant, taxi…)
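One turn of the tourist-to-local mode described above (input, menu-based correction of the ASR output, translation) can be sketched as follows. This is only an illustration under stated assumptions: the phrasebook contents, the function names `choose_from_menu`, `translate_phrase` and `assist`, and the idea of presenting recognizer hypotheses as an n-best menu are all hypothetical and do not come from the project.

```python
# Hypothetical phrasebook: situation -> {source phrase: translation}.
# The entries are illustrative only; the slides do not list actual phrases.
PHRASEBOOK = {
    "taxi": {"to the airport, please": "à l'aéroport, s'il vous plaît"},
    "hotel": {"where is the hotel?": "où est l'hôtel ?"},
}


def choose_from_menu(hypotheses, selected_index=0):
    """Menu-based correction: the user picks one ASR hypothesis from a list.

    `hypotheses` stands in for an n-best list from a recognizer (assumed);
    `selected_index` stands in for the user's tap on the menu.
    """
    if not hypotheses:
        raise ValueError("no ASR hypotheses to choose from")
    return hypotheses[selected_index]


def translate_phrase(situation, phrase):
    """Look up a phrase in the situation-specific phrasebook.

    Returns None when the phrase is unknown, which is where a real system
    might fall back to a remote server or operator.
    """
    return PHRASEBOOK.get(situation, {}).get(phrase.lower().strip())


def assist(situation, hypotheses, selected_index=0):
    """Run one turn: correct the ASR output via the menu, then translate."""
    phrase = choose_from_menu(hypotheses, selected_index)
    return {"input": phrase, "translation": translate_phrase(situation, phrase)}
```

Display and speech synthesis of the translation are left out; they would consume the `translation` field returned by `assist`.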
Finding local information: the Orientation Assistant
Collecting input around the device to
  help localize the user
  interpret the environment
"Intelligent camera": ability to refine pictures
  integrated (Chinese) character recognition
  can also operate on characters sketched on the display
Localization facilities based on triangulation and / or picture interpretation
  possibility subject to the characteristics of the local network(s)
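The slide does not say which measurements the localization would rely on. As one simple illustration, assuming that distances to three reference points with known positions (for example network base stations) are available, a 2-D position can be computed as below. The function name `trilaterate` and the distance-based formulation are assumptions made for this sketch, not the project's method.

```python
def trilaterate(p1, r1, p2, r2, p3, r3):
    """Estimate (x, y) from three anchors p_i = (x_i, y_i) and distances r_i.

    Subtracting the circle equation of anchor 1 from those of anchors 2 and 3
    gives two linear equations in x and y, solved here by Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is not unique")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y
```

With noisy distances the three circles no longer meet in a single point, so a practical system would more likely use more anchors and a least-squares fit.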
Lingtour partners
TsingHua University
  Prof. Mao Yuhang: translation from Chinese to French and English
  Prof. Ding Xiaoqing: Chinese OCR, intelligent camera
  Prof. Wang Zuo-yin: ASR
CLIPS
  Christian Boitet: translation
  Mutsuko Tomokiyo: Multimedia-UNL
Paris 8 University
  Catherine Pélachaud: ECAs
INT
  Yang Ni: image refinement
  Bernadette Dorizzi: HCI
ENST-Paris
  Gérard Chollet + Shiuan-Sung Lin: multilingual SR
  Eric Lecolinet: ZUIs and 2-D control menus
  Laurence Likforman: OCR
  Jacques Prado + Alain Goyé: PDA-server communications
ENST-Bretagne
  Yannis Haralambous + Andre Thepaut: OCR
Technical developments
  Chinese character recognition
  « Intelligent » Camera
  Text extraction
  Multilingual Speech Recognition
  Zoomable User Interfaces with 2-D control menus
  « Cultural » Embodied Conversational Agents
Chinese character recognition
Intelligent camera from TsingHua University
[Figure: capture, recognition, translation]
Extracting text from scene images
  Complex color images
  Uncontrolled illumination
  Variations: size, fonts, orientation, texture
  Complex backgrounds, shadows
Text extraction
  Searching for character regions (text has uniform color)
  Multi-channel decomposition
  Connected components analysis
  Grouping of components
  Alignment analysis (number of horizontally or vertically aligned components)
  Text identification (language-independent features: size, alignment,…)
Detection rate: 84 %
False alarm rate: 5.6 %
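Two of the steps listed above, connected components analysis and horizontal alignment analysis, can be sketched on a toy binary mask. The sketch assumes that the color decomposition has already produced a 0/1 grid for one channel; the function names, the 4-connectivity choice and the `tolerance` parameter are assumptions made for this example and are not taken from the project.

```python
from collections import deque


def connected_components(mask):
    """Return bounding boxes (top, left, bottom, right) of the 4-connected
    foreground regions of a binary grid given as a list of lists of 0/1."""
    rows = len(mask)
    cols = len(mask[0]) if mask else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def horizontally_aligned(boxes, tolerance=1):
    """Group boxes whose vertical centers lie within `tolerance` of the
    first box of a group; boxes are visited from left to right."""
    groups = []
    for box in sorted(boxes, key=lambda b: b[1]):
        center = (box[0] + box[2]) / 2
        for group in groups:
            ref = (group[0][0] + group[0][2]) / 2
            if abs(center - ref) <= tolerance:
                group.append(box)
                break
        else:
            groups.append([box])
    return groups
```

A group with several aligned components of similar size is a text-line candidate; an isolated component is more likely background clutter.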
Automatic Speech Recognition in Multiple Languages
  Sharing of acoustic models between languages to simplify extensibility to other languages.
  Combination of phone models and adaptation from small amounts of data in new languages.
  Model adaptation to user and environmental situations.
[Figure: French and Chinese, with shared acoustic models and language-specific models]
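The two ideas on this slide, a pool of acoustic models shared between languages and adaptation from small amounts of data, can be illustrated with a toy sketch. Everything concrete here is an assumption: the phone labels, the one-number "models", the names `model_for` and `map_adapt_mean`, and the use of a MAP-style mean update, which is one common adaptation technique and is not named on the slide.

```python
# Toy shared pool: phone label -> model (reduced here to a single Gaussian mean).
# Labels and values are placeholders, not real acoustic parameters.
SHARED = {"a": 1.0, "i": 2.0, "m": 3.0}

# Language-specific models for phones without a suitable shared model.
SPECIFIC = {"fr": {"y": 4.0}, "zh": {"x": 5.0}}


def model_for(language, phone):
    """Prefer a language-specific model, else fall back to the shared pool."""
    specific = SPECIFIC.get(language, {})
    if phone in specific:
        return specific[phone]
    if phone in SHARED:
        return SHARED[phone]
    raise KeyError(f"no model for phone {phone!r} in language {language!r}")


def map_adapt_mean(prior_mean, samples, tau=10.0):
    """MAP-style update of a Gaussian mean.

    The result interpolates between the prior mean and the sample mean;
    with few samples it stays close to the prior, and `tau` sets how many
    samples are needed before the data dominates.
    """
    return (tau * prior_mean + sum(samples)) / (tau + len(samples))
```

Adding a new language under this scheme means supplying only its entries in `SPECIFIC` and adapting the shared models with whatever data is available.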
Zoomable user interfaces with 2-D control menus
2-D control menus:
  combine the selection and the control of an operation
  integrate up to two scroll bars or spin-boxes
  users keep their attention focused on the contents
  can have sub-menus
  retain novice and expert modes, as marking menus do
http://www.infres.enst.fr/net/zomit/cdi.html
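How a single drag can combine the selection and the control of an operation can be sketched as a small state machine. This is a simplified illustration, not the ZUI implementation referenced above: the class name `ControlMenu`, the pixel `threshold`, the even angular layout of items and the use of drag distance as the control value are all assumptions.

```python
import math


class ControlMenu:
    """A drag first selects an operation by direction, then controls it.

    Items are laid out evenly around the press point, item 0 centered on the
    positive x direction and the others following by increasing angle.
    """

    def __init__(self, items, threshold=30.0):
        self.items = list(items)
        self.threshold = threshold  # distance that ends the selection phase
        self.origin = None
        self.selected = None

    def press(self, x, y):
        """Start a new interaction at the press point."""
        self.origin = (x, y)
        self.selected = None

    def drag(self, x, y):
        """Return None while selecting, then (operation, value)."""
        dx, dy = x - self.origin[0], y - self.origin[1]
        dist = math.hypot(dx, dy)
        if self.selected is None:
            if dist < self.threshold:
                return None  # selection phase: the menu would be displayed
            angle = math.atan2(dy, dx) % (2 * math.pi)
            sector = 2 * math.pi / len(self.items)
            index = int(((angle + sector / 2) % (2 * math.pi)) // sector)
            self.selected = self.items[index % len(self.items)]
        # Control phase: further movement adjusts the operation's parameter.
        return self.selected, dist - self.threshold
```

For example, after `press(0, 0)` a drag to `(10, 0)` is still in the selection phase, while a drag to `(40, 0)` selects the first item and starts reporting a control value that grows as the pointer moves further away.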
Cultural Embodied Conversational Agents
Behaviour adaptable to:
  cultural and social context
  user (tourist, journalist)
Various forms / complexity (2D, 3D, vector…) depending on device (PDA, Kiosk)
Driven by a Representation Language based on the XML-XSD standard (UNL type)
Embedding the influence of a given culture, for example on:
  choice of communicative gesture (smile vs head nod)
  the duration of gaze…
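To make the idea of an XML-driven, culture-dependent behaviour concrete, here is a small sketch that selects a gesture and a gaze duration from an XML description. The element and attribute names (`behaviours`, `act`, `variant`, `gesture`, `gaze_ms`), the culture labels "A" and "B" and all values are invented for this example; the slides do not show the project's actual representation language or schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical behaviour description; names and values are placeholders,
# not the project's schema and not claims about any real culture.
BEHAVIOUR_XML = """
<behaviours>
  <act name="agree">
    <variant culture="A" gesture="head_nod" gaze_ms="800"/>
    <variant culture="B" gesture="smile" gaze_ms="300"/>
  </act>
</behaviours>
"""


def select_behaviour(xml_text, act, culture):
    """Return (gesture, gaze duration in ms) for an act in a given culture,
    or None when the description has no matching variant."""
    root = ET.fromstring(xml_text)
    for act_el in root.findall("act"):
        if act_el.get("name") != act:
            continue
        for variant in act_el.findall("variant"):
            if variant.get("culture") == culture:
                return variant.get("gesture"), int(variant.get("gaze_ms"))
    return None
```

Keeping the cultural variants in data rather than in code is what would let the same agent run on different devices and be retargeted to another context by swapping the description.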
Application architecture
[Figure: architecture diagram with labels: UMTS (?) server; speech synthesis; access information; translation; a word graph + a list of keywords]