Audient: An Acoustic Search Engine
By Ted Leath
Supervisor: Prof. Paul Mc Kevitt
School of Computing and Intelligent Systems
Faculty of Engineering
University of Ulster, Magee
Existing Spoken Document Retrieval (SDR) Systems
• Involve the production of intermediate text for the purposes of indexing, searching and retrieval
• Require a high level of semantic processing for word recognition
• Have a limited vocabulary
• Have a high word recognition error rate
Phonemic and Phonogrammic Streams
Phonogrammic streams are orthographical representations of phonemic streams. This abstraction is ancient, and partially inherent in the English alphabet.
Egyptian hieroglyphs with semantic and phonetic value. Ref. http://www.omniglot.com/writing/egyptian.htm
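As a rough illustration of the phonemic/phonogrammic distinction, the sketch below renders a timed phonemic stream as a written symbol string. The ARPAbet-style symbols, the timings and the function name `to_phonogrammic` are assumptions made for this example; the slides do not say which notation standard Audient uses.

```python
from typing import List, Tuple

# (phoneme symbol, start time in seconds, end time in seconds), as a
# recogniser might emit them. The symbols and times are made up.
PhonemeEvent = Tuple[str, float, float]


def to_phonogrammic(events: List[PhonemeEvent]) -> str:
    """Render a timed phonemic stream as a space-separated symbol string."""
    return " ".join(symbol for symbol, _start, _end in events)


# Hypothetical recogniser output for the word "speech".
events = [("S", 0.00, 0.08), ("P", 0.08, 0.14), ("IY", 0.14, 0.30), ("CH", 0.30, 0.42)]
print(to_phonogrammic(events))  # S P IY CH
```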
Project Goals
• Create a unique alternative to existing word-based LVCSR speech retrieval systems along with potential tools for future cognitive and philosophical investigation
• Develop a speech-centric model which uses standards-based phonogrammic streams as primary internal data representation
• Allow both text and nonlexical phonemic audio queries of varying length
• Test against audio corpora used in the evaluation of other Information Retrieval (IR) systems
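One way queries of varying length could be matched against a phonogrammic stream is approximate substring matching, which tolerates recognition errors. The sketch below uses Sellers' variant of edit distance; this technique is swapped in for illustration and is not taken from the slides, and the symbols and the name `best_match` are hypothetical.

```python
from typing import List, Tuple


def best_match(query: List[str], stream: List[str]) -> Tuple[int, int]:
    """Return (edit_distance, end_index) of the best approximate occurrence
    of `query` inside `stream`. end_index is exclusive."""
    # prev[j] is the cheapest cost of aligning the query symbols seen so far
    # with some substring of `stream` that ends at position j. Row 0 is all
    # zeros because a match may start anywhere in the stream.
    prev = [0] * (len(stream) + 1)
    for i, q in enumerate(query, start=1):
        curr = [i] + [0] * len(stream)
        for j, s in enumerate(stream, start=1):
            substitution = prev[j - 1] + (q != s)
            deletion = prev[j] + 1       # query symbol missing from the stream
            insertion = curr[j - 1] + 1  # extra symbol present in the stream
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    end = min(range(len(prev)), key=prev.__getitem__)
    return prev[end], end


stream = "DH AH S P IY CH S IH S T AH M".split()
print(best_match("S P IY CH".split(), stream))  # (0, 6): exact match
print(best_match("S B IY CH".split(), stream))  # (1, 6): one substitution
```

The same function handles short and long queries; a real system would add a distance threshold that scales with query length.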
Previous Research/Systems
• TREC
  – The Informedia projects at Carnegie Mellon University
  – The Video Mail Retrieval and Multimedia Document Retrieval projects at Cambridge University
  – The SCAN system at AT&T Research
  – The THISL project at Sheffield University
• SpeechBot and NPR Online – Public Internet Search Sites
• The National Gallery of the Spoken Word
• BBN Rough ‘n’ Ready
• Fast-Talk
Audient System Architecture
[Figure: system architecture block diagram. Labelled components: Audio Input; Phonemic Stream Abstraction/Construction; Phonetic and temporal abstraction; Query construction; Speech queries; Text Queries; Indexing; Database; Query response; Phonogrammic and Temporal Information; Indexed Data; Non-speech Processing]
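The architecture diagram pairs indexing with phonogrammic and temporal information. A minimal sketch of one possible index is given below: an inverted index from phoneme n-grams to the audio location and start time where they occur. The n-gram approach, the file name `clip1.wav` and the function name `build_index` are assumptions for illustration, not details from the slides.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

PhonemeEvent = Tuple[str, float]  # (symbol, start time in seconds)
Posting = Tuple[str, float]       # (audio location, start time in seconds)
Index = Dict[Tuple[str, ...], List[Posting]]


def build_index(docs: Dict[str, List[PhonemeEvent]], n: int = 3) -> Index:
    """Map every run of n consecutive phoneme symbols to where it occurs."""
    index: Index = defaultdict(list)
    for location, events in docs.items():
        for i in range(len(events) - n + 1):
            window = events[i:i + n]
            key = tuple(symbol for symbol, _time in window)
            # Store the start time of the first phoneme in the window.
            index[key].append((location, window[0][1]))
    return dict(index)


# Hypothetical recogniser output for one audio file.
docs = {"clip1.wav": [("S", 0.00), ("P", 0.08), ("IY", 0.14), ("CH", 0.30)]}
index = build_index(docs)
print(index[("P", "IY", "CH")])  # [('clip1.wav', 0.08)]
```

Keeping the start time in each posting is what would let a query result point back to the right moment in the audio for replay.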
Core Modules
[Figure: core modules data-flow diagram.
Modules: Queries and Table Input; Phonemic Recognition and Abstraction; Stream to Speech; Text to Stream; Create Translation Table; Audio Stream Replay; Phonogrammic Streams, Location, Temporal Information and Indexing.
Data-flow labels: Text Query; Speech Query; Digitised Audio Stream; Digitised Audio Stream and Location; Phonogrammic Stream; Phonogrammic Stream, Location and Temporal Information; Phonogrammic Match Request; Phonogrammic Match Answer; Phonogrammic Query Result; Converted Phonogrammic Stream; Synthetic Speech; Text; Text Table Component; Phonogrammic Table Component; Text Translation Table; Text Translation Information; Text for Translation; Phonogrammic Translation; Location and Temporal Reference]
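The Text to Stream module and the text translation table could interact roughly as sketched below: each word of a text query is looked up in the table and replaced by its phonogrammic form. The table entries, the symbols and the function name `text_to_stream` are invented for this example; the slides name the components but do not describe their internals.

```python
from typing import Dict, List, Tuple

# Hypothetical translation table: word -> phonogrammic symbols.
TRANSLATION_TABLE: Dict[str, List[str]] = {
    "speech": ["S", "P", "IY", "CH"],
    "search": ["S", "ER", "CH"],
}


def text_to_stream(text: str, table: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Convert a text query to a phonogrammic stream.

    Returns the converted stream and the words with no table entry, so the
    caller can decide how to handle out-of-table words.
    """
    stream: List[str] = []
    unknown: List[str] = []
    for word in text.lower().split():
        if word in table:
            stream.extend(table[word])
        else:
            unknown.append(word)
    return stream, unknown


print(text_to_stream("Speech search engine", TRANSLATION_TABLE))
# (['S', 'P', 'IY', 'CH', 'S', 'ER', 'CH'], ['engine'])
```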
Proposed Tools
• The Hidden Markov Model Toolkit (HTK)
• Linux and C++
• Festival
• VoiceXML and the SGML Family
• The Apache Web Server
Project Schedule
(Gantt chart; timeline axis runs from Q4 2002 to Q1 2007)
ID  Start       End         Duration  Task Name
1   01/08/2002  01/08/2003  262d      Literature Survey
2   20/06/2003  19/02/2004  175d      Write up literature review
3   17/06/2003  18/12/2003  133d      Selection, installation and integration of tools
4   18/12/2003  18/03/2004  66d       Construct Phonemic Recognition and Abstraction Module
5   18/03/2004  17/06/2004  66d       Construct Stream to Speech module
6   17/06/2004  16/07/2004  22d       Test and refine modules
7   16/07/2004  18/10/2004  67d       Construct Text to Stream module
8   18/10/2004  17/11/2004  23d       Test and refine modules
9   17/11/2004  15/02/2005  65d       Construct Queries and Table Input module
10  15/02/2005  18/05/2005  67d       Construct Create Translation Table module
11  18/05/2005  18/08/2005  67d       Construct Audio Stream Replay module
12  19/07/2004  16/12/2005  370d      Integrate and test core modules
13  18/08/2005  17/03/2006  152d      Test core modules against other IR systems using corpora and optimise
14  17/03/2006  22/06/2006  70d       Populate index and demonstrate
15  22/06/2006  25/10/2006  90d       Incorporate search engine elements
16  14/06/2006  29/05/2007  250d      Finish thesis
Conclusion
• Create a unique alternative to existing word-based LVCSR speech retrieval systems along with potential tools for future cognitive and philosophical investigation
• Develop a speech-centric model which uses standards-based phonogrammic streams as primary internal data representation
• Allow both text and nonlexical phonemic audio queries of varying length
• Test against audio corpora used in the evaluation of other Information Retrieval (IR) systems