Audient: An Acoustic Search Engine By Ted Leath Supervisor: Prof. Paul Mc Kevitt School of Computing and Intelligent Systems Faculty of Engineering University of Ulster, Magee

Post on 05-Jan-2016


Audient: An Acoustic Search Engine

By Ted Leath

Supervisor: Prof. Paul Mc Kevitt

School of Computing and Intelligent Systems

Faculty of Engineering

University of Ulster, Magee

Food for Thought

Existing Spoken Document Retrieval (SDR) Systems

• Involve the production of intermediate text for the purposes of indexing, searching and retrieval

• Require a high level of semantic processing for word recognition

• Have a limited vocabulary

• Have a high word recognition error rate

Things can be done differently!

Non-word Representations of Speech

• Could be features of the audio signal

• Could be phonemes

Phonemic and Phonogrammic Streams

Phonogrammic streams are orthographical representations of phonemic streams. This abstraction is ancient, and partially inherent in the English alphabet.

Egyptian hieroglyphs with semantic and phonetic value. Ref. http://www.omniglot.com/writing/egyptian.htm
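As a rough illustration of the distinction above, the sketch below holds a phonemic stream as timed phone records and writes it out as a phonogrammic stream. The ARPAbet-style symbols, the `Phone` record and the timings are assumptions made for this example; the slides do not say which notation Audient uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phone:
    symbol: str   # ARPAbet-style symbol (assumed notation, not from the slides)
    start: float  # seconds from the start of the audio
    end: float


def to_phonogrammic(phones):
    """Write a phonemic stream out as a space-separated string of symbols."""
    return " ".join(p.symbol for p in phones)


# Hypothetical recogniser output for the spoken word "search".
stream = [
    Phone("S", 0.00, 0.12),
    Phone("ER", 0.12, 0.30),
    Phone("CH", 0.30, 0.45),
]

print(to_phonogrammic(stream))  # S ER CH
```

The written form drops the timings, so a system that needs to replay the matching audio would have to keep them alongside the symbols.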

Project Goals

• Create a unique alternative to existing word-based Large Vocabulary Continuous Speech Recognition (LVCSR) speech retrieval systems, along with potential tools for future cognitive and philosophical investigation

• Develop a speech-centric model which uses standards-based phonogrammic streams as primary internal data representation

• Allow both text and nonlexical phonemic audio queries of varying length

• Test against audio corpora used in the evaluation of other Information Retrieval (IR) systems
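One way a query of arbitrary length could be matched against a phonogrammic stream is approximate substring matching, which tolerates a few misrecognised phones. The sketch below uses Sellers' variant of edit distance. It is an illustration under that assumption, not the matching method the project specifies, and the symbols are ARPAbet-style placeholders.

```python
def best_match(query, stream):
    """Return (edit_distance, end_index) for the best approximate occurrence
    of `query` anywhere in `stream`. `end_index` counts stream symbols
    consumed up to and including the end of the match."""
    m = len(query)
    col = list(range(m + 1))      # column for the empty stream prefix
    best = (col[m], 0)
    for j, s in enumerate(stream, start=1):
        new = [0] * (m + 1)       # new[0] == 0: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if query[i - 1] == s else 1
            new[i] = min(col[i - 1] + cost,  # match or substitution
                         col[i] + 1,         # extra symbol in the stream
                         new[i - 1] + 1)     # query symbol missing from the stream
        col = new
        if col[m] < best[0]:
            best = (col[m], j)
    return best


# "the search engine" as a hypothetical phonogrammic stream.
stream = "DH AH S ER CH EH N JH AH N".split()

print(best_match("S ER CH".split(), stream))  # (0, 5): exact match ending at CH
print(best_match("S ER SH".split(), stream))  # distance 1: one phone differs
```

Because the first row of the table is all zeros, the query may begin at any point in the stream, so short and long queries are handled by the same routine.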

Previous Research/Systems

• TREC

– The Informedia projects at Carnegie Mellon University

– The Video Mail Retrieval and Multimedia Document Retrieval projects at Cambridge University

– The SCAN system at AT&T Research

– The THISL project at Sheffield University

• SpeechBot and NPR Online – Public Internet Search Sites

• The National Gallery of the Spoken Word

• BBN Rough ‘n’ Ready

• Fast-Talk

SDR System Comparison Chart

Audient System Architecture

[Figure: system architecture diagram. Labels recoverable from the transcript: Audio Input; Phonemic Stream Abstraction/Construction; Phonetic and temporal abstraction; Query construction; Speech Queries; Text Queries; Indexing; Database; Query response; Phonogrammic and Temporal Information; Indexed Data; Non-speech Processing.]
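The indexing and database elements of the architecture store phonogrammic data together with location and temporal information. A minimal sketch of one possible layout is an inverted index from phone n-grams to the places and times where they occur. The n-gram scheme, the function names and the file name `news.wav` are assumptions for illustration only.

```python
from collections import defaultdict


def build_index(documents, n=3):
    """Map each phone n-gram to the (location, start_time) pairs where it begins.

    `documents` maps a location (for example a file name) to a list of
    (symbol, start_time) pairs.
    """
    index = defaultdict(list)
    for location, phones in documents.items():
        symbols = [sym for sym, _ in phones]
        for i in range(len(symbols) - n + 1):
            index[tuple(symbols[i:i + n])].append((location, phones[i][1]))
    return index


def lookup(index, query, n=3):
    """Return candidate hits for the first n-gram of a phonogrammic query."""
    return sorted(index.get(tuple(query[:n]), []))


# Hypothetical indexed audio: "the search" with phone start times in seconds.
documents = {
    "news.wav": [("DH", 0.0), ("AH", 0.1), ("S", 0.2), ("ER", 0.3), ("CH", 0.5)],
}
index = build_index(documents)

print(lookup(index, ["S", "ER", "CH"]))  # [('news.wav', 0.2)]
```

Each hit carries both a location and a time offset, which is what a replay component would need to play back the matching audio. The lookup only generates candidates from the first n-gram; a fuller system would verify the rest of the query against the stream.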

Core Modules

[Figure: core modules diagram. Modules: Queries and Table Input; Phonemic Recognition and Abstraction; Stream to Speech; Text to Stream; Create Translation Table; Audio Stream Replay; Phonogrammic Streams, Location, Temporal Information and Indexing.]

[Data flow labels recoverable from the transcript: Text Query; Speech Query; Digitised Audio Stream; Digitised Audio Stream and Location; Phonogrammic Stream; Converted Phonogrammic Stream; Phonogrammic Stream, Location and Temporal Information; Phonogrammic Match Request; Phonogrammic Match Answer; Phonogrammic Query Result; Synthetic Speech; Text; Text for Translation; Text Translation Information; Text Translation Table; Text Table Component; Phonogrammatic Table Component; Phonogrammic Translation; Location and Temporal Reference.]
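The Text to Stream and Create Translation Table modules suggest a lookup from written words to phonogrammic streams. The sketch below shows that idea with a tiny hand-made table. The table contents, the ARPAbet-style symbols and the function name are assumptions for illustration, not the project's actual data or interface.

```python
# Hypothetical translation table: written word -> phonogrammic stream.
TRANSLATION_TABLE = {
    "acoustic": ["AH", "K", "UW", "S", "T", "IH", "K"],
    "search": ["S", "ER", "CH"],
    "engine": ["EH", "N", "JH", "AH", "N"],
}


def text_to_stream(text, table=TRANSLATION_TABLE):
    """Convert a text query to a phonogrammic stream.

    Returns (stream, unknown_words); words missing from the table are
    reported rather than guessed.
    """
    stream, unknown = [], []
    for word in text.lower().split():
        if word in table:
            stream.extend(table[word])
        else:
            unknown.append(word)
    return stream, unknown


print(text_to_stream("Search engine"))
# (['S', 'ER', 'CH', 'EH', 'N', 'JH', 'AH', 'N'], [])
print(text_to_stream("Audient search"))
# (['S', 'ER', 'CH'], ['audient'])
```

Reporting unknown words separately leaves room for a later step, such as adding entries to the table, instead of silently producing a wrong stream.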

Proposed Tools

• The Hidden Markov Model Toolkit (HTK)

• Linux and C++

• Festival

• VoiceXML and the SGML Family

• The Apache Web Server

Project Schedule

[Gantt chart; timeline axis runs from Q4 2002 to Q1 2007]

ID  Task Name                                                              Start       End         Duration
1   Literature Survey                                                      01/08/2002  01/08/2003  262d
2   Write up literature review                                             20/06/2003  19/02/2004  175d
3   Selection, installation and integration of tools                       17/06/2003  18/12/2003  133d
4   Construct Phonemic Recognition and Abstraction Module                  18/12/2003  18/03/2004  66d
5   Construct Stream to Speech module                                      18/03/2004  17/06/2004  66d
6   Test and refine modules                                                17/06/2004  16/07/2004  22d
7   Construct Text to Stream module                                        16/07/2004  18/10/2004  67d
8   Test and refine modules                                                18/10/2004  17/11/2004  23d
9   Construct Queries and Table Input module                               17/11/2004  15/02/2005  65d
10  Construct Create Translation Table module                              15/02/2005  18/05/2005  67d
11  Construct Audio Stream Replay module                                   18/05/2005  18/08/2005  67d
12  Integrate and test core modules                                        19/07/2004  16/12/2005  370d
13  Test core modules against other IR systems using corpora and optimise  18/08/2005  17/03/2006  152d
14  Populate index and demonstrate                                         17/03/2006  22/06/2006  70d
15  Incorporate search engine elements                                     22/06/2006  25/10/2006  90d
16  Finish thesis                                                          14/06/2006  29/05/2007  250d
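The durations in the schedule look like counts of weekdays (Monday to Friday) between the start and end dates, inclusive of both. The sketch below checks that reading against tasks 1, 6 and 8 only; it is an inference from the figures, not something the slides state, and it ignores public holidays.

```python
from datetime import date, timedelta


def working_days(start, end):
    """Count Monday-to-Friday days from start to end, both inclusive."""
    days = (end - start).days + 1
    return sum(1 for k in range(days)
               if (start + timedelta(days=k)).weekday() < 5)


print(working_days(date(2002, 8, 1), date(2003, 8, 1)))      # task 1: 262
print(working_days(date(2004, 6, 17), date(2004, 7, 16)))    # task 6: 22
print(working_days(date(2004, 10, 18), date(2004, 11, 17)))  # task 8: 23
```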

Conclusion

• Create a unique alternative to existing word-based LVCSR speech retrieval systems along with potential tools for future cognitive and philosophical investigation

• Develop a speech-centric model which uses standards-based phonogrammic streams as primary internal data representation

• Allow both text and nonlexical phonemic audio queries of varying length

• Test against audio corpora used in the evaluation of other Information Retrieval (IR) systems

Applications

• Searching, indexing and retrieval of Internet audio and video files

• Searching, indexing and retrieval of broadcast media

• Services for the blind

• Library services

• Surveillance and intelligence gathering

• Voice mail

• Audio mining

• Trend analysis (topic detection and tracking)