Media Manager Mail Access
Unified Messaging
Barbara Hohlt
UC Berkeley
Ericsson Presentation, August 22, 2000
Messages from many sources
[Figure: PSTN phone, cell phone, desktop, and pager around MediaManager Mail Access.]
Project Overview
• Make messages more accessible
  – Get all types of messages
  – Access from different devices with different capabilities
  – Enable faster browsing of many voicemails
• Media Mail services
  – A unified messaging infrastructure
  – Voicemail is email encoded in MIME
• Transcoding services
  – Enhance voicemail interaction
  – Includes: skimmed audio, transcript, text/audio summary, and outline
Related Work
• Universal Inboxes/Unified Messaging
  – onebox.com
  – CoolMail.net
  – Lucent/Octel Unified Messenger
  – Stanford Mobile People Architecture
• Audio Content Extraction Techniques
  – SpeechSkimmer, MIT Media Lab [Arons95]
  – Auto-Summarization, Microsoft Research
  – CueVideo, IBM
Architecture
[Figure: architecture diagram. Components: Clients, Media Manager Interface, Media Manager Service, Folder Store, Transcoder Service, and one Mail Access Interface each for NinjaMail, POP, and IMAP.]
Transcoder Service
• Voicemail -> Text Transcript
• Voicemail -> Text Summary
• Voicemail -> Text Outline
• Email -> Plain Audio
• Email -> GSM Audio
• Voicemail -> GSM Summary
• Voicemail -> Audio Summary
• Voicemail -> Skimmed Audio
Applications
• Conventional GUIs
• Context-Aware Applications
• Iceberg Universal Inbox Component
[Figure: Desktop and MediaManager Mail Access.]
A conventional desktop GUI can contact the Media Manager directly and request messages as text.
The Media Manager will return emails and voicemails as text.
Context-Aware Application
[Figure: Palm Device, Redirection Proxy, Desktop, and MediaManager Mail Access, with steps 1 to 3 marked.]
1. The Palm device asks for a list of messages as text and selects a voicemail.
2. The Palm device requests a redirection from the proxy, which forwards the redirection request to the desktop.
3. The desktop asks for the voicemail and plays it.
Iceberg Universal Inbox
[Figure: Iceberg Universal Inbox diagram with steps 1 to 3 marked. Labels: Bhaskar’s Cell-Phone; Barbara’s PSTN Phone; Universal Inbox; Automatic Path Creation Service; Naming Service; Preference Registry; MediaManager Mail Access; "800-MEDIA-MGR UID: [email protected]"; "mediamgr: Cluster locn."]
MediaManagerServiceIF
• getFolders( ) and getFoldersAs( )
  – Given a username, returns a list of folder names
  – Returns the list as audio or gsm
• getList( ) and getListAs( )
  – Given a username, foldername, and count
  – Returns a list of messages (sendername, title, date)
  – Returns the list as audio or gsm
• getMessage( )
  – Given a Message Ref, returns the entire message
• getMessageContent( )
  – Given a Content ID and return type
  – Returns one part of the message as the return type
Messages and Content Objects
• Media Message
  – Media Reference id
  – Array of Content Objects
• Content Object
  – Content ID
  – Data
• Content ID
  – Media Reference id
  – Content Part index
  – Content Type
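The message structure on this slide can be sketched as three small record types. This is only an illustration in Python, not the project's code (the signatures in this deck are Java-style). The field names `media_ref`, `part_index`, `content_type`, `data`, and `contents`, and the helper `make_message`, are assumptions derived from the bullet labels.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ContentID:
    """Identifies one part of one message (field names are assumed)."""
    media_ref: str      # Media Reference id of the owning message
    part_index: int     # Content Part index within the message
    content_type: str   # Content Type, e.g. "text" or "audio"


@dataclass
class ContentObject:
    """One part of a message: its Content ID plus the raw data."""
    content_id: ContentID
    data: bytes


@dataclass
class MediaMessage:
    """A message: a Media Reference id and an array of Content Objects."""
    media_ref: str
    contents: List[ContentObject] = field(default_factory=list)


def make_message(media_ref: str, parts: list) -> MediaMessage:
    """Hypothetical helper: build a message from (content_type, data) pairs."""
    message = MediaMessage(media_ref)
    for index, (content_type, data) in enumerate(parts):
        cid = ContentID(media_ref, index, content_type)
        message.contents.append(ContentObject(cid, data))
    return message


# Sample data, invented for illustration.
voicemail = make_message("msg-1", [("text", b"header"), ("audio", b"\x00\x01")])
```

Because a Content ID carries the Media Reference id and the part index, a client can hold on to it and later ask for exactly that part without fetching the whole message.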
Interface Example
[Figure: Cell-Phone and MediaManager Mail Access exchanging a Media Message Header, a Content ID, and a Content Object.]
• User asks for list of messages as GSM
• Media Manager returns a list of message headers
• Cell phone sends a Content ID back
• Media Manager sends a voicemail Content Object
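The exchange in this example (list the headers, pick a Content ID, fetch that one part) can be sketched with an in-memory stand-in. Everything here is hypothetical: `FakeMediaManager`, its method names, and the dict layout are assumptions, and the stub passes data through where a real service would transcode it to GSM.

```python
from typing import Dict, List, Tuple

# A Content ID is modeled here as (media_ref, part_index, content_type).
ContentID = Tuple[str, int, str]


class FakeMediaManager:
    """In-memory stand-in for the Media Manager (illustrative only)."""

    def __init__(self, store: Dict[str, dict]):
        # media_ref -> {"sender", "title", "date", "parts"}
        self._store = store

    def get_list(self, count: int) -> List[dict]:
        """Return up to `count` message headers, each with its Content IDs."""
        headers = []
        for media_ref, msg in list(self._store.items())[:count]:
            ids = [(media_ref, i, ctype)
                   for i, (ctype, _) in enumerate(msg["parts"])]
            headers.append({"sender": msg["sender"], "title": msg["title"],
                            "date": msg["date"], "content_ids": ids})
        return headers

    def get_message_content(self, content_id: ContentID,
                            return_type: str) -> dict:
        """Return one message part, tagged with the requested return type."""
        media_ref, index, _ = content_id
        _, data = self._store[media_ref]["parts"][index]
        # A real service would transcode `data` here; this stub does not.
        return {"content_id": content_id, "type": return_type, "data": data}


# Sample data, invented for illustration.
store = {
    "m1": {"sender": "Barbara", "title": "Cats", "date": "2000-08-22",
           "parts": [("audio", b"raw-voicemail")]},
}
manager = FakeMediaManager(store)
headers = manager.get_list(count=10)           # Media Manager returns headers
chosen = headers[0]["content_ids"][0]          # cell phone picks a Content ID
content = manager.get_message_content(chosen, return_type="gsm")
```

The point of the two-step exchange is that a constrained device only downloads headers first and pulls a single part in the format it can play.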
Audio Tools
• Speech Recognition/Synthesis
  – Transcribe voicemail to text
  – IBM ViaVoice SDK and custom audio libs
• Natural Language Processing
  – Directed word spotting by “understanding” content
  – ViaVoice SRCL
• Pitch
  – Detecting important words by emphasized pitch
• Pause
  – Compression through pause removal
• Spurts
  – Retrieve sentence structure of voicemail
Transcoding Techniques
Transcoding                      Technique
Voice Mail -> Text Transcript    Speech recognition
Voice Mail -> Text Summary       NLP, pitch detection and recognition
Voice Mail -> Text Outline       Pause detection and speech recognition
E Mail -> Plain Audio            Speech synthesis
E Mail -> GSM Audio              Speech synthesis and toast
Voice Mail -> Skimmed Audio      Pause detection
Voice Mail -> Audio Summary      Text summary and speech synthesis
Voice Mail -> GSM Summary        Audio summary and toast
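Two rows of this table are defined in terms of other rows (the audio summary builds on the text summary, and the GSM summary builds on the audio summary). A minimal sketch of that composition follows. The dictionary keys, the function `expand`, and the reading that "text summary" and "audio summary" refer back to earlier rows are assumptions; the order of techniques within a row is copied from the slide and may not be the execution order.

```python
from typing import Dict, List

# Target format -> techniques, copied from the table. Some entries name
# another target ("text summary", "audio summary") instead of a technique.
TECHNIQUES: Dict[str, List[str]] = {
    "text transcript": ["speech recognition"],
    "text summary": ["NLP", "pitch detection", "speech recognition"],
    "text outline": ["pause detection", "speech recognition"],
    "plain audio": ["speech synthesis"],
    "gsm audio": ["speech synthesis", "toast"],
    "skimmed audio": ["pause detection"],
    "audio summary": ["text summary", "speech synthesis"],
    "gsm summary": ["audio summary", "toast"],
}


def expand(target: str) -> List[str]:
    """Flatten a target into primitive techniques, resolving nested targets."""
    steps: List[str] = []
    for step in TECHNIQUES[target]:
        if step in TECHNIQUES:
            steps.extend(expand(step))   # the step is itself a target
        else:
            steps.append(step)           # the step is a primitive technique
    return steps


gsm_summary_steps = expand("gsm summary")
```

Under these assumptions, `expand("gsm summary")` resolves to the text-summary techniques followed by speech synthesis and `toast`.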
Examples
Original Voicemail:
“Hello, This is Barbara. How are you and the cats doing? I was wondering if you would feed them a little more the first time in case they eat too much. My number is (713) 465-5155. You can call me anytime. Have a very good holiday. Bye bye”
Processed Voicemail:
Translated talk spurts (pitch-emphasized words in green) (Skimmed) (Just pitch)
• Phyllis Barbara
• Area in the cat staring
• And then if you run but feed them
• A little more the first time in case they eat too much
• On my number is (713) 465-5155
• You can call me anytime.
• Have every holiday
• Of light
Translated using NLP
• Hello this is Barbara
• My number is (713) 465-5155
Examples continued...
Original Voicemail:
“Faced with a seemingly inevitable engineering task authors tend to adopt one of two strategies for adding new services to the Internet landscape: inflexible, highly tuned, hand-constructed services….”
Processed Voicemail:
Translated talk spurts (pitch-emphasized words in green) (Skimmed) (Just pitch)
• Faced with a seemingly inevitable engineering task authors tend to adopt what it to strategies for adding new services to the internet landscape.
• Inflexible, highly Tate, had constructed services….
Translated using NLP
• <Nothing>
Results
• Pause detection
  – Worked well for given applications
  – Playback speedup by 50-70%
• Pitch detection
  – Problems due to high pitch sounds and transitions
• Speech recognition
  – Performance decrease in conversational settings
• Natural Language Processing
  – Performed well with small grammar
Example: Adding GSM Access
• Define specific types, i.e. GSMAudio, GSMSummary
• Optionally create new Content Objects
• Add Content Object definition to MediaManager
• Add GSM transcoder to TranscoderService
Detail: Adding GSM Access
• Add Content Object definition to MediaManager
  – Define GSMAUDIO and GSMSUMMARY
  – Add cases to createObject() in Content Object
  – Add cases to Media Manager
• Add GSM to Transcoder
  – Add method toGSM() to Transcoder
  – Edit .config file
    • External.transcoder.gsm rungsm
  – Edit related transcoders
    • speechSynthesizer and audioSummary()
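The slide shows a single config entry, `External.transcoder.gsm rungsm`, without describing the file format. As a rough sketch, assuming each entry is a key and a command separated by whitespace, a loader that maps transcoder names to external commands might look like this. The function name `parse_transcoder_config`, the whitespace-separated format, and the `#` comment handling are all assumptions.

```python
from typing import Dict

PREFIX = "External.transcoder."


def parse_transcoder_config(text: str) -> Dict[str, str]:
    """Map transcoder name -> external command from 'key command' lines.

    The line format is assumed, not taken from the project.
    """
    commands: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and (assumed) comments
        key, _, command = line.partition(" ")
        command = command.strip()
        if key.startswith(PREFIX) and command:
            commands[key[len(PREFIX):]] = command
    return commands


config_text = "External.transcoder.gsm rungsm\n"
commands = parse_transcoder_config(config_text)
```

With a table like this, adding a format is a config change plus one new method such as `toGSM()`, rather than a change to every caller.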
Implementing Other Mail Stores
• Examples: IMAP, POP, Microsoft Exchange Server
• Implement MailAccessIF
  – String [] getMAFolders( userName )
  – MediaMessage [] getMAList( userName, folderName, count )
  – MediaMessage getMAMessage( MediaRef )
  – ContentObject getMAMessageContent( ContentID )
• Add new protocol to Media Manager protocol table
• Optionally add protocol for users into FolderStore
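A new mail store plugs in by implementing the four MailAccessIF methods and registering under a protocol name. The sketch below renders that idea in Python with an abstract base class and a toy in-memory store. The class names, the snake_case method names, the dict-based message layout, and the `PROTOCOLS` table are assumptions made for illustration; the original interface is Java-style.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class MailAccess(ABC):
    """Python rendering of the four MailAccessIF methods (names adapted)."""

    @abstractmethod
    def get_ma_folders(self, user_name: str) -> List[str]: ...

    @abstractmethod
    def get_ma_list(self, user_name: str, folder_name: str,
                    count: int) -> list: ...

    @abstractmethod
    def get_ma_message(self, media_ref: str) -> dict: ...

    @abstractmethod
    def get_ma_message_content(self, content_id: tuple) -> bytes: ...


class InMemoryMailAccess(MailAccess):
    """Toy mail store: user -> folder -> list of message dicts."""

    def __init__(self, mailboxes: Dict[str, Dict[str, list]]):
        self._mailboxes = mailboxes

    def get_ma_folders(self, user_name):
        return sorted(self._mailboxes.get(user_name, {}))

    def get_ma_list(self, user_name, folder_name, count):
        return self._mailboxes[user_name][folder_name][:count]

    def get_ma_message(self, media_ref):
        for folders in self._mailboxes.values():
            for messages in folders.values():
                for message in messages:
                    if message["media_ref"] == media_ref:
                        return message
        raise KeyError(media_ref)

    def get_ma_message_content(self, content_id):
        media_ref, part_index = content_id   # simplified two-field Content ID
        return self.get_ma_message(media_ref)["parts"][part_index]


# Hypothetical protocol table: protocol name -> mail store implementation.
PROTOCOLS: Dict[str, MailAccess] = {}


def register_protocol(name: str, store: MailAccess) -> None:
    PROTOCOLS[name] = store


# Sample data, invented for illustration.
store = InMemoryMailAccess(
    {"barbara": {"inbox": [{"media_ref": "m1", "parts": [b"hello"]}]}})
register_protocol("memory", store)
```

The Media Manager then only needs the protocol name to find the right store, so IMAP, POP, or Exchange backends can sit behind the same four calls.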
Conclusion
• Overall
  – System useful as navigational hints
  – To achieve total comprehension, need better voice recognition
• What works well
  – Skimming using pause removal
  – Detecting spurts for structure
• What needs work
  – Speech detection in conversational settings
  – Pitch emphasis needs refining
• Future Directions
  – Implementing more mail stores
  – Enhancing interfaces
  – Pause detection/word boundaries using speech detection
  – Developing voicemail grammars
  – Using NLP feedback with pitch emphasis detection
  – Improved speech detection in noisy environments
MediaManagerServiceIF
• String[] getFolders( userName )
• byte[][] getFoldersAs( userName, returnType )
• MediaMessage [] getList( userName, folderName, count )
• byte[][] getListAs( userName, folderName, count, returnType )
• MediaMessage getMessage( MediaRef )
• ContentObject getMessageContent( ContentID, returnType )
Pitch Detection
• The Idea
  – A speaker’s pitch naturally changes when introducing topics or emphasizing words [Hirshberg92]
  – Use pitch increases as hints for “important” words
• Algorithm [Arons95]
  – Determine pitch for each 20 ms frame (FFT with SHS)
  – Set emphasis threshold to be top 1% of pitch values (by histogram)
  – Mark 1 sec interval as emphasized if it contains >= 3 emphasized frames
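The thresholding and interval-marking steps can be sketched as follows, taking per-frame pitch values as given (the FFT/SHS pitch tracker is not reproduced). This is an illustration under stated assumptions, not the project's code: it finds the top 1% by sorting rather than by histogram, it splits the recording into fixed, non-overlapping 1 s windows, and the function names are hypothetical. Ties at the threshold can mark slightly more than 1% of frames.

```python
from typing import List

FRAME_MS = 20
FRAMES_PER_SECOND = 1000 // FRAME_MS   # 50 frames per 1 s interval


def emphasis_threshold(pitches: List[float], top_percent: int = 1) -> float:
    """Return the lowest pitch value that still falls in the top percent."""
    ranked = sorted(pitches, reverse=True)
    keep = max(1, len(ranked) * top_percent // 100)
    return ranked[keep - 1]


def emphasized_intervals(pitches: List[float],
                         min_frames: int = 3) -> List[bool]:
    """Flag each 1 s interval holding at least `min_frames` emphasized frames."""
    threshold = emphasis_threshold(pitches)
    flags = []
    for start in range(0, len(pitches), FRAMES_PER_SECOND):
        window = pitches[start:start + FRAMES_PER_SECOND]
        hits = sum(1 for p in window if p >= threshold)
        flags.append(hits >= min_frames)
    return flags


# Synthetic 10 s pitch track, invented for illustration: a flat 100 Hz
# baseline, three raised frames in second 1, two raised frames in second 6.
track = [100.0] * 500
for i in (60, 61, 62):
    track[i] = 200.0
for i in (300, 301):
    track[i] = 210.0

flags = emphasized_intervals(track)
```

In this synthetic track only the second interval is flagged: the two raised frames in second 6 exceed the threshold but fall short of the three-frame minimum.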
Pause Detection
• Why is pause detection useful?
  – Removing pauses speeds up playback
    • Typically, 50-70% of original time [Foulke71]
  – Long pauses signify groups (talk spurts)
• Noise and soft sounds create difficulties
• Algorithm: Smoothed Histogram [Lamel81]
  – Calculate energy per 10 ms frame
  – Threshold based on smoothed histogram (5 dB after first peak)
  – Use heuristics to remove artifacts
[Figure: histogram of percent of frames versus average energy (dB).]
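A rough sketch of the smoothed-histogram threshold, taking per-frame energies in dB as given. Several choices here are assumptions rather than details from the slide or from [Lamel81]: 1 dB histogram bins, a three-bin moving average, reading "5 dB after first peak" as 5 dB above the lowest-energy peak (taken to be the background level), and a minimum run length of 20 frames standing in for the unspecified artifact-removal heuristics.

```python
import math
from typing import List, Tuple


def histogram(energies_db: List[float], lo: int, hi: int) -> List[int]:
    """Count frames per 1 dB bin over [lo, hi]."""
    counts = [0] * (hi - lo + 1)
    for e in energies_db:
        counts[int(round(e)) - lo] += 1
    return counts


def smooth(counts: List[int], radius: int = 1) -> List[float]:
    """Moving average over 2*radius+1 bins (window shrinks at the edges)."""
    out = []
    for i in range(len(counts)):
        window = counts[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


def first_peak(values: List[float]) -> int:
    """Index of the first bin above its left neighbor and not below its right."""
    for i, v in enumerate(values):
        left = values[i - 1] if i > 0 else 0.0
        right = values[i + 1] if i + 1 < len(values) else 0.0
        if v > left and v >= right:
            return i
    return 0


def pause_threshold(energies_db: List[float], offset_db: float = 5.0) -> float:
    """Place the threshold `offset_db` above the first histogram peak."""
    lo, hi = math.floor(min(energies_db)), math.ceil(max(energies_db))
    peak_bin = first_peak(smooth(histogram(energies_db, lo, hi)))
    return lo + peak_bin + offset_db


def find_pauses(energies_db: List[float],
                min_frames: int = 20) -> List[Tuple[int, int]]:
    """Return [start, end) frame ranges below threshold for >= min_frames."""
    threshold = pause_threshold(energies_db)
    pauses, start = [], None
    # The trailing infinity closes a pause that runs to the end of the input.
    for i, e in enumerate(list(energies_db) + [float("inf")]):
        if e < threshold and start is None:
            start = i
        elif e >= threshold and start is not None:
            if i - start >= min_frames:
                pauses.append((start, i))
            start = None
    return pauses


# Synthetic 10 ms frame energies, invented for illustration: speech at 60 dB,
# one long 30 dB pause (40 frames) and one short dip (5 frames).
frames = [60] * 50 + [30] * 40 + [60] * 50 + [30] * 5 + [60] * 30
pauses = find_pauses(frames)
```

Only the 40-frame stretch is reported as a pause; the 5-frame dip is treated as an artifact and left in place, which is the kind of case the run-length heuristic is meant to handle.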
Works Cited
• [Arons95] B. Arons. Interactively Skimming Recorded Speech. Ph.D. dissertation, MIT, 1994.
• [Foulke71] E. Foulke. The Perception of Time Compressed Speech. Ch. 4 in Perception of Language, edited by P.M. Kjeldergaard, D.L. Horton, and J.J. Jenkins. Charles E. Merrill Publishing Company, 1971. pp. 79-107.
• [Hirshberg92] J. Hirschberg and B. Grosz. Intonational Features of Local and Global Discourse. In Proceedings of the Speech and Natural Language Workshop (Harriman, NY, Feb. 23-26). Morgan Kaufmann Publishers, 1992. pp. 441-446.
• [Lamel81] L.F. Lamel, L.R. Rabiner, A.E. Rosenberg, and J.G. Wilpon. An Improved Endpoint Detector for Isolated Word Recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-29, 4 (Aug. 1981), 771-785.