

[Figure: processing flow on the server. Labels recovered from the diagram: HTTP Request (speech file: PCM, 16 bit, 16 kHz; device ID); cache check; three speech recognizers (Julius), each with an acoustic model (AM) and a language model (LM); N-best ASR hypotheses; three response generators, each with a question-and-answer database (QADB); dialog response text; task classification; text-to-speech (hts_engine); synthesized speech file; HTTP Response (speech file’s URL, dialog response text); recorded speech log; synthesized speech log.]
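The step from N-best ASR hypotheses to a dialog response text can be illustrated with a small sketch. The poster does not say how a response generator matches hypotheses against its QADB, so the word-overlap score below is an assumption, and `QAPair`, `overlap_score`, and `generate_response` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class QAPair:
    question: str  # example question stored in the database
    answer: str    # response text returned when the example matches


def overlap_score(hypothesis: str, example: str) -> float:
    """Jaccard overlap between the word sets of two utterances (assumed metric)."""
    a, b = set(hypothesis.lower().split()), set(example.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def generate_response(n_best: list[str], qadb: list[QAPair], fallback: str) -> str:
    """Return the answer of the QADB example that best matches any hypothesis."""
    best_score, best_answer = 0.0, fallback
    for hypothesis in n_best:
        for pair in qadb:
            score = overlap_score(hypothesis, pair.question)
            if score > best_score:
                best_score, best_answer = score, pair.answer
    return best_answer


# Toy database and a toy N-best list containing one misrecognition.
qadb = [
    QAPair("hello", "Hello."),
    QAPair("where is the nearest station",
           "The nearest station is Gakken-Kita-Ikoma station."),
]
n_best = ["where is the nearest nation", "where is the nearest station"]
print(generate_response(n_best, qadb, fallback="Sorry, I did not understand."))
```

Scoring every hypothesis rather than only the top one lets a lower-ranked but correct hypothesis still select the right answer; a real system would likely also weight hypotheses by their recognition scores.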

System=“Takemaru-kun” (jp…
User=1D5C0439-81E2-44EF-B715-4…

System> Hello.
User> Hello.

System> My name is Takema...
User> Your name?

System> The nearest station…
User> Where is the nearest station?

System=“Kita-chan” (jp.naist.spalab.Kitachan)
User=1D5C0439-81E2-44EF-B715-4CC0CFF0171A

Kita-chan> Are you finding “Gakken-Kita-Ikoma” station?
Takemaru-kun> The nearest station is “Gakken-Kita-Ikoma” station.

Development of a toolkit handling multiple speech-oriented guidance agents for mobile applications

Sunao Hara (Okayama University, Japan)

Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano (Nara Institute of Science and Technology, Japan)

[Figure: message flow between the client, the main agent, and a sub agent. Labels recovered from the diagram: HTTP Request to the main agent (speech file, device ID); dialog management including task classification (in both the main agent and the sub agent); HTTP Response from the main agent (dialog response, delegated agents’ URIs); HTTP Request to a delegated agent’s URI (device ID); HTTP Response from the sub agent (dialog response); sub-agent URIs.]

Conclusion / Future works

Selection of sub agents

Location-specific service using GPS (or A-GPS)

Experiments and evaluations in a real environment

A toolkit for developing a Spoken Dialog System (SDS)

Client-server architecture

Each server can communicate with the other servers

Aim of this study

An SDS that acts as a “generalist” is built from a large number of SDSs that each act as a “specialist”

Specialist: task-dependent dialog, location-dependent dialog, etc.

Approach

A single main agent (“generalist”) and multiple sub agents (“specialists”)

Client system: iTakemaru (Takemaru-kun for iPhone)

Web 1.0: Information publishing (one-way)

The URI is mainly used as a URL

Hyperlinking: defines implicit relationships between resources

Bookmark (Favorite): a tool for remembering a resource’s URI

Web 2.0: Information publishing (interactive)

Usability and sharing became important concepts

Wiki-wiki, Web-API, mash-up services, information tagging, etc.

Web 3.0: Information publishing (collaborative)

Cloud: access to the user’s data from any place

Social: information sharing with others

SNS, cloud computing, crowdsourcing, etc.

Web 4.0?

Information explosion: a rapid increase in the amount of published information; e.g. “Total Recall”

Information retrieval technology will become more important

A librarian (concierge) for each database or website may be promising

Lifelog, MyLifeBits (Microsoft), etc.


Single main agent and multiple sub agents

System

In this study, we propose a toolkit for handling multiple speech-oriented guidance agents in mobile applications. The toolkit is based on a client-server architecture. We assume that the servers run in a cloud-computing environment and that the clients are mobile phones, such as the iPhone. It is difficult to develop an omnipotent spoken dialog system, but it is easy to develop a spoken dialog agent that has limited but deep knowledge. If such limited agents could communicate with each other, a spoken dialog system with wide-ranging knowledge could be created.
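The idea of a generalist main agent that hands a request over to specialist sub agents can be sketched as follows. This is a minimal in-process illustration, not the toolkit's implementation: HTTP is replaced by a dictionary lookup, the `agent://` URIs, `AgentResponse`, `REGISTRY`, and `ask` are hypothetical names, and the assumption that the client (rather than the main agent) follows the delegated URIs is inferred from the poster's diagram labels.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class AgentResponse:
    text: Optional[str]  # dialog response text, or None if the agent cannot answer
    delegated_uris: list[str] = field(default_factory=list)  # sub agents to ask next


# An agent is modeled as a function (utterance, device_id) -> AgentResponse.
Agent = Callable[[str, str], AgentResponse]


def station_agent(utterance: str, device_id: str) -> AgentResponse:
    """Hypothetical specialist that only knows about the nearest station."""
    if "station" in utterance.lower():
        return AgentResponse("The nearest station is Gakken-Kita-Ikoma station.")
    return AgentResponse(None)


def main_agent(utterance: str, device_id: str) -> AgentResponse:
    """Hypothetical generalist: answers greetings, delegates everything else."""
    if utterance.lower().strip(" .!?") == "hello":
        return AgentResponse("Hello.")
    return AgentResponse(None, delegated_uris=["agent://station"])


# Stand-in for HTTP: a registry that maps made-up URIs to in-process agents.
REGISTRY: dict[str, Agent] = {
    "agent://main": main_agent,
    "agent://station": station_agent,
}


def ask(uri: str, utterance: str, device_id: str) -> Optional[str]:
    """Client side: query an agent and follow delegated URIs until one answers."""
    pending, seen = [uri], set()
    while pending:
        current = pending.pop(0)
        if current in seen or current not in REGISTRY:
            continue  # skip unknown agents and avoid delegation loops
        seen.add(current)
        response = REGISTRY[current](utterance, device_id)
        if response.text is not None:
            return response.text
        pending.extend(response.delegated_uris)
    return None


print(ask("agent://main", "Hello.", "device-0001"))
print(ask("agent://main", "Where is the nearest station?", "device-0001"))
```

Returning URIs instead of forwarding the request keeps the main agent stateless with respect to its sub agents; the `seen` set is a guard added here so that agents delegating to each other cannot cause an endless loop.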
