
Catalan Database for In-Car Applications


Author(s): Asunción Moreno, David Conejero, Gonzalo Bustamante

Institute: Universidad Politécnica de Cataluña

Address: Jordi Girona 1-3, Edificio D5, 08034 Barcelona, Spain

email: [email protected]

Date: December 23rd, 2006

Version: V1.4

CONTENTS

1. Introduction
1.1 Speech file formats
1.2 Directory structure
1.3 File nomenclature
1.4 Label files
2. Database design and collection
2.1 Recording platform
2.2 Speaker recruitment
2.3 Design of prompting and prompt-sheet
3. Database contents definition
3.1 Application words
3.1.1 Common application words 00-81
3.1.2 Language-dependent application words P1-2
3.2 Voice activation keywords A1-2
3.3 Isolated digits
3.3.1 Single digits I1-4
3.3.2 Digit string B1
3.4 Connected digits
3.4.1 Sheet number C1
3.4.2 Telephone number C2, C5-C7
3.4.3 Credit card number C3
3.4.4 PIN code C4
3.5 Dates D1-3
3.5.1 Spontaneous date
3.5.2 Prompted date
3.5.3 Relative and general date expression
3.6 Embedded application word phrases E1-2
3.7 Spelled names/words L1-7
3.7.1 Spontaneous name
3.7.2 Prompted name linked to city
3.7.3 Real names/words
3.7.4 Artificial name
3.8 Money amount M1
3.9 Natural number N1
3.10 Directory assistance names O1-7
3.10.1 Spontaneous forename
3.10.2 Spontaneous city name
3.10.3 City name (set of 150)
3.10.4 Company/agency name/street name (set of 150)
3.10.5 Forename & surname (set of 150)
3.11 Phonetically rich sentences S1-9
3.12 Times T1-2
3.12.1 Spontaneous time
3.12.2 Read time phrase
3.13 Phonetically rich words W1-4
3.14 Spontaneous sentences Z0-9
3.15 Any other additional material
3.16 Links to other databases
4. Transcription
5. The lexicon
6. Speaker demographic information
6.1 Accent/Regions
6.2 Speaker characteristics
7. Recording conditions
8. Deviations from SpeechDat Car specifications
9. Sample Prompt sheets
9.1 Sample instruction sheets and prompt sheet
10. Bibliography

1. Introduction

The Catalan Database for In-Car Applications was recorded within the scope of the project "Generació de recursos lingüístics per les tecnologies de la parla", which was sponsored by the Catalan and Spanish Governments. Collection was performed at the Department of Signal Theory and Communications of the Universitat Politècnica de Catalunya (UPC), Spain, and annotation was performed at Verbio Technologies. The owner of the database is the Catalan Government.

This database comprises in-car recordings from 300 speakers recorded in 600 different sessions. The database follows the SpeechDat Car specifications (corpus content, speakers, transcription, lexicon, formats) and the Speecon specifications for the recording platform (speech signal formats and doc files).

The database is distributed in 12 ISO 9660 DVD volumes and one CD-ROM. The CD is used for text files and documentation; the DVDs contain the in-car recordings. The content of each volume is described below. The table shows the disk identification name, the first and last session codes included in each disk, and the effective number of sessions.

Disk | DISK_ID | From | To | Ses | Contents
CD01 | VEHIC2CAD00 |  |  |  | Text and documentation
DVD00 | VEHIC2CA000 | BLOCK00/SES0000 | BLOCK00/SES0049 | 50 | Signals
DVD01 | VEHIC2CA001 | BLOCK00/SES0050 | BLOCK00/SES0099 | 50 | Signals
DVD02 | VEHIC2CA002 | BLOCK01/SES0100 | BLOCK01/SES0149 | 50 | Signals
DVD03 | VEHIC2CA003 | BLOCK01/SES0150 | BLOCK01/SES0199 | 50 | Signals
DVD04 | VEHIC2CA004 | BLOCK02/SES0200 | BLOCK02/SES0249 | 50 | Signals
DVD05 | VEHIC2CA005 | BLOCK02/SES0250 | BLOCK02/SES0299 | 50 | Signals
DVD06 | VEHIC2CA006 | BLOCK03/SES0300 | BLOCK03/SES0349 | 50 | Signals
DVD07 | VEHIC2CA007 | BLOCK03/SES0350 | BLOCK03/SES0399 | 50 | Signals
DVD08 | VEHIC2CA008 | BLOCK04/SES0400 | BLOCK04/SES0449 | 50 | Signals
DVD09 | VEHIC2CA009 | BLOCK04/SES0450 | BLOCK04/SES0499 | 50 | Signals
DVD10 | VEHIC2CA010 | BLOCK05/SES0500 | BLOCK05/SES0549 | 50 | Signals
DVD11 | VEHIC2CA011 | BLOCK05/SES0550 | BLOCK05/SES0599 | 50 | Signals

The list of the distribution disks and directories is contained in the README.TXT file. Further details regarding the database contents, files and directories are provided in the documentation files in the DOC directory and in the files in the TABLE and INDEX directories.

File types are identified with the following extensions:

*.DOC - Microsoft Word V6.0 document

*.LST - DOS text index file with ISO Latin 1 symbols

*.TBL - DOS text file with ISO Latin 1 symbols

*.SES - DOS text file

*.TXT - DOS text file

*.CAC - SAM label file, text file with ISO Latin 1 symbols for car recordings

*.CA0 - Speech signal channel 0

*.CA1 - Speech signal channel 1

*.CA2 - Speech signal channel 2

*.CA3 - Speech signal channel 3

*.PS - PostScript file

Each CD-ROM has the following directory structure:

\:
    COPYRIGH.TXT - copyright notice
    DISK.ID - UNIX volume ID file
    README.TXT - readme file
    VEHIC2CA\ - data directory

VEHIC2CA\DOC:
    DESIGN.DOC - Catalan database documentation file
    SUMMAR0.TXT - database contents summary file
    SAMPALEX.PS - SAMPA table
    ISO88591.PS - ISO 8859-1 table

VEHIC2CA\INDEX:
    CONTENT0.LST - file/utterance/speaker index table

VEHIC2CA\TABLE:
    LEXICON.TBL - full lexicon table
    REC_COND.TBL - recording condition table
    SESSION.TBL - session table
    SPEAKER.TBL - speaker table

VEHIC2CA\: (contains the data block directories)
    BLOCK00\ - sessions are grouped in blocks
    BLOCK01\
    ...

VEHIC2CA\BLOCK00:
    SES0002\ - session directories for each session
    ...

VEHIC2CA\BLOCK00\SES0002:
    V2000206.CAC - SAM label file for car recordings
    V2000206.CA0 - speech signal file in car
    V2000206.CA1 - speech signal file in car
    V2000206.CA2 - speech signal file in car
    V2000206.CA3 - speech signal file in car

1.1 Speech file formats

Four high-quality audio channels are recorded in the car by the mobile platform (Plt_M) and are stored as uncompressed sequences of 16-bit samples at a 16 kHz sampling rate. Each prompted utterance is stored in a separate file. Each speech file has an accompanying ASCII SAM label file.
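As a rough illustration of these figures, the length of a recording can be estimated from the size of one channel file. The following is a minimal Python sketch, assuming the signal files are headerless and hold nothing but 16-bit samples at 16 kHz; the function name `duration_seconds` and the example file size are ours and are not part of the database.

```python
SAMPLE_RATE_HZ = 16000   # sampling frequency stated in this section
BYTES_PER_SAMPLE = 2     # 16-bit samples


def duration_seconds(file_size_bytes: int) -> float:
    """Estimate the duration of one channel file from its size in bytes.

    Assumption: the file is headerless and contains only raw samples.
    """
    if file_size_bytes < 0 or file_size_bytes % BYTES_PER_SAMPLE:
        raise ValueError("size must be a non-negative multiple of the sample size")
    n_samples = file_size_bytes // BYTES_PER_SAMPLE
    return n_samples / SAMPLE_RATE_HZ


# Hypothetical example: a 156800-byte file holds 78400 samples, i.e. 4.9 s.
print(duration_seconds(156800))
```

If the files turn out to carry a header, its length would have to be subtracted from the file size first.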

1.2 Directory structure

The directory structure uses shallow directory nesting, with contiguous numbers identifying the individual sub-directories and session directories. The following three-level directory structure is defined:

\<database>\<block>\<session>

Where:

<database> is defined as <name><#><language code>, i.e. VEHIC2CA, where:

<name> is VEHIC

<#> is 2 for this project

<language code> is the ISO 2-letter code CA for Catalan

<block> is defined as BLOCK<nn>, where <nn> is a progressive number from 00 to max. 99. These numbers are the same as the first 2 digits used in <session>, described below.

<session> is defined as SES<nnnn>, where <nnnn> is a progressive number in the range 0000 to max. 9999, being the numeric session identification number also encoded in each filename.

Table 1- SpeechDat Car directory structure

Both signal files and label files are put in the same directory.

In addition to the previous structure the following directories are used to store some other files:

\<database>\DOC - documentation files

\<database>\TABLE - speaker, recording conditions, session and lexicon tables

\<database>\INDEX - index files

Table 2- Non-speech related directory structure

All sessions have complete recordings for all prompted items. Exceptions can be found in the summary text files.

Finally the root directory contains three files:

a README.TXT ASCII file describing all files in the database, per disk; signal and label files are reported by specifying their templates;

a DISK.ID ASCII file containing the volume name (11 characters long), e.g. VEHIC2CA001; it supplies the volume label to UNIX systems that are unable to read the physical volume label;

a COPYRIGH.TXT ASCII file to protect the authors' rights.

All these support files, except the README.TXT file, are duplicated on each database disk.
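The three-level structure described in this section can be sketched in a few lines of Python. This is only an illustration: the helper name `session_directory` is hypothetical, and it relies on the rule stated above that the block number equals the first two digits of the session number.

```python
DATABASE = "VEHIC2CA"  # database directory name used in this document


def session_directory(session: int) -> str:
    """Build the three-level directory path for a session number (0..9999)."""
    if not 0 <= session <= 9999:
        raise ValueError("session number must be in the range 0000-9999")
    ses = f"{session:04d}"
    block = ses[:2]  # the block number is the first two digits of the session
    return f"\\{DATABASE}\\BLOCK{block}\\SES{ses}"


print(session_directory(2381))  # \VEHIC2CA\BLOCK23\SES2381
```

The path is built with backslashes to match the DOS-style notation of this document; on other systems the separator would need to be adapted.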

1.3 File nomenclature

File names follow the ISO 9660 file name conventions (8 plus 3 characters) according to the main CD-ROM standard. The following template is used:

DD NNNN CC . LL F

where:

DD | Database identification code (00-ZZ). For this project: V2
NNNN | Recording session progressive number (0000-9999); see section 1.2
CC | Corpus code (00-81, A1-Z9) obtained by collating the corpus and the item identifiers
LL | Two-letter ISO 639 language code
F | File type code: n = channel n in-vehicle recording (n = 0, 1, 2, 3); C = label file for car recording

Table 3- SpeechDat Car filename convention
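The filename template can be checked mechanically. The sketch below splits a file name into the five fields of Table 3 with a regular expression; the class `SpeechFileName` and the function `parse_file_name` are illustrative names of ours, not part of any SpeechDat tool.

```python
import re
from typing import NamedTuple


class SpeechFileName(NamedTuple):
    database: str   # DD
    session: int    # NNNN
    corpus: str     # CC
    language: str   # LL
    file_type: str  # F: "0".."3" for a signal channel, "C" for the label file


_PATTERN = re.compile(
    r"^(?P<dd>[0-9A-Z]{2})(?P<nnnn>[0-9]{4})(?P<cc>[0-9A-Z]{2})"
    r"\.(?P<ll>[A-Z]{2})(?P<f>[0-3C])$"
)


def parse_file_name(name: str) -> SpeechFileName:
    """Split a name such as 'V22381A1.CA1' into its template fields."""
    match = _PATTERN.match(name.upper())
    if match is None:
        raise ValueError(f"not a valid file name: {name!r}")
    return SpeechFileName(
        match["dd"], int(match["nnnn"]), match["cc"], match["ll"], match["f"]
    )


print(parse_file_name("V22381A1.CA1"))
```

For the name used in the label file example of section 1.4, this yields database V2, session 2381, corpus code A1, language CA and channel 1.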

Because it is useful for users to identify the contents of a speech file from its name, the two-character corpus codes are defined in the following table. All items are read, unless marked as spontaneous.

Corpus identifier | Item identifier | Corpus contents | Corpus total
A | 1,2 | 2 voice activation keywords
B | 1 | 1 sequence of 10 isolated digits
C | 1 | 1 sheet number (4+ digits)
C | 2 | 1 spontaneous telephone number (9-11 digits) | 7 connected digits
C | 5,6,7 | 3 read telephone numbers
C | 3 | 1 credit card number (16 digits)
C | 4 | 1 PIN code (6 digits)
D | 1 | 1 spontaneous date, e.g. birthday
D | 2 | 1 prompted date, word style | 3 dates
D | 3 | 1 relative and general date exp.
E | 1-2 | 2 word spotting phrases using an application word (embedded)
I | 1-4 | 4 isolated digits
L | 1 | 1 spontaneous, e.g. own forename
L | 2 | 1 spelling of direct. city name | 7 spelled words
L | 3,4,5,6 | 4 real words/names (letter sequences)
L | 7 | 1 artificial name for coverage
M | 1 | 1 money amount
N | 1 | 1 natural number
O | 1 | 1 spontaneous, e.g. own forename
O | 2 | 1 city of birth / growing up (spontaneous)
O | 3,4 | 2 most frequent cities | 7 directory assistance
O | 5,6 | 2 most frequent company/agency/street names
O | 7 | 1 forename/surname
S | 1-9 | 9 phonetically rich sentences
T | 1 | 1 time of day (spontaneous) | 2 time phrases
T | 2 | 1 time phrase (word style)
W | 1-4 | 4 phonetically rich words
00-12 |  | 13 mobile phone application words
20-41 |  | 22 IVR functions keywords | 67 application words
50-81 |  | 32 car products keywords
P | 1,2 | 2 additional language dependent keywords
Z | 0-9 | Prompts for spontaneous speech

Table 4- Corpus codes and contents

The proposed format uses mnemonic values. It permits selection of all files belonging to one of the sixteen corpora with a single command (e.g. in DOS: dir /s/b ??????C*; in UNIX: find . -name '??????C*' -print).
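The same wildcard selection can be reproduced in a script. The sketch below uses Python's standard `fnmatch` module; the helper `select_corpus` and the file names in the example are made up for illustration and only follow the filename template.

```python
from fnmatch import fnmatchcase


def select_corpus(file_names, corpus_letter):
    """Keep the names whose 7th character is the given corpus identifier."""
    pattern = "??????" + corpus_letter + "*"  # six arbitrary characters first
    return [name for name in file_names if fnmatchcase(name, pattern)]


# Hypothetical file names built from the template DD NNNN CC . LL F
names = ["V22381A1.CA1", "V22381C1.CA0", "V22381C5.CAC", "V22381S3.CA2"]
print(select_corpus(names, "C"))  # ['V22381C1.CA0', 'V22381C5.CAC']
```

`fnmatchcase` is used rather than `fnmatch` so that matching stays case-sensitive on every platform.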

A list of separate documentation files, tables and listings follows below:

Directory | File | Contents
TABLE | LEXICON.TBL | lexicon
TABLE | REC_COND.TBL | recording condition table
TABLE | SESSION.TBL | session table
TABLE | SPEAKER.TBL | speaker table
INDEX | CONTENT0.LST | contents list (for close talk mic)
DOC | DESIGN.DOC | (this) main documentation file
DOC | ISO88591.PS | ISO 8859 character set
DOC | SAMPALEX.PS | SAMPA phone symbols used in lexicon
DOC | VALREP.TXT | validation report
DOC | SUMMAR0.TXT | summary file (for close talk mic)

Table 5- Documentation files, tables and listing

1.4 Label files

Label files adhere to a modified SAM label format:

ABC: item1, item2, item3,

where

ABC is a three-letter mnemonic followed by a colon; the mnemonic must contain only 7-bit US-ASCII characters and may not contain spaces or colons

items after the mnemonic are separated by commas, i.e. they cannot contain commas themselves

items can be empty

spaces after the colon or in between items are recommended to improve readability

a label line is delimited by <CR><LF>, the line-end sequence of the DOS operating system.

Note: the 80 character limit on line length is no longer enforced in SpeechDat-Car. Consequently, the version number in the LHD field has been updated.

Table 6 shows the SAM labels used in this database. Optional items are enclosed in {}.

SAM Label | Description | Format | Format string
LHD | label header | fixed vocabulary item | %s
ELF | end of label file |  |
CMT | comment | free-form text | %s
DBN | database name | SpeechDat_Car_<language code> | %s
SES | session number | 4-digit number | %04d
REG | calling region | fixed vocabulary item from list of regions | %s
SCD | speaker code | n-digit number | %0<n>d
SEX | speaker gender | fixed vocabulary item: {M|F} | %s
AGE | speaker age | integer | %d
ACC | speaker accent | fixed vocabulary item from list of dialects | %s
DIR | speech file directory | fixed vocabulary item from file system: \<database>\BLOCK<nn>\SES<nnnn>\ | %s
SRC | speech file name | 8.3 file name | %8c.%3c
CCD | corpus code | 2 character code | %2c
REP | recording place | free-form text | %s
RED | recording date | DD/Mon/YYYY | %02d/%3c/%4d
RET | recording time | HH:MM:SS | %02d:%02d:%02d
BEG | labelled sequence begin position | integer | %d
END | labelled sequence end position | integer: number of sample points in recording | %d
SAM | sampling frequency | integer: {8000|16000} | %d
SNB | number of (8-bit) bytes per sample | integer: {1|2}, {signed|unsigned} | %1d,%s
SBF | sample byte order | integer: {0|lohi} | %s
SSB | number of significant bits per sample | integer: {8|16} | %d
QNT | quantization | fixed vocabulary item, e.g.: {ALAW|RAW|PCM} | %s
NCH | number of channels | 4 | %d
LBD | label file body |  |
LBR | prompt text | BEG,END,<gain>,<min>,<max>,<text>, with <gain>, <min>, <max> optional signal values; if they are not known, the values may be left empty, but the correct number of commas must remain. <text> is the ISO 8859-1 encoded text that appears on the screen. | %d, %d, %d, %d, %d, %s
CAR | car make and type | free-form text | %s
SPP | speaker position | fixed vocabulary item: {DRIVER|CO_DRIVER} | %s
EXN | experimenter name | free-form text | %s
SCC | scenario code | fixed vocabulary item: {HIGHWAY|CITY|...} | %s
WTC | weather condition | fixed vocabulary item: {SUN|RAIN|...} | %s
CEQ | car equipment | attribute value pair list, e.g. CLIMATE=ON, WINDOW_L_FRONT=OPEN, ...; only binary values are allowed, and all attributes must be present | %s
MIP | microphone position | attribute value pair list: CH0=CLOSE_TALK, CH1=CLOSE, CH2=CENTER, CH3=A_COLUMN | %s
MIT | microphone type | attribute value pair list: CH0=SHURE, CH1=LAVALIER, CH2=PEIKER, CH3=AKG | %s
LBO | orthographic transcription | BEG, (END-BEG)/2, END, <text>, with <text> the appropriate SpeechDat-Car compliant ISO 8859 annotation text | %d, %d, %d, %s
LB{0|1|2|3} | orthographic transcription | BEG, (END-BEG)/2, END, <text>, with <text> the appropriate SpeechDat-Car compliant ISO 8859 annotation text | %d, %d, %d, %s

Table 6- SpeechDat-Car SAM labels

Example of a .cac label file

LHD: SAM,6.0
DBN: SpeechDat_Car_CA
SES: 2381
CMT: *** Speech Label Information ***
SRC: V22381A1.CA1
DIR: \VEHIC2CA\BLOCK23\SES2381
CCD: A1
BEG: 0
END: 78399
SYN: 2724
REP: Barcelona
RED: 27/Mar/2006
RET: 12:33:57
EXP: Sergio Oller
CMT: *** Speech Data Coding ***
SAM: 16000
SNB: 2,unsigned
SBF: lohi
SSB: 16
QNT: RAW
NCH: 4
CMT: *** Speakers Information ***
SCD: 238
SEX: M
AGE: 45
ACC: EAST
CMT: *** Recording conditions ***
CEQ: CLIMCONTROL=ON,AUDIO=ON,WINDOW_L_FRONT=CLOSE,WINDOW_R_FRONT=CLOSE,WINDOW_REAR=CLOSE,ROOF=CLOSE,WIPERS=OFF,CROSS_TALK=NO
WTC: SUN
REG: EAST
CAR: SEAT Alhambra
MIP: CHN0=CLOSE_TALK,CHN1=CLOSE,CHN2=CENTER,CHN3=A_COLUMN
MIT: CHN0=SHURE,CHN1=LAVALIER,CHN2=AKG,CHN3=PEIKER
SPP: CO_DRIVER
EXN: Sergio Oller
SCC: HIGH_SPEED_GOOD_ROAD
CMT: *** Label File Body ***
LBD:
LBR: 0,78399,,,,Finalizar la llamada
LB0: 0,39199,78399,[sta][int] finalizar la llamada
LB1: 0,39199,78399,
LB2: 0,39199,78399,
LB3: 0,39199,78399,
ELF:
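Because every label line has the same shape (a three-letter mnemonic, a colon, and comma-separated items), a label file can be read with very little code. The following Python sketch is one possible reader, not an official SpeechDat tool; the function names `parse_label_file` and `parse_pairs` are ours, and the sample text is a shortened excerpt of the example above.

```python
from collections import defaultdict


def parse_label_file(text: str) -> dict:
    """Map each mnemonic to the list of item lists found for it.

    A mnemonic such as CMT can occur several times, so every value is a
    list with one entry per label line.
    """
    labels = defaultdict(list)
    for line in text.splitlines():  # also handles the DOS line ends
        if not line.strip():
            continue
        mnemonic, _, value = line.partition(":")  # split at the first colon only
        items = [item.strip() for item in value.split(",")] if value.strip() else []
        labels[mnemonic.strip()].append(items)
    return dict(labels)


def parse_pairs(items: list) -> dict:
    """Turn attribute-value items like 'WIPERS=OFF' into a dictionary."""
    return dict(item.split("=", 1) for item in items)


sample = (
    "LHD: SAM,6.0\r\n"
    "RET: 12:33:57\r\n"
    "CEQ: AUDIO=ON,WIPERS=OFF\r\n"
    "LBR: 0,78399,,,,Finalizar la llamada\r\n"
    "ELF:\r\n"
)
labels = parse_label_file(sample)
print(labels["LBR"][0])
print(parse_pairs(labels["CEQ"][0]))
```

Splitting only at the first colon keeps values such as the recording time (12:33:57) intact, and empty items are preserved so that the number of commas in an LBR line is respected.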

2. Database design and collection

2.1 Recording platform

The recording platform is a mobile recording platform (PltM) installed inside the car, recording multi-channel speech utterances in a high bandwidth mode (60-7000 Hz, 16 kHz sample frequency).

Multi-channel recordings are performed in the car. The recordings are made through an acoustic front-end (AFE) installed inside the car and connected to the recording platform PltM. Three kinds of AFE are used simultaneously during the recordings: a close-talk microphone, a Lavalier microphone and remote noise-cancelling microphones (2 hands-free microphones placed at different locations in the car).

The mobile recording platform in the car (PltM) uses a PC to drive the recording process. Data acquisition is performed by dedicated hardware in the PC, and the recordings are stored directly on hard disk. The recordings are always made on four channels (1 close-talk signal as reference, 1 close signal and 2 far-talk signals). The positions of the far-talk microphones are:

A_Column: at the ceiling of the car near the A-pillar

Center: at the ceiling of the car over the mid-console (near the rear mirror)

A flat-panel TFT colour display for in-vehicle use is attached to the windscreen or the dashboard of the car.

The data acquisition board installed in the Car-PC is a combination of two plug-in boards:

Multifunction data acquisition board

Anti-aliasing filter board

Multi-channel board recording API

User Interface (MMI)

Prompt file management

2.2 Speaker recruitment

Speakers were recruited from several universities in Catalonia (students and their relatives) and from some associations. This method gave access to a large number of people from several dialectal areas, of both sexes and of different ages. In most cases, each speaker recorded two sessions consecutively.

2.3 Design of prompting and prompt-sheet

600 different prompt sheets were generated. Phonetically rich sentences are read at the beginning of the session, as recommended by the specifications. Other items are spread over the prompt sheets to avoid list effects.

3. Database contents definition

Table 4 shows the contents of the Catalan Database. The final specification for the Catalan recordings is as follows:

3.1 Application words

3.1.1 Common application words 00-81

Corpus codes 00-81 contain application words. Each speaker pronounces 82 application words from a set of 200. The 200 Catalan application words were translated from a set of 200 English descriptions provided by the SpeechDat-Car consortium. Translations were done by the UPC Servei de Llengües i Terminologia.

The following tables show the English description, English example and Catalan words as provided by the professional translators.

Mobile Phone Application Words

GSM telephone | English example | Catalan word
Select telephone (and display menu) | Telephone / Mobile / GSM | Mòbil/Telèfon/GSM
Dial number (select number dialing) | Dial, number-dialing | Marcar
Redial last number or last name | Redial | Remarcar
Place a call with a name | Call | Trucar
Access to call names (phone numbers of regularly called people) | Phonebook | Agenda
Begin dialing | Dial | Marcar
Accept incoming call | Accept call | Acceptar trucada
Enter telephone number (digit by digit) | Phone number | Número de telèfon
Store number in memory | Store | Memoritzar
Home phone | Home | Casa
Hang up the telephone / End the call | Hang up / End | Penjar
Office call | Office | Oficina
Refuse incoming call | Refuse a call | Rebutjar trucada
Keep on redialing automatically (call back) | Automatic redial / Call back | Remarcació automàtica/Remarcació
Secretary call | Secretary | Secretària/Secretari
Emergency call | Emergency / Mayday | Emergència/SOS
Forward a call | Forward call | Remetre trucada
Transfer to human operator | Operator | Operadora
Private numbers list | Private | Privada
Select info functions | Info | Informació
Enter prefix of the telephone number (digit by digit) | Prefix | Prefix
Professional number list | Business | Empresa
Select setting functions | Settings | Opcions
Select access functions | Access | Accés
Choose a name out of the last X calls | Choose | Seleccionar
Access to code for changing | Code | Codi
Date of delivery | Schedule | Horari
Answering to the incoming call with a greeting | Greeting / Waiting message | Missatge d'espera
Play or present dial history list | Previous numbers / Last calls | Últimes trucades/Números anteriors
Getting a list of missed calls | Missed calls | Trucades perdudes
Use of DTMF | DTMF | Tons
Switch to hands-free (muting headset) | Hands-free | Mans lliures
Mute mode | Mute | Silencis
Getting a list of received calls | Received calls | Trucades rebudes
Getting the time length of the call | Time | Durada
Look for information on ACRONYM (SD trained name) | Look up | Buscar
Change/Send the current user profile | Profile | Perfil
Make a conference phone call | Conference | Conferència
Put on hold | Hold | En espera

IVR Functions Keywords

IVR functions | English example | Catalan word
Cancel the current operation | Cancel / Undo | Cancel·lar
Stop current function | Stop | Parar
Answer to prompt with yes | Yes | Sí
Answer prompt with no | No | No
Request information or menu options or help | Help | Ajuda
Abort and go to main menu of selected source | Abort / Exit / Escape | Anul·lar/Sortir
Repeat last function or command | Repeat | Repetir
Confirm (accept top or marked candidate from item list) | O.K. | Acceptar
Go back one item in the list, or replay previous message | Previous / Back | Anterior
Go forward one item in the list, or play next message | Next | Següent
Delete an entry, message or list item | Delete | Esborrar
Send a message | Send | Enviar
Select an option / function | Option / Function | Opció/Funció
Quit the application | Quit | Sortir
Save/archive current entry, message or list item | Save | Memoritzar
Return to main menu | Menu / Main menu | Menú/Menú principal
Correct last entry | Correct | Corregir
Continue a stop operation or proceed with next item | Continue | Continuar
Select a name, message or option | Select | Seleccionar
Add or insert an entry (user-defined name and number) | Add / Create / New | Afegir/Crear/Nou
Record a message or a voicemail greeting or an information file | Record | Gravar
Go one menu level up | Up a menu | Menú superior
Go to the end of the list or voicemail messages | End / Last | Final de la llista/Fi
Go to the beginning of the list or voicemail messages | Start / First / Top | Principi/Inici
Play out message or information file | Play / Listen to | Reproduir
Play again | Replay | Repetir
Select language of service (native language of database) | English | Català
Modify an entry | Change / Review | Canviar/Revisar
Enter spelling mode (any alpha numeric input) | Spell | Lletrejar
Go to next page | Next page / Page up | Pàgina següent
List directory of names, messages or programming options | List | Llista
Enter directory sub-menu or list directory entries | Directory | Directori
Go to previous page | Previous page / Page down | Pàgina anterior
Pause the current operation | Pause | Pausa
Connect to | Connect | Connectar
Activate a function | Activate | Activar
Reset idle mode | Reset | Restablir
Menu list | Menu list | Menú
Program advanced features or options, or enter a sub-menu | Program | Programar
Go one menu level down | Down a menu | Menú inferior
Deactivate a function | Deactivate | Desactivar
Select automatic mode | Automatic | Automàtic
Select manual mode | Manual | Manual
Select languages | Language | Idioma

Specific message service functions | English example | Catalan word
Enter voicemail menu program | Voicemail | Bústia de veu
Enter email / mailbox menu program | Email | Correu electrònic
Read a mail / message | Read | Llegir
Answer to a message | Answer | Respondre
Enter agenda program | Agenda / Address book | Agenda
Enter Internet program | Internet | Internet
Receive new e-mails / messages | Get mail | Obtenir correu
Enter a password to access voicemail functions | Password | Contrasenya
Enter Short Messages Service | SMS | SMS
Enter fax menu program | Fax | Fax
Transfer/Forward message | Transfer / Forward | Reenviar/Transferir
Dictate text message | Dictate | Dictar
Go to a site | Go to | Anar a
Ask for information on mail header | Mail header | Encapçalament
Declare message as urgent | Urgent | Urgent

Multimedia Service Commands | English example | Catalan word
Select a telematic service | Telematics service | Servei telemàtic
Search for telematic services | Show services | Llista de serveis
Request departure times | Departures | Sortides
Request parking information (availability) | Parking (information) | Aparcaments
International | International | Internacional
Send credit card number / Payment with a credit card | Credit card | Targeta de crèdit

Car Products Keywords

Car radio | English example | Catalan word
Select tuner (and play last station) | Radio | Ràdio
Select cassette (and play last track) | Tape | Casset
Resume playing actual side | Play | Reproduir
Select CD-changer (and play last CD) | CD changer / CD player | Carregador de CD/CD
Quit playing, skipping or rewinding | Stop | Parar
Select CD-changer and specific CD by name or number | CD CD-NAME / CD-NUMBER | CD número
Select particular track on actual CD | Track TRACK-NUMBER | Pista
Select pause mode | Pause | Pausa
Play next track on actual cassette/CD | Next track | Pista següent
Play previous track on actual cassette/CD | Previous track | Pista anterior
Scan available radio stations | Scan | Buscar emissores
Directly select tuner and specific radio station | Play + ACRONYM | Posar
Enter volume control | Volume | Volum
Reverse side of the cassette | Turn over / Reverse side | Girar/Canviar de cara
Scan CD | Scan CD | Explorar CD
Select random / shuffle mode | Random play | Selecció aleatòria
Ask for stored traffic information messages (TIM) | Traffic messages | Missatges de trànsit
Fast forward | Fast forward | Avançar
Rewind | Rewind | Enrere
Select FM Bandwidth | FM radio | FM
Eject cassette | Eject | Expulsar
Enter bass control | Bass | Greus
Enter treble control | Treble | Aguts
Select DAB (digital audio broadcasting) Bandwidth | D.A.B. | D.A.B.

Navigation System | English example | Catalan word
Select navigation (and display menu) | Navigation | Navegació
Repeat last acoustic message | Again / Repeat / Last message | Repetir/Últim missatge
Enter destination | Enter destination | Introduir destinació
Directly guide to pre-stored destination | Guide to + DESTINATION | Guia a
Enter a town name | Town / City | Ciutat/Poble
Enter a Street name | Street | Carrer
Enter Airport | Airport | Aeroport
Enter City center | Center | Centre
Enter Gas / Petrol station | Gas station | Benzinera
Enter Car service (garage) | Garage | Taller mecànic
Select guidance (and display menu) | Guidance | Guiat
Acoustic guidance ON/OFF | Acoustic guidance | Guia acústica
Zoom out of the map | Zoom out | Allunyar
Zoom in the map | Zoom in | Apropar
Enter name of a Crossing Street | Crossing | Cruïlla
Enter house number | House-number | Número de carrer
Select pre-stored destination | Destination list | Llista de destinacions
Enter Hotel | Hotel | Hotel
Enter Restaurant | Restaurant | Restaurant
Toggle to map-mode | Map | Mapa
Enter hospital | Hospital | Hospital
Enter Railway station | Railway station | Estació de tren
Activate display control | Display | Mostrar en pantalla
Change a destination item | Change | Canviar
Return to starting point | Go back | Retrocedir
Calculate route | Start route guidance | Començar ruta
Enter Border-point | Border | Frontera
Enter Highway exit | Highway exit / Motorway exit | Sortida d'autopista
Select last destination | Last destination | destinació prèvia
Show distance to destination | Distance | Distància
Show calculated route list | Route list | Llista de rutes
Enter Fair | Fair / Trade show | Fira/Exposició comercial
Enter Ferry | Ferry | Transbordador
Enter Highway crossing | Highway crossing / Motorway junction | Intersecció d'autopistes
Show destination map | Destination map | Mapa de destinació
Show POI (point of interest) in the map | Show + POI | Mostrar
Hide POI in the map | Hide + POI | Ocultar
Show actual positions map | Position map | Mapa de posició
Enter Car rental station | Car rental | Lloguer de vehicles
Enter other destinations | Other destinations | Altres destinacions
Enter Highway service station | Highway service station | Estació de servei d'autopista
Calculate alternative route | Alternative route | Ruta alternativa
Toggle to pictogram mode | Pictogram | Pictograma

Specific car accessories functions | English example | Catalan word
Enter air-conditioning menu | Air-conditioning | Aire condicionat
Give the time | Time | Hora
Windows up | Up | Pujar
Windows down | Down | Baixar
Recall driver settings | Settings / Recall | Configuració/Memoritzar
Give the date | Date | Data
Enter car-checking menu | Control / Diagnostic | Diagnòstic/Control
Set cabin temperature | Temperature | Temperatura
Enter windows menu | Windows | Finestres
Enter ACC (adaptive cruise control) menu | ACC | ACC
Defrost | Defrost | Dispositiu antiglaç
Air re-circulation | Re-circulation | Recirculació
Give climate information | Weather | Temps
Enter seat menu | Seat | Seient
Choose the air-conditioning level | Level | Nivell
Choose air flow/Fan/Ventilation/Blower level | Ventilation | Ventilació
Set desired speed | Speed | Velocitat

Generic words | English example | Catalan word
Set | Set | Configurar
On | On | Encendre
Off | Off | Apagar
Reset | Reset | Reiniciar
Open | Open | Obrir
Close | Close | Tancar
Left | Left | Esquerra
Right | Right | Dreta
Front | Front | Davant
Rear | Rear | Darrere
High level | High | Superior
Low level | Low | Inferior

Table 7- Application words

As can be seen in the above tables, some descriptions were translated with more than one word (Telèfon, Mòbil, GSM), and some words were used to describe different concepts (Agenda describes both Address book and Phonebook).

The specifications allow 39 different words for the Mobile phone application words, 65 different words for the IVR function keywords, and 96 different entries for the Car product keywords. In addition, the language-dependent application words item (corpus codes P1-2) allows for 10 additional words, giving a total of 210 different words. Words were grouped as shown in Table 8.

Mobile Phone Application Words | IVR Functions Keywords | Car Products Keywords
Mòbil | Cancel·lar | Aire_condicionat
Marcar | Sí | Hora
Remarcar | No | Pujar
Trucar | Ajuda | Baixar
Agenda | Anul·lar | Configuració
Acceptar_trucada | Repetir | Data
Número_de_telèfon | Acceptar | Diagnòstic
Casa | Anterior | Temperatura
Penjar | Següent | Finestres
Oficina | Esborrar | ACC
Rebutjar_trucada | Enviar | Dispositiu_antiglaç
Remarcació_automàtica | Opció | Recirculació
Secretària | Sortir | Temps
Emergència | Menú_principal | Seient
Remetre_trucada | Corregir | Nivell
Operadora | Continuar | Ventilació
Privada | Seleccionar | Velocitat
Informació | Afegir | Ràdio
Prefix | Gravar | Casset
Empresa | Menú_superior | Carregador_de_CD
Opcions | Final_de_la_llista | Parar
Accés | Principi | CD_número
Seleccionar | Reproduir | Pista
Codi | Català | Pausa
Horari | Canviar | Pista_següent
Missatge_d'espera | Lletrejar | Pista_anterior
Últimes_trucades | Pàgina_següent | Buscar_emissores
Trucades_perdudes | Llista | Posar
Tons | Directori | Volum
Mans_lliures | Pàgina_anterior | Girar
Silencis | Connectar | Explorar_CD
Trucades_rebudes | Activar | Selecció_aleatòria
Durada | Restablir | Missatges_de_trànsit
Buscar | Menú | Avançar
Perfil | Programar | Retrocedir
Conferència | Menú_inferior | FM
En_espera | Desactivar | Expulsar
Telèfon | Automàtic | Greus
Números_anteriors | Manual | Aguts
 | Idioma | D.A.B.
 | Bústia_de_veu | Configurar
 | Correu_electrònic | Encendre
 | Llegir | Apagar
 | Respondre | Reiniciar
 | Internet | Obrir
 | Obtenir_correu | Tancar
 | Contrasenya | Esquerra
 | SMS | Dreta
 | Fax | Davant
 | Reenviar | Darrere
 | Dictar | Superior
 | Anar_a | Inferior
 | Encapçalament | Navegació
 | Urgent | Introduir_destinació
 | Servei_telemàtic | Guia_a
 | Llista_de_serveis | Ciutat
 | Sortides | Carrer
 | Aparcaments | Aeroport
 | Internacional | Centre
 | Targeta_de_crèdit | Benzinera
 | Crear | Taller_mecànic
 | Funció | Guiat
 | Transferir | Guia_acústica
 | Revisar | Allunyar
 | Memoritzar | Apropar
 |  | Cruïlla
 |  | Número_de_carrer
 |  | Llista_de_destinacions
 |  | Hotel
 |  | Restaurant
 |  | Mapa
 |  | Hospital
 |  | Estació_de_tren
 |  | Mostrar_en_pantalla
 |  | Enrere
 |  | Començar_ruta
 |  | Frontera
 |  | Sortida_d'autopista
 |  | destinació_prèvia
 |  | Distància
 |  | Llista_de_rutes
 |  | Fira
 |  | Transbordador
 |  | Intersecció_d'autopistes
 |  | Mapa_de_destinació
 |  | Mostrar
 |  | Ocultar
 |  | Mapa_de_posició
 |  | Lloguer_de_vehicles
 |  | Altres_destinacions
 |  | Estació_de_servei_d'autopista
 |  | Ruta_alternativa
 |  | Pictograma
 |  | Exposició_comercial
 |  | Últim_missatge
 |  | CD

Table 8- List of Application Words

3.1.2 Language-dependent application words P1-2

A list of 10 language-dependent application words was chosen to cover the words described in Table 7 that are not included in Table 8. Each speaker pronounces two words, with corpus codes P1-2. The set of 10 words is the following:

Secretari
SOS
Fi
GSM
Inici
Nou
Control
Canviar_de_cara
Remarcació
Poble

Table 9- Language dependent application words

3.2 Voice activation keywords A1-2

Some additional command phrases are necessary to activate the recognition system. A short item may go undetected if it is spoken in noisy conditions or inside a whole phrase. To ensure that the system detects the command, a short sentence is used instead of a single command word. These phrases have corpus codes A1-2.

The keywords used for voice activation, with their corresponding Catalan phrases, are the following:

True Hands-free Telephony Function | Catalan word
Making a phone call (by name or number) | Trucar per telèfon
Terminating a phone call | Acabar la trucada
Dialing by number | Seleccionar un número
Dialing by name | Seleccionar una persona
Answering the incoming call | Contestar la trucada

Table 10- Voice activation keywords

3.3 Isolated digits

3.3.1 Single digits I1-4

Four isolated digits are elicited. The digits are READ. The corpus codes are I1-I4.

3.3.2 Digit string B1

Each speaker pronounces a different digit string. The string contains the 10 digits in random order. The speaker reads the following instruction: "Please, read these digits carefully with pauses between them." Star and hash are not included in the digit string. The corpus code is B1.
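A digit string of this kind, with each of the 10 digits appearing once in random order, could be produced as in the Python sketch below. The document does not say how the original strings were generated, so this is only an illustration; the function name `make_digit_string` and the seed are arbitrary.

```python
import random


def make_digit_string(rng: random.Random) -> list:
    """Return the ten digits 0-9 in random order, each exactly once."""
    digits = list("0123456789")
    rng.shuffle(digits)
    return digits


# One prompt line; the order depends on the seed given to the generator.
prompt = " ".join(make_digit_string(random.Random(2006)))
print(prompt)
```

Passing an explicit `random.Random` instance makes the prompt sheets reproducible, for instance by seeding it with the session number.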

3.4 Connected digits

3.4.1 Sheet number C1

Six digits are used to number the sheets. The number is composed of 4 digits that indicate the session number and 2 checksum digits of a Hamming code that detects and corrects one error.

Format: d1 d2 d3 d4 c1 c2, from 0000xx to 2991xx

d1 d2 d3: numbers from 000 to 299

d4: {0|1}

c1 = (d1 + d2 + d3 + d4) mod 10

c2 = (d1 + 2 d2 + 3 d3 + 4 d4) mod 10

The corpus code is C1
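The two check digits follow directly from the formulas above. The sketch below computes them and validates a complete sheet number; the function names are hypothetical, and it only detects a mismatch, without attempting to locate or correct the wrong digit.

```python
def check_digits(session):
    """Return (c1, c2) for a 4-digit session string d1d2d3d4."""
    d = [int(ch) for ch in session]
    c1 = sum(d) % 10
    # Weights 1, 2, 3, 4 for d1..d4, as in the c2 formula.
    c2 = sum((i + 1) * x for i, x in enumerate(d)) % 10
    return c1, c2


def sheet_number(session):
    """Build the 6-digit sheet number for a session string."""
    if len(session) != 4 or not session.isdigit():
        raise ValueError("session must be four digits")
    if int(session[:3]) > 299 or session[3] not in "01":
        raise ValueError("session outside 0000-2991 range")
    c1, c2 = check_digits(session)
    return f"{session}{c1}{c2}"


def is_valid(sheet):
    """True if the check digits match the session digits."""
    if len(sheet) != 6 or not sheet.isdigit():
        return False
    try:
        return sheet_number(sheet[:4]) == sheet
    except ValueError:
        return False


if __name__ == "__main__":
    print(sheet_number("2991"))
    print(is_valid("299112"))
```

For session 2991 the digits sum to 21 and the weighted sum is 51, so both check digits are 1 and the sheet number is 299111.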

3.4.2 Telephone number C2, C5-C7

3.4.2.1 Spontaneous telephone number

Corpus code C2 contains a spontaneous telephone number: the speaker says a telephone number that he or she knows.

3.4.2.2 Read telephone numbers

This is a READ telephone number of 9 to 11 digits. The number of digits, spacing and presentation reflect typical Catalan telephone numbers for national calls, including area codes and GSM codes but not local or international numbers, e.g.

Fixed telephone numbers:

93 401 64 50

934 016 440

0 934 016 440

00 934 016 440

Mobile phone numbers

600 56 40 98

600 564 098

Each speaker reads three telephone numbers. The corpus codes are C5-C7.

3.4.3 Credit card number C3

Corpus code C3 contains a READ 16-digit credit card number. The list of credit card numbers follows:

0500 8824 0710 2777 | 0490 2332 8674 1248 | 0510 0620 7667 3063
0480 8636 0800 9882 | 0520 2592 0833 4444 | 0470 1191 0364 9173
0530 9627 0888 9258 | 0460 6894 1601 2055 | 0540 2622 7895 8262
0455 3633 2402 8994 | 0555 0744 1721 0154 | 0440 1681 6295 4013
0560 0773 8387 0250 | 0430 9686 2602 6254 | 0570 3367 1735 1512
1501 0330 6973 4096 | 1491 1765 7405 2082 | 1511 2712 1274 1075
1481 6604 8812 7025 | 1521 3892 1201 5817 | 1471 0873 9933 1177
1531 1611 2666 7589 | 1461 2722 0290 7171 | 1541 8314 6395 9249
1456 9912 9725 2063 | 1556 8287 3613 9650 | 1441 2682 2366 3160
1561 2672 3583 9796 | 1431 1371 7635 1754 | 1571 9779 4095 9865
2502 0890 0300 6881 | 2492 2632 6096 2266 | 2512 7386 4186 5995
2482 1671 9896 1997 | 2522 6624 2093 1166 | 2472 0600 0864 3990
2532 3623 3185 3140 | 2462 1183 1811 9853 | 2542 8397 6634 5138
2457 3423 0670 0089 | 2557 1781 1365 7073 | 2442 8369 8616 5198
2562 1665 7368 2262 | 2432 0644 8695 8697 | 2572 0165 1591 0008
3503 3393 9315 6763 | 3493 1581 8668 9356 | 3513 3373 0400 6984
3483 1331 0764 7082 | 3523 3334 8802 8983 | 3473 9607 8626 0064
3533 0610 8714 6744 | 3463 2612 6195 0283 | 3543 6385 7875 5345
3458 0264 0590 7268 | 3558 8324 9325 7998 | 3443 1135 8304 2171
3563 7673 0320 9560 | 3433 9963 0234 1029 | 3573 3383 0220 7985
4504 0390 1291 7853 | 4494 9796 0420 0055 | 4514 1311 2322 9472
4484 7196 1211 9026 | 4524 0380 0110 9833 | 4474 2312 1301 6991
4534 2372 0664 4630 | 4464 8876 0680 9870 | 4544 0280 7396 7065
4459 7097 0823 0176 | 4394 0410 9705 2153 | 4384 1791 8685 1713
4294 2292 9923 2043 | 4284 1691 7296 9059 | 4194 9298 6775 0026
5505 8098 0844 1052 | 5495 8834 2184 6460 | 5515 1391 6594 8152
5485 2302 2692 8220 | 5525 9835 7188 4156 | 5475 6874 7286 9048
5535 1381 7615 9078 | 5465 8224 8297 9188 | 5545 0273 1421 8169
5683 8795 6614 0075 | 5693 3293 2392 7677 | 5783 2275 7776 0144
5793 1701 8278 9454 | 5889 1281 0370 3264 | 5893 9803 8865 3174
6506 9675 8376 9555 | 6496 8724 0754 7884 | 6516 7335 1235 2121
6486 6404 0034 1080 | 6526 1631 1265 8057 | 6476 8606 3276 2992
6536 8845 7277 8261 | 6466 0700 0310 8146 | 6559 7694 0790 8062
6564 0720 9225 9062 | 6434 1035 7303 3557 | 6574 9398 1755 7160
6424 8785 1645 1048 | 6584 7723 7974 9158 | 6414 8202 3193 8083
7507 9944 9407 5358 | 7497 3792 1801 9340 | 7517 3782 0780 8040
7487 7313 2382 2079 | 7527 9305 1745 7807 | 7477 1125 7605 2163
7537 0100 1321 8660 | 7467 6794 0134 9982 | 7565 2582 9696 2141
7435 7375 1401 9147 | 7575 8197 9198 9174 | 7425 0654 0190 0241
7585 9388 0900 9086 | 7415 7323 9617 9030 | 7595 7713 8704 9267
8508 3403 8855 2002 | 8498 9203 8189 9849 | 8518 9786 0182 6089
8488 6285 1411 3077 | 8528 7703 7684 3680 | 8478 2412 9825 4083
8538 2422 9669 3890 | 8468 6187 9288 3081 | 8566 0210 7625 9584
8436 2192 8778 3081 | 8576 3593 9099 9164 | 8426 1774 0124 3469
8586 1621 0580 0836 | 8416 1114 0854 9315 | 8596 9377 3413 7579
9509 2891 9715 5985 | 9499 3283 9213 9360 | 9519 9115 0810 0016
9489 9101 9813 4943 | 9529 0630 0091 0043 | 9479 1092 6374 8172
9539 3094 8212 9133 | 9469 0200 2702 1263 | 9567 9279 9901 9649
9437 3603 0690 6068 | 9577 1711 2223 7376 | 9427 9954 0734 7654
9587 2282 8406 1065 | 9417 6784 1221 8076 | 9597 8975 9976 9998

Table 11- Credit card numbers

3.4.4 PIN code C4

The PIN code is a connected digit string of length 6, similar to the sheet number, but drawn from a set of 150 such numbers. The corpus code is C4. Table 12 shows the list of PIN numbers:

000100 | 002003 | 004005 | 006007 | 011012 | 013014
015016 | 019021 | 022023 | 024025 | 026027 | 030310
032033 | 034035 | 036037 | 041042 | 043044 | 045046
049050 | 051052 | 053054 | 057058 | 059061 | 062063
066067 | 068069 | 070710 | 072073 | 076077 | 081082
085086 | 087088 | 089090 | 091092 | 095096 | 097098
099111 | 112102 | 115116 | 117118 | 119120 | 122123
127128 | 129132 | 133134 | 137138 | 139140 | 141420
143144 | 147148 | 149152 | 153154 | 155156 | 159160
161621 | 163164 | 167168 | 169172 | 173174 | 175176
181820 | 183184 | 185186 | 193194 | 195196 | 199201
222300 | 224211 | 226227 | 229214 | 236237 | 248249
255256 | 257217 | 265266 | 272730 | 283281 | 284285
293290 | 296297 | 298299 | 333411 | 336302 | 337322
338339 | 347348 | 354352 | 355356 | 364365 | 366367
374370 | 378379 | 384381 | 389390 | 395393 | 398399
444540 | 447415 | 449404 | 456422 | 457441 | 466443
467460 | 469463 | 475472 | 485480 | 486481 | 488489
495492 | 496497 | 498499 | 555601 | 557500 | 559523
565611 | 566502 | 567524 | 568562 | 569563 | 576536
577578 | 579580 | 586581 | 587583 | 596592 | 666700
668605 | 669612 | 676613 | 677614 | 678615 | 679662
687673 | 688633 | 689681 | 697682 | 698690 | 699722
777811 | 779733 | 787702 | 788723 | 789745 | 798785
799801 | 888922 | 898804 | 999800 | 958858 | 907807

Table 12- PIN Numbers

3.5 Dates D1-3

3.5.1 Spontaneous date

Corpus code D1 contains a spontaneous date: the birth date of the speaker.

3.5.2 Prompted date

Corpus code D2 contains prompted dates. The prompted date expression has the form "Week-day, dd de Month de yyyy" and is presented on the prompt sheet in the conventional Catalan way.

Example: Divendres, 25 de Gener de 1997

Each speaker pronounces a different date.
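The prompted date format can be sketched with the standard `datetime` module. This is an illustration only: the function name `prompted_date` is hypothetical, the capitalisation follows the example above, and the elision of "de" to "d'" before a month beginning with a vowel is an assumption based on standard Catalan usage.

```python
import datetime

# Monday first, matching datetime.date.weekday().
WEEKDAYS = ["dilluns", "dimarts", "dimecres", "dijous",
            "divendres", "dissabte", "diumenge"]
MONTHS = ["gener", "febrer", "març", "abril", "maig", "juny",
          "juliol", "agost", "setembre", "octubre", "novembre",
          "desembre"]


def prompted_date(day):
    """Format a date as 'Week-day, dd de Month de yyyy'."""
    month = MONTHS[day.month - 1]
    # Assumed rule: "de" becomes "d'" before abril, agost, octubre.
    link = "d'" if month[0] in "aeiou" else "de "
    weekday = WEEKDAYS[day.weekday()].capitalize()
    return f"{weekday}, {day.day} {link}{month.capitalize()} de {day.year}"


if __name__ == "__main__":
    print(prompted_date(datetime.date(2006, 12, 23)))
```

Because the weekday is derived from the calendar date, the weekday and the date on a generated prompt always agree.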

Dates include:

Years from 1960 to 2037

Days from 1 to 31 and words contained in the following list

Catalan | English
abril | April
agost | August
desembre | December
febrer | February
gener | January
juliol | July
juny | June
maig | May
març | March
novembre | November
octubre | October
setembre | September
dissabte | Saturday
diumenge | Sunday
divendres | Friday
dijous | Thursday
dimecres | Wednesday
dimarts | Tuesday
dilluns | Monday
i | and
d' | of
de | of

Table 13- Words in prompted dates

3.5.3 Relative and general date expression

Relative and general date expressions (corpus code D3) are typical of what is spoken in real applications. The list of expressions is:

Catalan | English
abans d'ahir | The day before yesterday
ahir | Yesterday
avui | Today
demà | Tomorrow
demà passat | The day after tomorrow
la setmana vinent | Next week
el mes que ve | Next month
el mes passat | Last month
la setmana passada | Last week
el proper cap de setmana | Next weekend
a mitjans de la setmana passada | Mid last week
la propera setmana | Next week

Table 14- Sentences in relative dates

3.6 Embedded application word phrases E1-2

E1-E2 are the corpus codes of a set of phrases that contain embedded application words. They provide a basis for word-spotting tests and a source of data that more accurately reflects the spontaneous production of application words.

Examples:

Seleccionar el CD número set

Trucar a una persona

3.7 Spelled names/words L1-7

Spelling is not common in Catalan and people are not used to spelling words. The text to be spelled was shown to the speaker in capital letters, without accents or other symbols.

The following table shows each letter symbol from a Catalan dictionary, its usual name, its alternative name (if any), the expected counts and the counts at transcription level. Note that the counts at transcription level are higher because spontaneous spellings were not taken into account in the expected counts.

Letter | Name | Alternative name | Expected counts | Counts
A | a | - | 2185 | 2704
B | be | be alta | 683 | 770
C | ce | - | 969 | 1091
Ç | ce trencada | - | 403 | 403
D | de | - | 911 | 985
E | e | - | 2089 | 2456
F | efa | - | 488 | 507
G | ge | - | 727 | 851
H | hac | - | 404 | 422
I | i | i llatina | 1950 | 2178
J | jota | - | 414 | 437
K | ca | - | 378 | 378
L | ela | ele | 1564 | 1826
M | ema | eme | 859 | 979
N | ena | ene | 1426 | 1593
Ñ | enya | - | 339 | 341
O | o | - | 1745 | 2011
P | pe | - | 762 | 863
Q | cu | - | 395 | 406
R | erra | erre | 1398 | 1754
S | essa | ese | 1643 | 1846
T | te | - | 1186 | 1306
U | u | - | 1345 | 1464
V | ve baixa | uve | 561 | 625
W | ve doble | - | 366 | 366
X | ics | - | 406 | 411
Y | i grega | - | 340 | 353
Z | zeta | - | 316 | 427

Table 15- Letter symbol, names, expected counts and counts at transcription level. Although Ñ is not a Catalan letter, it is included because some common Catalan surnames come from Spanish and some of them contain it.

3.7.1 Spontaneous name

L1 contains the spelling of a spontaneous name. The speaker was asked for the name of a friend in corpus code O1 and then spelled it spontaneously.

3.7.2 Prompted name linked to city

L2 contains the spelling of the city pronounced in O3

3.7.3 Real names/words

Corpus codes L3, L4, L5 and L6 contain spellings of words. These words were chosen to vary widely so that as many different letters as possible are covered.

3.7.4 Artificial name

L7 is a spelling composed of letters that are poorly represented in the spellings mentioned above. It is used to balance the number of realizations of the recorded letters.

3.8 Money amount M1

M1 contains a money amount. Euros (euros) and cents (cèntims) are included. Example formats are:

sis-cents setanta-un euros i dotze cèntims

set-cents tretze mil setanta-quatre euros

3.9 Natural number N1

N1 contains a read natural number between 100000 and one million.

Example format:

set-cents setanta-sis mil quatre-cents dos

3.10 Directory assistance names O1-7

3.10.1 Spontaneous forename

O1: Forename of a friend

3.10.2 Spontaneous city name

O2: City of growing up

3.10.3 City name (set of 150)

O3-4 include Catalan cities and other European cities and countries.

Alemanya | Àustria | Bèlgica | Dinamarca | Espanya
Finlàndia | França | Grècia | Irlanda | Itàlia
Luxemburg | Noruega | Països Baixos | Portugal | Regne Unit
Rússia | Suècia | Suïssa | París | Londres
Copenhaguen | Madrid | Hèlsinki | Edimburg | Sevilla
Berlín | Roma | Atenes | Brussel·les | Milà
Munic | Rotterdam | Lisboa | Viena | Estocolm
Ginebra | Dublín | Moscou | Oslo | Lió
Glasgow | Espoo | Marsella | Odense | Tampere
Hamburg | Tessalònica | Patres | Nàpols | Bruges
Arhus | Àlaba | Alacant | Albacete | Almeria
Astúries | Àvila | Badajoz | Barcelona | Biscaia
Burgos | Càceres | Cadis | Cantàbria | Castelló de la Plana
Ciudad Real | Conca | Còrdova | La Corunya | Girona
Granada | Guadalajara | Guipúscoa | Huelva | Illes Balears
Jaén | Lleida | Lleó | Lugo | Màlaga
Múrcia | Navarra | Osca | Ourense | Palència
Las Palmas | Pontevedra | La Rioja | Salamanca | Santa Cruz de Tenerife
Saragossa | Segòvia | Sòria | Tarragona | Terol
Toledo | València | Valladolid | Zamora | Hospitalet de Llobregat
Badalona | Sabadell | Terrassa | Santa Coloma de Gramenet | Mataró
Reus | Cornellà de Llobregat | Sant Boi de Llobregat | Manresa | El Prat de Llobregat
Rubí | Viladecans | Granollers | Cerdanyola del Vallès | Vilanova i la Geltrú
Sant Cugat del Vallès | Esplugues de Llobregat | Mollet del Vallès | Castelldefels | Gavà
Sant Feliu de Llobregat | Sant Adrià de Besòs | Figueres | Igualada | Vic
Tortosa | Ripollet | Vilafranca del Penedès | Blanes | Olot
Montcada i Reixac | Sant Joan Despí | Barberà del Vallès | Premià de Mar | El Masnou
Valls | El Vendrell | Molins de Rei | Sant Pere de Ribes | Sant Andreu de la Barca
Santa Perpètua de Mogoda | Pineda de Mar | Martorell | Sant Feliu de Guíxols | Cambrils
Palafrugell | Manlleu | Sitges | Lloret de Mar | Amposta

Table 16- City names

3.10.4 Company/agency name/street name (set of 150)

O5-6 include a list of brands and company names.

ABC | Abertis
Acesa | ACS
Adidas | AENA
Agència EFE | Aigües de Barcelona
AirEuropa | Alcampo
Alcatel | Aldi
Al-Pi | Altadis
Amadeus | Amena
Antena 3 TV | Apple
Aunacable | Avis
Avui | Banc Sabadell - Atlàntic
Banesto | Bankinter
Bankpyme | Barclays Bank
Bayer | BBVA
Càritas | Cadena 100
Cadena SER | Caixa de Catalunya
Caixa de Girona | Caixa de Manresa
Caixa Laietana | Caixa Penedès
Caixa Popular | Caixa Sabadell
Caixa Tarragona | Caixa Terrassa
Caja Madrid | Canal +
Canon | Carrefour
Cepsa | Chupa-Chups
Cinesa | Citroën
Coca-Cola | Compaq
COPE | Correus
Creu Roja | Danone
Deutsche Bank | Diari de Barcelona
Duracell | EDreams
El Corte Inglés | Electronic Arts
El Mundo | El Mundo Deportivo
El País | El Periódico
El Punt | Endesa
Epson | Ericsson
Europa Press | Fecsa-Endesa
Fiat | Fibanc
Fnac | Ford
Gallina Blanca | Gas Natural
Grup Balañá | Eroski
Ferrovial | Inditex
Nutrexpa | Puig
Roca | Unilever
PRISA | Halcon Viajes
Honda | HP
Ibèria | Iberdrola
Iberojet | IBM
Ikea | Indra Sistemes
ING Direct | Intel
Jazztel | Kodak
La Caixa | La Razón
Lauren Films | La Vanguardia
Mango | Menta
Metro Barcelona | Microsoft
Motorola | MoviStar
Núñez y Navarro | Nescafé
Nestlé | NH Hotels
Nissan | Nokia
Once | ONO
Opel | Pepsico
Petrocat | Peugeot
Philips | Ràdio Barcelona
Radio Club 25 | Renault
Renfe | Repsol YPF
Retevisión | Samsung
San Miguel | Sanyo
Seat | Siemens
Sol Meliá | Sony
Spanair | Sport
TeleCinco | Telefónica
Telepizza | Terra
Toshiba | Trasmediterránea
Uni2 | Uno-e
Viatges Marsans | VilaWeb
Vodafone | Wanadoo
Volkswagen | Zara

Table 17- Company names

3.10.5 Forename & surname (set of 150)

Each speaker pronounces a forename and surname with corpus code O7. The complete list consists of 150 items and is shown below.

Aida Verdeny | Antoni Gasa | Eloi Benaiges
Joan Josep Feixas | Fàtima Sacau | Jana Barcons
Martina Puigdemasa | Maria Merce Vivet | Pau Gallart
Jaume Sarroca | Yasmina Solana | Mar Burgues
Francesc Josep Porta | Gemma Mirabet | Ivet Escales
Sandra Puente | Georgina Bautista | Edgar Borrull
Hug Suriol | Maria Bernadets | Biel Padilla
Sofia Jove | Alba Nolla | Judith Turmo
Josep María Toha | Manel Castellví | Miguel Jiménez
Susana Salgado | Beatriu Fores | Àngel Sol
Daniela Portella | Carlota Pedarros | Oriol Goma
Abril Petit | Gabriel Betriu | Lluc Arnalot
Tania Castell | Crístian Artigas | Sergi Tubau
Samuel Bertran | Manela Falco | Ivan Gangolells
Gloria Molin | Júlia Espasa | Xavier Mercad
Guillem Altarriba | Aitana Bellet | Alicia Barrera
Lara Mases | Daniel Vilo | Aina Rota
Pol Llad | Nora Riba | Diego Salvans
Adam Español | Nàdia Llovera | Inés Monne
Òscar Amposta | Hèctor Re | Lorena Moix
Abel Abell | Mariano Perpiñá | Èric Segarra
Anna Segus | Alexandra Brugues | Mònica Descarrega
Neus Masdeu | Rubén Royo | Anna Maria Sabata
Míriam Lorenzo | Mar Folque | Cristina Cabau
Helena Cugat | Toni Vilalta | David Auguets
Sebastià Caselles | Ruth Alegret | Noèlia Torra
Ona Samper | Carolina Fierro | Adrià Vilarrasa
Sonia Guimerà | Àngela Blasi | Alèxia Altes
Domingo Fit | Adriana Estany | Juli Puigdevall
Aitor Sirvent | Sara Carles | Miquel Arro
Estebe Ravetllat | Emílio Toldrà | Maria Del Carme Orriols
Joan Antoni Fust | Mireia Esquerre | Elena Vilagines
Blanca Peguera | Marina Sentis | Ainara Badosa
Albert Perucho | Arnau Llopart | Ismael Marquès
Carles Trilla | Mariona Castellarnau | Xènia Izquierdo
Lidia Rosell | Gisela Franco | Pere Anguera
Laia Puigvert | Ricard Boldu | Patricia Espot
Josep Antoni Margalef | Berta Peris | Maria Del Pilar Virgili
Josep Lluis Mitjana | Ingrid Ruana | Jan Verdaguer
Meritxell Cerqueda | Ainhoa Pepiol | Ramón Benavent
Elsa Sanglas | Vicenç Alaña | Isaac Moreso
Josep Manel Ricart | Santiago Farr | Max Prats
Nayara Viñas | Tomàs Saforcada | Bernat Espuny
María José Munt | Clàudia Rafols | Ariadna Morros
Noa Galito | Aaron Noguera | Genís Batalla
Ferran Altadill | Esther Pou | Iris Raich
Dídac Pubill | Noemí Boix | Víctor Mota
Raquel Silva | Cèlia Cunill | Emma Panisello
Rafel Roda | Natàlia Cases | Mario Trepat
Ignasi Godia | Aleix Peret | Rosario Boixader

Table 18- Forename & surname

3.11 Phonetically rich sentences S1-9

Each speaker pronounces nine sentences (corpus codes S1-9) from a set of phonetically rich sentences. Every set of nine sentences was designed to contain each phone at least once. The following table shows the phone counts at prompt and transcription level.

Allophone | Counts (prompt) | Counts (transc)
@ | 24249 | 24099
a | 9689 | 9695
B | 2183 | 2170
b | 3556 | 3241
D | 1677 | 1679
d | 4944 | 4920
e | 5176 | 5208
E | 3743 | 3708
f | 2009 | 1997
G | 1045 | 1043
g | 1274 | 1253
i | 11430 | 11334
j | 637 | 640
J | 637 | 635
k | 7319 | 7270
L | 1181 | 1170
l | 9049 | 9013
m | 6388 | 6360
N | 658 | 655
n | 10208 | 10151
O | 3219 | 3184
o | 3432 | 3535
p | 5335 | 5456
r | 6350 | 6322
rr | 3738 | 3734
S | 1115 | 781
s | 13086 | 13023
t | 9457 | 9053
u | 9674 | 9538
w | 2050 | 2021
Z | 1514 | 1485
z | 1496 | 1464

Table 19- Phone frequencies of the phonetically rich sentences at prompt and transcription level of the close-talk microphone
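The requirement that each set of nine sentences contains every phone at least once can be checked mechanically. The sketch below is an illustration under assumptions: the inventory is taken from the allophone column of Table 19, and each sentence is assumed to be available as a string of space-separated phone symbols, which is a hypothetical input format rather than the database's own.

```python
# Allophone symbols as listed in Table 19.
INVENTORY = ("@ a B b D d e E f G g i j J k L l m N n "
             "O o p r rr S s t u w Z z").split()


def missing_phones(transcriptions, inventory=INVENTORY):
    """Return the phones in `inventory` absent from all sentences.

    `transcriptions` is a list of strings, one per sentence, each a
    sequence of phone symbols separated by spaces (assumed format).
    """
    seen = set()
    for sentence in transcriptions:
        seen.update(sentence.split())
    return [phone for phone in inventory if phone not in seen]


if __name__ == "__main__":
    toy_set = ["@ a B b D d e E", "f G g i j J k L"]
    # Lists the phones this toy set still fails to cover.
    print(missing_phones(toy_set))
```

An empty result means the sentence set covers the whole inventory; a non-empty result lists the phones that another sentence would still have to supply.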

3.12 Times T1-2

3.12.1 Spontaneous time

Each speaker says the current time. The corpus code is T1.

3.12.2 Read time phrase

CC T2 contains a read time phrase

Format examples:

Hora1: {l'|les} {1-12} {i|menys} {deu,quart,20,25,mitja} {del matí, de la tarda, del migdia, del vespre, de la mitjanit, de la nit}

Hora2: {un quart|dos quarts|tres quarts} {i mig} {d'|de} {1-12} {del matí, de la tarda, del migdia, del vespre, de la mitjanit, de la nit}

Hora3: {l'|les} {1-12} en punt {del matí, de la tarda, del migdia, del vespre, de la mitjanit, de la nit}

{és|són} {aprox|exac|gairebé} {hora1|hora2|hora3}

{avui, demà, ahir} a {hora1|hora2|hora3}

{aprox|exac|gairebé} a {hora1|hora2|hora3} {d'ahir, d'avui, de demà}

{2-12} hores i {2-59} minuts

Examples:

a l'una i vint-i-cinc de la nit d'ahir

exactament a les tres menys cinc de la matinada de demà

Catalan word | English word | counts T2 | counts TL
l' | the | 49 | 41
matí | morning | 67 | 70
tarda | afternoon | 69 | 81
menys | to | 70 | 123
migdia | noon | 70 | 70
mitjanit | midnight | 70 | 69
nit | night | 70 | 70
en | in | 73 | 79
punt | o'clock | 73 | 79
mitja | half | 73 | 107
gairebé | almost | 75 | 75
vespre | evening | 76 | 76
matinada | dawn | 78 | 78
són | it's | 86 | 214
aproximadament | approximately | 87 | 86
exactament | exactly | 89 | 89
hores | hours | 100 | 101
minuts | minutes | 100 | 162
quart | quarter | 101 | 146
quarts | quarters | 118 | 213
avui | today | 119 | 119
demà | tomorrow | 121 | 121
ahir | yesterday | 148 | 148
mig | half | 149 | 153
d' | of | 198 | 223
del | of | 213 | 219
les | the | 270 | 670
la | the | 287 | 319
a | at | 388 | 388
i | and | 400 | 707
de | of | 552 | 664

Table 20- Words in Time phrases. Catalan word, English word, counts at corpus level of the T2 set and counts at transcription level of T1 and T2. Digits and numbers are not included in the table.

3.13 Phonetically rich words W1-4

Each speaker utters four words from a set of 2400 phonetically rich words (corpus codes W1-4). The following table shows their phone frequencies at prompt and transcription level of the close-talk microphone.

Allophone | Counts (prompt) | Counts (transc)
@ | 2155 | 2117
a | 831 | 834
B | 301 | 293
b | 305 | 228
D | 300 | 299
d | 335 | 332
e | 323 | 319
E | 624 | 606
f | 251 | 246
G | 300 | 297
g | 303 | 297
i | 1331 | 1314
j | 298 | 294
J | 300 | 296
k | 743 | 734
L | 307 | 299
l | 633 | 621
m | 605 | 592
N | 301 | 297
n | 835 | 814
O | 265 | 259
o | 300 | 300
p | 568 | 594
r | 919 | 910
rr | 445 | 435
S | 300 | 284
s | 1134 | 1121
t | 798 | 779
u | 1613 | 1591
w | 310 | 302
Z | 299 | 292
z | 302 | 297

Table 21- Phone frequencies at prompt and transcription level of the close-talk microphone

3.14 Spontaneous sentences Z0-9

10 spontaneous sentences with corpus codes Z0-9 were recorded in 200 sessions.

For each of the 10 sentences to be recorded, the operator gives the speaker a topic and asks him or her to make up a sentence. The topics are described below. Items in () represent the type of data wanted, items in [] separated by | are alternatives, items in {} are optional, and items in <> are parameters that have to be set.

The content of the LBR: label in the label files is shown after each topic.
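The bracket notation used in the topics can be expanded mechanically into the concrete variants an operator might read out. The following is a minimal sketch with a hypothetical function name `expand`: it handles [a | b] alternatives and {optional} items, leaves () items untouched, and does not handle nested brackets or parameters that have to be set.

```python
import itertools
import re

# Matches one non-nested [alternatives] group or one {optional} group.
GROUP = re.compile(r"\[([^\[\]]*)\]|\{([^{}]*)\}")


def expand(template):
    """Return every variant of a topic template as a list of strings."""
    parts = []  # each entry is the list of choices for one slot
    pos = 0
    for match in GROUP.finditer(template):
        parts.append([template[pos:match.start()]])  # literal text
        if match.group(1) is not None:
            # [a | b]: exactly one of the alternatives is used.
            parts.append([alt.strip() for alt in match.group(1).split("|")])
        else:
            # {x}: the item is either present or left out.
            parts.append([match.group(2).strip(), ""])
        pos = match.end()
    parts.append([template[pos:]])
    variants = []
    for combo in itertools.product(*parts):
        # Collapse the double spaces left by omitted optional items.
        variants.append(" ".join("".join(combo).split()))
    return variants


if __name__ == "__main__":
    topic = ("Call a [book store | music shop] to enquire about the "
             "availability, price {and edition} of a [book | CD]")
    for variant in expand(topic):
        print(variant)
```

The example topic has two alternative groups and one optional item, so it expands to eight variants.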

A: Teleservices

01) Voice-mail message to a friend (phone number, reason of call, etc.)

02) Phone number information service (name, town and address)

03) Interaction with an operator (calling telephone exchange)

04) Name retrieval by phone number

05) Call a travel agency and book a [train ticket | flight] (destination, date, time, type of train)

06) Call your bank [for account information | to transfer money to an account]

07) Call the theatre and ask for the seats available for a performance

08) Call a [book store | music shop] to enquire about the availability, price {and edition} of a [book | CD]

09) Call your own answering machine to check the messages

10) Dictate a short business letter via the mobile phone.

11) Tell the hotel that you will arrive very late but that you definitely want to take the reserved room

12) Ask the airport information for the latest flights to

13) Tell your speech-savvy mobile phone [to read [a fax | an email] | that you want to check your mailbox | to schedule an appointment with | organize a conference call with [ | your office] and a client]

B: Navigation

14) Describe the [current | a recent] traffic situation

15) Give [the police | rescue] a description of how to get to your current location

16) Give the navigation system the coordinates of your favorite restaurant (name, address, city)