Catalan Database for In-Car Applications
Author(s): Asunción Moreno, David Conejero, Gonzalo Bustamante
Institute: Universidad Politécnica de Cataluña
Address: Jordi Girona 1-3, Edificio D5, 08034 Barcelona, Spain
email: [email protected]
Date: December 23rd, 2006
Version: V1.4
CONTENTS
1. Introduction
1.1 Speech file formats
1.2 Directory structure
1.3 File nomenclature
1.4 Label files
2. Database design and collection
2.1 Recording platform
2.2 Speaker recruitment
2.3 Design of prompting and prompt-sheet
3. Database contents definition
3.1 Application words
3.1.1 Common application words 00-81
3.1.2 Language-dependent application words P1-2
3.2 Voice activation keywords A1-2
3.3 Isolated digits
3.3.1 Single digits I1-4
3.3.2 Digit string B1
3.4 Connected digits
3.4.1 Sheet number C1
3.4.2 Telephone number C2, C5-C7
3.4.3 Credit card number C3
3.4.4 PIN code C4
3.5 Dates D1-3
3.5.1 Spontaneous date
3.5.2 Prompted date
3.5.3 Relative and general date expression
3.6 Embedded application word phrases E1-2
3.7 Spelled names/words L1-7
3.7.1 Spontaneous name
3.7.2 Prompted name linked to city
3.7.3 Real names/words
3.7.4 Artificial name
3.8 Money amount M1
3.9 Natural number N1
3.10 Directory assistance names O1-7
3.10.1 Spontaneous forename
3.10.2 Spontaneous city name
3.10.3 City name (set of 150)
3.10.4 Company/agency name/street name (set of 150)
3.10.5 Forename & surname (set of 150)
3.11 Phonetically rich sentences S1-9
3.12 Times T1-2
3.12.1 Spontaneous time
3.12.2 Read time phrase
3.13 Phonetically rich words W1-4
3.14 Spontaneous sentences Z0-9
3.15 Any other additional material
3.16 Links to other databases
4. Transcription
5. The lexicon
6. Speaker demographic information
6.1 Accent/Regions
6.2 Speaker characteristics
7. Recording conditions
8. Deviations from SpeechDat Car specifications
9. Sample Prompt sheets
9.1 Sample instruction sheets and prompt sheet
10. Bibliography
1. Introduction
The Catalan Database for In-Car Applications was recorded within the scope of the project "Generació de recursos lingüístics per les tecnologies de la parla", which was sponsored by the Catalan and Spanish Governments. Collection was performed at the Department of Signal Theory and Communications of the Universitat Politècnica de Catalunya (UPC), Spain, and annotation was performed at Verbio Technologies. The owner of the database is the Catalan Government.
This database comprises in-car recordings from 300 speakers recorded in 600 different sessions. The database follows the SpeechDat Car specifications (corpus content, speakers, transcription, lexicon, formats) and the Speecon specifications for the recording platform (speech signal formats and doc files). The database is distributed on 12 ISO 9660 DVD volumes and one CD-ROM. The CD holds the text files and documentation; the DVDs contain the in-car recordings. The content of each volume is described below. The table shows the disk identification name, the first and last session codes included in each disk, and the effective number of sessions.
Disk | DISK_ID | From | To | Ses | Contents
CD01 | VEHIC2CAD00 | | | | Text and documentation
DVD00 | VEHIC2CA000 | BLOCK00/SES0000 | BLOCK00/SES0049 | 50 | Signals
DVD01 | VEHIC2CA001 | BLOCK00/SES0050 | BLOCK00/SES0099 | 50 | Signals
DVD02 | VEHIC2CA002 | BLOCK01/SES0100 | BLOCK01/SES0149 | 50 | Signals
DVD03 | VEHIC2CA003 | BLOCK01/SES0150 | BLOCK01/SES0199 | 50 | Signals
DVD04 | VEHIC2CA004 | BLOCK02/SES0200 | BLOCK02/SES0249 | 50 | Signals
DVD05 | VEHIC2CA005 | BLOCK02/SES0250 | BLOCK02/SES0299 | 50 | Signals
DVD06 | VEHIC2CA006 | BLOCK03/SES0300 | BLOCK03/SES0349 | 50 | Signals
DVD07 | VEHIC2CA007 | BLOCK03/SES0350 | BLOCK03/SES0399 | 50 | Signals
DVD08 | VEHIC2CA008 | BLOCK04/SES0400 | BLOCK04/SES0449 | 50 | Signals
DVD09 | VEHIC2CA009 | BLOCK04/SES0450 | BLOCK04/SES0499 | 50 | Signals
DVD10 | VEHIC2CA010 | BLOCK05/SES0500 | BLOCK05/SES0549 | 50 | Signals
DVD11 | VEHIC2CA011 | BLOCK05/SES0550 | BLOCK05/SES0599 | 50 | Signals
The list of the distribution disks and directories is contained in the README.TXT file. Further details regarding the database contents, files and directories are provided in the documentation files in the DOC directory and in the files in the TABLE and INDEX directories.
File types are identified with the following extensions:
*.DOC - Microsoft Word V6.0 document
*.LST - DOS text index file with ISO Latin 1 symbols
*.TBL - DOS text file with ISO Latin 1 symbols
*.SES - DOS text file
*.TXT - DOS text file
*.CAC - SAM label file, text file with ISO Latin 1 symbols for car recordings
*.CA1 - Speech signal channel 1
*.CA2 - Speech signal channel 2
*.CA3 - Speech signal channel 3
*.CA4 - Speech signal channel 4
*.PS - PostScript file
Each CD-ROM has the following directory structure:
\:
  COPYRIGH.TXT - copyright notice
  DISK.ID - UNIX volume ID file
  README.TXT - readme file
  VEHIC2CA\ - data directory
VEHIC2CA\DOC:
  DESIGN.DOC - Catalan database documentation file
  SUMMAR0.TXT - database contents summary file
  SAMPALEX.PS - SAMPA table
  ISO88591.PS - ISO 8859-1 table
VEHIC2CA\INDEX:
  CONTENT0.LST - file/utterance/speaker index table
VEHIC2CA\TABLE:
  LEXICON.TBL - full lexicon table
  REC_COND.TBL - recording condition table
  SESSION.TBL - session table
  SPEAKER.TBL - speaker table
VEHIC2CA\: - contains the data block directories
  BLOCK00\ - sessions are grouped in blocks
  BLOCK01\
  ...
VEHIC2CA\BLOCK00:
  SES0002\ - session directories for each session
  ...
VEHIC2CA\BLOCK00\SES0002:
  V2000206.CAC - SAM label file for car recordings
  V2000206.CA0 - speech signal file in car
  V2000206.CA1 - speech signal file in car
  V2000206.CA2 - speech signal file in car
  V2000206.CA3 - speech signal file in car
1.1 Speech file formats
Four high-quality audio channels are recorded in the car by the mobile platform Plt_M and are stored as uncompressed sequences of 16-bit samples at a 16 kHz sampling rate. Each prompted utterance is stored in a separate file. Each speech file has an accompanying ASCII SAM label file.
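As an illustration of the signal format described above, the following Python sketch reads one channel file and converts a sample count into a duration. It is only a sketch under stated assumptions: the files are taken to be headerless, with the low byte first, and samples are read as signed integers. The byte order and signedness should be checked against the SBF and SNB fields of the accompanying label file. The function names are hypothetical and not part of the database.

```python
import array
import sys

SAMPLE_RATE_HZ = 16000  # 16 kHz, as stated in this section
BYTES_PER_SAMPLE = 2    # 16-bit samples, as stated in this section


def read_raw_samples(path):
    """Read a headerless 16-bit speech file into an array of integers.

    Assumptions (not confirmed by this section): no file header,
    low byte first, samples interpreted as signed values.
    """
    with open(path, "rb") as handle:
        data = handle.read()
    if len(data) % BYTES_PER_SAMPLE:
        raise ValueError("file size is not a multiple of the sample size")
    samples = array.array("h")  # "h" = signed 16-bit integer
    samples.frombytes(data)
    if sys.byteorder == "big":
        samples.byteswap()  # convert low-byte-first data on big-endian hosts
    return samples


def duration_seconds(num_samples):
    """Duration of a recording with the given number of samples."""
    return num_samples / SAMPLE_RATE_HZ
```

For instance, a recording of 78400 samples lasts 78400 / 16000 = 4.9 seconds.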
1.2 Directory structure
The directory structure uses shallow nesting, with contiguous numbers identifying the individual block and session sub-directories. The following three-level directory structure is defined:
\<database>\<block>\<session>
Where:
<database> Defined as: <name><#><language code>, i.e. VEHIC2CA
Where:
<name> is VEHIC
<#> is 2 for this project
<language code> is the ISO 2-letter code CA for Catalan
<block> Defined as: BLOCK<nn>
where <nn> is a progressive number from 00 to max. 99. These numbers are the same as the first 2 digits of <nnnn>, described below.
<session> Defined as: SES<nnnn>
where <nnnn> is a progressive number in the range 0000 to max. 9999, being the numeric session identification number also encoded in each filename.
Table 1- SpeechDat Car directory structure
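The mapping from a session number to its directory can be sketched in a few lines of Python. This is a minimal illustration of the rule that the block number equals the first two digits of the four-digit session number; the function name session_directory is hypothetical.

```python
DATABASE_DIR = "VEHIC2CA"  # database directory name used in this document


def session_directory(session_number):
    """Return the directory of a session, e.g. \\VEHIC2CA\\BLOCK23\\SES2381."""
    if not 0 <= session_number <= 9999:
        raise ValueError("session number must be in the range 0000-9999")
    session = f"{session_number:04d}"
    block = session[:2]  # block number = first two digits of the session
    return f"\\{DATABASE_DIR}\\BLOCK{block}\\SES{session}"
```

With this sketch, session 2381 maps to \VEHIC2CA\BLOCK23\SES2381 and session 2 maps to \VEHIC2CA\BLOCK00\SES0002.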
Both signal files and label files are put in the same directory.
In addition to the previous structure the following directories are used to store some other files:
\<database>\DOC - documentation files
\<database>\TABLE - speaker, recording conditions, session and lexicon tables
\<database>\INDEX - index files
Table 2- Non-speech related directory structure
All sessions have complete recordings for all prompted items. Exceptions can be found in the summary text files.
Finally the root directory contains three files:
a README.TXT ASCII file describing all files in the database, per disk; signal and label files are reported by specifying their templates;
a DISK.ID ASCII file containing the volume name (11 characters long); it supplies the volume label to UNIX systems that are unable to read the physical volume label, e.g. VEHIC2CA001
a COPYRIGH.TXT ASCII file to protect the authors' rights.
All these support files, except the README.TXT file, are duplicated on each database disk.
1.3 File nomenclature
File names follow the ISO 9660 file name conventions (8 plus 3 characters) according to the main CD-ROM standard. The following template is used:
DD NNNN CC . LL F
where:
DD    Database identification code (00-ZZ). For this project: V2
NNNN  Recording session progressive number (0000-9999); see section 1.2
CC    Corpus code (00-81, A1-Z9) obtained by collating the corpus and the item identifiers
LL    Two-letter ISO 639 language code
F     File type code:
      n = channel n in-vehicle recording, n = 0, 1, 2, 3
      C = label file for car recording
Table 3- SpeechDat Car filename convention
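The template above can be checked mechanically. The following Python sketch splits a file name into its fields; the class and function names are hypothetical, and the accepted file type codes (0-3 and C) are taken from Table 3.

```python
import re
from typing import NamedTuple


class SpeechFileName(NamedTuple):
    database: str   # DD, e.g. "V2"
    session: int    # NNNN
    corpus: str     # CC, e.g. "A1"
    language: str   # LL, e.g. "CA"
    file_type: str  # F: "0".."3" for a channel, "C" for a label file


# DD NNNN CC . LL F, following the template of Table 3
_NAME_PATTERN = re.compile(r"([0-9A-Z]{2})(\d{4})([0-9A-Z]{2})\.([A-Z]{2})([0-3C])")


def parse_file_name(name):
    """Split a name such as V22381A1.CA1 into its template fields."""
    match = _NAME_PATTERN.fullmatch(name.upper())
    if match is None:
        raise ValueError(f"not a valid file name: {name!r}")
    database, session, corpus, language, file_type = match.groups()
    return SpeechFileName(database, int(session), corpus, language, file_type)
```

For example, V22381A1.CA1 is read as database V2, session 2381, corpus code A1, language CA and channel 1.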
Since it is useful for users to identify the contents of a speech file by looking at its filename, the two-character corpus codes are specified in the following table. All items are read, unless marked as spontaneous.
Corpus code (corpus identifier + item identifier) | Corpus contents | Group total
A1,2 | 2 voice activation keywords |
B1 | 1 sequence of 10 isolated digits |
C1 | 1 sheet number (4+ digits) | 7 connected digits
C2 | 1 spontaneous telephone number (9-11 digits) | 7 connected digits
C5,6,7 | 3 read telephone numbers | 7 connected digits
C3 | 1 credit card number (16 digits) | 7 connected digits
C4 | 1 PIN code (6 digits) | 7 connected digits
D1 | 1 spontaneous date, e.g. birthday | 3 dates
D2 | 1 prompted date, word style | 3 dates
D3 | 1 relative and general date exp. | 3 dates
E1-2 | 2 word spotting phrases using an application word (embedded) |
I1-4 | 4 isolated digits |
L1 | 1 spontaneous, e.g. own forename | 7 spelled words
L2 | 1 spelling of direct. city name | 7 spelled words
L3,4,5,6 | 4 real word/name (letter sequences) | 7 spelled words
L7 | 1 artificial name for coverage | 7 spelled words
M1 | 1 money amount |
N1 | 1 natural number |
O1 | 1 spontaneous, e.g. own forename | 7 directory assistance
O2 | 1 city of birth / growing up (spontaneous) | 7 directory assistance
O3,4 | 2 most frequent cities | 7 directory assistance
O5,6 | 2 most frequent company/agency/street names | 7 directory assistance
O7 | 1 forename/surname | 7 directory assistance
S1-9 | 9 phonetically rich sentences |
T1 | 1 time of day (spontaneous) | 2 time phrases
T2 | 1 time phrase (word style) | 2 time phrases
W1-4 | 4 phonetically rich words |
00-12 | 13 Mobile phone application words | 67 application words
20-41 | 22 IVR functions keywords | 67 application words
50-81 | 32 car products keywords | 67 application words
P1,2 | 2 additional language dependent keywords |
Z0-9 | Prompts for spontaneous speech |
Table 4 Corpus codes and contents.
The proposed format uses mnemonic values. It permits selection of all files belonging to one of the sixteen corpora by using one command (e.g. in DOS: dir /s/b ??????C*, in UNIX: find . -name '??????C*' -print).
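The same wildcard selection can be reproduced on a list of file names in Python with the standard fnmatch module. This is a small sketch; select_corpus is a hypothetical helper, and the pattern relies only on the fact that the corpus identifier is the seventh character of the file name.

```python
from fnmatch import fnmatchcase


def select_corpus(file_names, corpus_letter):
    """Keep the names whose corpus identifier (7th character) matches."""
    pattern = "??????" + corpus_letter + "*"  # same wildcard as in the text
    return [name for name in file_names if fnmatchcase(name, pattern)]
```

Calling select_corpus(names, "C") keeps the connected digit items (C1-C7) and drops everything else, including names that merely contain a C in the extension.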
A list of separate documentation files, tables and listings follows below:
Directory | File | Description
TABLE | LEXICON.TBL | lexicon
TABLE | REC_COND.TBL | recording condition table
TABLE | SESSION.TBL | session table
TABLE | SPEAKER.TBL | speaker table
INDEX | CONTENT0.LST | contents list (for close talk mic)
DOC | DESIGN.DOC | (this) main documentation file
DOC | ISO88591.PS | ISO 8859 character set
DOC | SAMPALEX.PS | SAMPA phone symbols used in lexicon
DOC | VALREP.TXT | Validation report
DOC | SUMMAR0.TXT | summary file (for close talk mic)
Table 5- Documentation files, tables and listing
1.4 Label files
Label files adhere to a modified SAM label format:
ABC: item1, item2, item3, ...
where
ABC is a three-letter mnemonic followed by a colon; the mnemonic must contain only 7-bit US-ASCII characters and may not contain spaces or colons
items after the mnemonic are separated by commas, i.e. they cannot contain commas themselves
items can be empty
spaces after the colon or in between items are recommended to improve readability
a label line is delimited by <CR><LF>, the line end sequence according to the DOS operating system.
Note: the 80 character limit on line length is no longer enforced in SpeechDat-Car. Consequently, the version number in the LHD field has been updated.
Table 6 shows the SAM labels used in this database. Optional items are enclosed in {}.
SAM Label | Description | Format | Format string
LHD | label header | fixed vocabulary item | %s
ELF | end of label file | |
CMT | comment | free-form text | %s
DBN | database name | | SpeechDat_Car_%s
SES | session number | 4-digit number | %04d
REG | calling region | fixed vocabulary item from list of regions | %s
SCD | speaker code | n-digit number | %0d
SEX | speaker gender | fixed vocabulary item: {M|F} | %s
AGE | speaker age | integer | %d
ACC | speaker accent | fixed vocabulary item from list of dialects | %s
DIR | speech file directory | fixed vocabulary item from file system: \<database>\BLOCK<nn>\SES<nnnn> | %s
SRC | speech file name | 8.3 file name | %8c.3c
CCD | corpus code | 2 character code | %2c
REP | recording place | free-form text | %s
RED | recording date | DD/Mon/YYYY | %02d/%3c/%4d
RET | recording time | HH:MM:SS | %02d:%02d:%02d
BEG | labelled sequence begin position | integer | %d
END | labelled sequence end position | integer: number of sample points in recording | %d
SAM | sampling frequency | integer: {8000|16000} | %d
SNB | number of (8-bit) bytes per sample | integer: {1|2}, {signed|unsigned} | %1d,%s
SBF | sample byte order | integer: {0|lohi} | %s
SSB | number of significant bits per sample | integer: {8|16} | %d
QNT | quantization | fixed vocabulary item, e.g.: {ALAW|RAW|PCM} | %s
NCH | number of channels | 4 | %d
LBD | label file body | |
LBR | prompt text | BEG, END, three optional signal values, and the prompt text. If the signal values are not known they may be left empty, but the correct number of commas must remain. The prompt text is the ISO 8859-1 encoded text that appears on the screen. | %d, %d, %d, %d, %d, %s
CAR | car make and type | free-form text | %s
SPP | speaker position | fixed vocabulary item: {DRIVER|CO_DRIVER} | %s
EXN | experimenter name | free-form text | %s
SCC | scenario code | fixed vocabulary item: {HIGHWAY|CITY|...} | %s
WTC | weather condition | fixed vocabulary item: {SUN|RAIN|...} | %s
CEQ | car equipment | attribute value pair list, e.g. CLIMATE=ON, WINDOW_L_FRONT=OPEN, ...; only binary values are allowed, and all attributes must be present | %s
MIP | microphone position | attribute value pair list: CH0=CLOSE_TALK, CH1=CLOSE, CH2=CENTER, CH3=A_COLUMN | %s
MIT | microphone type | attribute value pair list: CH0=SHURE, CH1=LAVALIER, CH2=PEIKER, CH3=AKG | %s
LBO | orthographic transcription | BEG, (END-BEG)/2, END, and the appropriate SpeechDat-Car compliant ISO 8859 annotation text | %d, %d, %d, %s
LB{0|1|2|3} | orthographic transcription | BEG, (END-BEG)/2, END, and the appropriate SpeechDat-Car compliant ISO 8859 annotation text | %d, %d, %d, %s
Table 6- SpeechDat-Car SAM labels
Example of a .cac label file
LHD: SAM,6.0
DBN: SpeechDat_Car_CA
SES: 2381
CMT: *** Speech Label Information ***
SRC: V22381A1.CA1
DIR: \VEHIC2CA\BLOCK23\SES2381
CCD: A1
BEG: 0
END: 78399
SYN: 2724
REP: Barcelona
RED: 27/Mar/2006
RET: 12:33:57
EXP: Sergio Oller
CMT: *** Speech Data Coding ***
SAM: 16000
SNB: 2,unsigned
SBF: lohi
SSB: 16
QNT: RAW
NCH: 4
CMT: *** Speakers Information ***
SCD: 238
SEX: M
AGE: 45
ACC: EAST
CMT: *** Recording conditions ***
CEQ: CLIMCONTROL=ON,AUDIO=ON,WINDOW_L_FRONT=CLOSE,WINDOW_R_FRONT=CLOSE,WINDOW_REAR=CLOSE,ROOF=CLOSE,WIPERS=OFF,CROSS_TALK=NO
WTC: SUN
REG: EAST
CAR: SEAT Alhambra
MIP: CHN0=CLOSE_TALK,CHN1=CLOSE,CHN2=CENTER,CHN3=A_COLUMN
MIT: CHN0=SHURE,CHN1=LAVALIER,CHN2=AKG,CHN3=PEIKER
SPP: CO_DRIVER
EXN: Sergio Oller
SCC: HIGH_SPEED_GOOD_ROAD
CMT: *** Label File Body ***
LBD:
LBR: 0,78399,,,,Finalizar la llamada
LB0: 0,39199,78399,[sta][int] finalizar la llamada
LB1: 0,39199,78399,
LB2: 0,39199,78399,
LB3: 0,39199,78399,
ELF:
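A label file of this kind can be read with a few lines of Python. The sketch below follows only the rules stated in this section (a mnemonic, a colon, comma-separated items that may be empty); it is not a full SAM parser, and the function names are hypothetical. Because mnemonics such as CMT are repeated, the result is an ordered list of records rather than a dictionary.

```python
def parse_label_lines(lines):
    """Parse label lines into a list of (mnemonic, items) pairs."""
    records = []
    for raw_line in lines:
        line = raw_line.rstrip("\r\n")  # label lines end with CR LF
        if not line.strip():
            continue  # ignore blank lines
        # Split at the first colon only, so values like 12:33:57 survive.
        mnemonic, colon, rest = line.partition(":")
        if not colon:
            raise ValueError(f"missing colon in label line: {line!r}")
        items = [item.strip() for item in rest.split(",")]
        records.append((mnemonic.strip(), items))
    return records


def first_items(records, mnemonic):
    """Return the items of the first record with this mnemonic, or None."""
    for name, items in records:
        if name == mnemonic:
            return items
    return None
```

Applied to the example above, the LBR line yields six items, three of them empty, and the LB0 line yields the begin, middle and end positions followed by the transcription. The middle position 39199 is the integer part of (78399 - 0) / 2.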
2. Database design and collection
2.1 Recording platform
The recording platform is a mobile recording platform (PltM) installed inside the car, recording multi-channel speech utterances in a high bandwidth mode (60-7000 Hz, 16 kHz sample frequency).
Multi-channel recordings are performed in the car. The recordings are made through an acoustic front-end (AFE) installed inside the car and connected to the recording platform PltM. Three kinds of AFE are used simultaneously during the recordings: a close-talk microphone, a Lavalier microphone and a remote noise-cancelling microphone with 2 hands-free microphones placed at different locations in the car.
The mobile recording platform in the car (PltM) uses a PC to drive the recording process. Data acquisition is performed by dedicated hardware in the PC and the recordings are stored directly on hard disk. The recordings are always made on four channels (1 close-talk signal as reference, one close signal and 2 far-talk signals). The positions for the far-talk microphones are:
A_Column: at the ceiling of the car near the A-pillar
Center: at the ceiling of the car over the mid-console (near the rear mirror)
A flat panel TFT colour display for in-vehicle use is attached to the windscreen or the dashboard of the car.
The data acquisition board installed in the Car-PC is a combination of two plug-in boards:
Multifunction data acquisition board
Anti-aliasing filter board

Multi-channel board recording API
User Interface (MMI)
Prompt file management
2.2 Speaker recruitment
Speakers were recruited from several universities in Catalonia (students and their relatives) and from some associations. This method gave access to a large number of people from several dialectal areas, of both sexes and of different ages. In most cases, each speaker recorded two sessions consecutively.
2.3 Design of prompting and prompt-sheet
600 different prompt sheets were generated. Phonetically rich sentences are read at the beginning of the session, as recommended by the specifications. Other items are spread over the prompt sheets to avoid list effects.
3. Database contents definition
Table 4 shows the contents of the Catalan Database. The final specification for the Catalan recordings is as follows:
3.1 Application words
3.1.1 Common application words 00-81
Corpus codes 00-81 contain application words. Each speaker pronounces 82 application words from a set of 200. The 200 Catalan application words were translated from a set of 200 English descriptions provided by the SpeechDat Car consortium. Translations were done by the UPC Servei de Llengües i Terminologia.
The following tables show the English description, English example and Catalan words as provided by the professional translators.
Mobile Phone Application Words
GSM telephone | English example | Catalan word
Select telephone (and display menu) | Telephone / Mobile / GSM | Mòbil/Telèfon/GSM
Dial number (select number dialing) | Dial, number-dialing | Marcar
Redial last number or last name | Redial | Remarcar
Place a call with a name | Call | Trucar
Access to call names (phone numbers of regularly called people) | Phonebook | Agenda
Begin dialing | Dial | Marcar
Accept incoming call | Accept call | Acceptar trucada
Enter telephone number (digit by digit) | Phone number | Número de telèfon
Store number in memory | Store | Memoritzar
Home phone | Home | Casa
Hang up the telephone / End the call | Hang up / End | Penjar
Office call | Office | Oficina
Refuse incoming call | Refuse a call | Rebutjar trucada
Keep on redialing automatically (call back) | Automatic redial / Call back | Remarcació automàtica/Remarcació
Secretary call | Secretary | Secretària/Secretari
Emergency call | Emergency / Mayday | Emergència/SOS
Forward a call | Forward call | Remetre trucada
Transfer to human operator | Operator | Operadora
Private numbers list | Private | Privada
Select info functions | Info | Informació
Enter prefix of the telephone number (digit by digit) | Prefix | Prefix
Professional number list | Business | Empresa
Select setting functions | Settings | Opcions
Select access functions | Access | Accés
Choose a name out of the last X calls | Choose | Seleccionar
Access to code for changing | Code | Codi
Date of delivery | Schedule | Horari
Answering to the incoming call with a greeting | Greeting / Waiting message | Missatge d'espera
Play or present dial history list | Previous numbers / Last calls | Últimes trucades/Números anteriors
Getting a list of missed calls | Missed calls | Trucades perdudes
Use of DTMF | DTMF | Tons
Switch to hands-free (muting headset) | Hands-free | Mans lliures
Mute mode | Mute | Silencis
Getting a list of received calls | Received calls | Trucades rebudes
Getting the time length of the call | Time | Durada
Look for information on ACRONYM (SD trained name) | Look up | Buscar
Change/Send the current user profile | Profile | Perfil
Make a conference phone call | Conference | Conferència
Put on hold | Hold | En espera
IVR Functions Keywords
IVR functions | English example | Catalan word
Cancel the current operation | Cancel / Undo | Cancel·lar
Stop current function | Stop | Parar
Answer to prompt with yes | Yes | Sí
Answer prompt with no | No | No
Request information or menu options or help | Help | Ajuda
Abort and go to main menu of selected source | Abort / Exit / Escape | Anul·lar/Sortir
Repeat last function or command | Repeat | Repetir
Confirm (accept top or marked candidate from item list) | O.K. | Acceptar
Go back one item in the list, or replay previous message | Previous / Back | Anterior
Go forward one item in the list, or play next message | Next | Següent
Delete an entry, message or list item | Delete | Esborrar
Send a message | Send | Enviar
Select an option / function | Option / Function | Opció/Funció
Quit the application | Quit | Sortir
Save/archive current entry, message or list item | Save | Memoritzar
Return to main menu | Menu / Main menu | Menú/Menú principal
Correct last entry | Correct | Corregir
Continue a stop operation or proceed with next item | Continue | Continuar
Select a name, message or option | Select | Seleccionar
Add or insert an entry (user-defined name and number) | Add / Create / New | Afegir/Crear/Nou
Record a message or a voicemail greeting or an information file | Record | Gravar
Go one menu level up | Up a menu | Menú superior
Go to the end of the list or voicemail messages | End / Last | Final de la llista/Fi
Go to the beginning of the list or voicemail messages | Start / First / Top | Principi/Inici
Play out message or information file | Play / Listen to | Reproduir
Play again | Replay | Repetir
Select language of service (native language of database) | English | Català
Modify an entry | Change / Review | Canviar/Revisar
Enter spelling mode (any alpha numeric input) | Spell | Lletrejar
Go to next page | Next page / Page up | Pàgina següent
List directory of names, messages or programming options | List | Llista
Enter directory sub-menu or list directory entries | Directory | Directori
Go to previous page | Previous page / Page down | Pàgina anterior
Pause the current operation | Pause | Pausa
Connect to | Connect | Connectar
Activate a function | Activate | Activar
Reset idle mode | Reset | Restablir
Menu list | Menu list | Menú
Program advanced features or options, or enter a sub-menu | Program | Programar
Go one menu level down | Down a menu | Menú inferior
Deactivate a function | Deactivate | Desactivar
Select automatic mode | Automatic | Automàtic
Select manual mode | Manual | Manual
Select languages | Language | Idioma
Specific message service functions | English example | Catalan word
Enter voicemail menu program | Voicemail | Bústia de veu
Enter email / mailbox menu program | Email | Correu electrònic
Read a mail / message | Read | Llegir
Answer to a message | Answer | Respondre
Enter agenda program | Agenda / Address book | Agenda
Enter Internet program | Internet | Internet
Receive new e-mails / messages | Get mail | Obtenir correu
Enter a password to access voicemail functions | Password | Contrasenya
Enter Short Messages Service | SMS | SMS
Enter fax menu program | Fax | Fax
Transfer/Forward message | Transfer / Forward | Reenviar/Transferir
Dictate text message | Dictate | Dictar
Go to a site | Go to | Anar a
Ask for information on mail header | Mail header | Encapçalament
Declare message as urgent | Urgent | Urgent
Multimedia Service Commands | English example | Catalan word
Select a telematic service | Telematics service | Servei telemàtic
Search for telematic services | Show services | Llista de serveis
Request departure times | Departures | Sortides
Request parking information (availability) | Parking (information) | Aparcaments
International | International | Internacional
Send credit card number / Payment with a credit card | Credit card | Targeta de crèdit
Car Products Keywords
Car radio | English example | Catalan word
Select tuner (and play last station) | Radio | Ràdio
Select cassette (and play last track) | Tape | Casset
Resume playing actual side | Play | Reproduir
Select CD-changer (and play last CD) | CD changer / CD player | Carregador de CD/CD
Quit playing, skipping or rewinding | Stop | Parar
Select CD-changer and specific CD by name or number | CD CD-NAME / CD-NUMBER | CD número
Select particular track on actual CD | Track TRACK-NUMBER | Pista
Select pause mode | Pause | Pausa
Play next track on actual cassette/CD | Next track | Pista següent
Play previous track on actual cassette/CD | Previous track | Pista anterior
Scan available radio stations | Scan | Buscar emissores
Directly select tuner and specific radio station | Play + ACRONYM | Posar
Enter volume control | Volume | Volum
Reverse side of the cassette | Turn over / Reverse side | Girar/Canviar de cara
Scan CD | Scan CD | Explorar CD
Select random / shuffle mode | Random play | Selecció aleatòria
Ask for stored traffic information messages (TIM) | Traffic messages | Missatges de trànsit
Fast forward | Fast forward | Avançar
Rewind | Rewind | Enrere
Select FM Bandwidth | FM radio | FM
Eject cassette | Eject | Expulsar
Enter bass control | Bass | Greus
Enter treble control | Treble | Aguts
Select DAB (digital audio broadcasting) Bandwidth | D.A.B. | D.A.B.
Navigation System | English example | Catalan word
Select navigation (and display menu) | Navigation | Navegació
Repeat last acoustic message | Again / Repeat / Last message | Repetir/Últim missatge
Enter destination | Enter destination | Introduir destinació
Directly guide to pre-stored destination | Guide to + DESTINATION | Guia a
Enter a town name | Town / City | Ciutat/Poble
Enter a Street name | Street | Carrer
Enter Airport | Airport | Aeroport
Enter City center | Center | Centre
Enter Gas / Petrol station | Gas station | Benzinera
Enter Car service (garage) | Garage | Taller mecànic
Select guidance (and display menu) | Guidance | Guiat
Acoustic guidance ON/OFF | Acoustic guidance | Guia acústica
Zoom out of the map | Zoom out | Allunyar
Zoom in the map | Zoom in | Apropar
Enter name of a Crossing Street | Crossing | Cruïlla
Enter house number | House-number | Número de carrer
Select pre-stored destination | Destination list | Llista de destinacions
Enter Hotel | Hotel | Hotel
Enter Restaurant | Restaurant | Restaurant
Toggle to map-mode | Map | Mapa
Enter hospital | Hospital | Hospital
Enter Railway station | Railway station | Estació de tren
Activate display control | Display | Mostrar en pantalla
Change a destination item | Change | Canviar
Return to starting point | Go back | Retrocedir
Calculate route | Start route guidance | Començar ruta
Enter Border-point | Border | Frontera
Enter Highway exit | Highway exit / Motorway exit | Sortida d'autopista
Select last destination | Last destination | destinació prèvia
Show distance to destination | Distance | Distància
Show calculated route list | Route list | Llista de rutes
Enter Fair | Fair / Trade show | Fira/Exposició comercial
Enter Ferry | Ferry | Transbordador
Enter Highway crossing | Highway crossing / Motorway junction | Intersecció d'autopistes
Show destination map | Destination map | Mapa de destinació
Show POI (point of interest) in the map | Show + POI | Mostrar
Hide POI in the map | Hide + POI | Ocultar
Show actual positions map | Position map | Mapa de posició
Enter Car rental station | Car rental | Lloguer de vehicles
Enter other destinations | Other destinations | Altres destinacions
Enter Highway service station | Highway service station | Estació de servei d'autopista
Calculate alternative route | Alternative route | Ruta alternativa
Toggle to pictogram mode | Pictogram | Pictograma
Specific car accessories functions | English example | Catalan word
Enter air-conditioning menu | Air-conditioning | Aire condicionat
Give the time | Time | Hora
Windows up | Up | Pujar
Windows down | Down | Baixar
Recall driver settings | Settings / Recall | Configuració/Memoritzar
Give the date | Date | Data
Enter car-checking menu | Control / Diagnostic | Diagnòstic/Control
Set cabin temperature | Temperature | Temperatura
Enter windows menu | Windows | Finestres
Enter ACC (adaptive cruise control) menu | ACC | ACC
Defrost | Defrost | Dispositiu antiglaç
Air re-circulation | Re-circulation | Recirculació
Give climate information | Weather | Temps
Enter seat menu | Seat | Seient
Choose the air-conditioning level | Level | Nivell
Choose air flow/Fan/Ventilation/Blower level | Ventilation | Ventilació
Set desired speed | Speed | Velocitat
Generic words | English example | Catalan word
Set | Set | Configurar
On | On | Encendre
Off | Off | Apagar
Reset | Reset | Reiniciar
Open | Open | Obrir
Close | Close | Tancar
Left | Left | Esquerra
Right | Right | Dreta
Front | Front | Davant
Rear | Rear | Darrere
High level | High | Superior
Low level | Low | Inferior
Table 7- Application words
As can be seen in the above tables, some descriptions were translated with more than one word (Telèfon, Mòbil, GSM), and some words were used to describe different concepts (Agenda describes both Address book and Phonebook).
The specifications allow 39 different words for the Mobile phone application words, 65 different words for the IVR function keywords, and 96 different entries for the Car product keywords. In addition, the language-dependent application words (corpus codes P1-2) allow for 10 additional words, giving a total of 210 different words.
Words were grouped as shown in Table 8.
Mobile Phone Application Words | IVR Functions Keywords | Car Products Keywords
Mòbil | Cancel·lar | Aire_condicionat
Marcar | Sí | Hora
Remarcar | No | Pujar
Trucar | Ajuda | Baixar
Agenda | Anul·lar | Configuració
Acceptar_trucada | Repetir | Data
Número_de_telèfon | Acceptar | Diagnòstic
Casa | Anterior | Temperatura
Penjar | Següent | Finestres
Oficina | Esborrar | ACC
Rebutjar_trucada | Enviar | Dispositiu_antiglaç
Remarcació_automàtica | Opció | Recirculació
Secretària | Sortir | Temps
Emergència | Menú_principal | Seient
Remetre_trucada | Corregir | Nivell
Operadora | Continuar | Ventilació
Privada | Seleccionar | Velocitat
Informació | Afegir | Ràdio
Prefix | Gravar | Casset
Empresa | Menú_superior | Carregador_de_CD
Opcions | Final_de_la_llista | Parar
Accés | Principi | CD_número
Seleccionar | Reproduir | Pista
Codi | Català | Pausa
Horari | Canviar | Pista_següent
Missatge_d'espera | Lletrejar | Pista_anterior
Últimes_trucades | Pàgina_següent | Buscar_emissores
Trucades_perdudes | Llista | Posar
Tons | Directori | Volum
Mans_lliures | Pàgina_anterior | Girar
Silencis | Connectar | Explorar_CD
Trucades_rebudes | Activar | Selecció_aleatòria
Durada | Restablir | Missatges_de_trànsit
Buscar | Menú | Avançar
Perfil | Programar | Retrocedir
Conferència | Menú_inferior | FM
En_espera | Desactivar | Expulsar
Telèfon | Automàtic | Greus
Números_anteriors | Manual | Aguts
 | Idioma | D.A.B.
 | Bústia_de_veu | Configurar
 | Correu_electrònic | Encendre
 | Llegir | Apagar
 | Respondre | Reiniciar
 | Internet | Obrir
 | Obtenir_correu | Tancar
 | Contrasenya | Esquerra
 | SMS | Dreta
 | Fax | Davant
 | Reenviar | Darrere
 | Dictar | Superior
 | Anar_a | Inferior
 | Encapçalament | Navegació
 | Urgent | Introduir_destinació
 | Servei_telemàtic | Guia_a
 | Llista_de_serveis | Ciutat
 | Sortides | Carrer
 | Aparcaments | Aeroport
 | Internacional | Centre
 | Targeta_de_crèdit | Benzinera
 | Crear | Taller_mecànic
 | Funció | Guiat
 | Transferir | Guia_acústica
 | Revisar | Allunyar
 | Memoritzar | Apropar
 | | Cruïlla
 | | Número_de_carrer
 | | Llista_de_destinacions
 | | Hotel
 | | Restaurant
 | | Mapa
 | | Hospital
 | | Estació_de_tren
 | | Mostrar_en_pantalla
 | | Enrere
 | | Començar_ruta
 | | Frontera
 | | Sortida_d'autopista
 | | destinació_prèvia
 | | Distància
 | | Llista_de_rutes
 | | Fira
 | | Transbordador
 | | Intersecció_d'autopistes
 | | Mapa_de_destinació
 | | Mostrar
 | | Ocultar
 | | Mapa_de_posició
 | | Lloguer_de_vehicles
 | | Altres_destinacions
 | | Estació_de_servei_d'autopista
 | | Ruta_alternativa
 | | Pictograma
 | | Exposició_comercial
 | | Últim_missatge
 | | CD
Table 8- List of Application Words
3.1.2 Language-dependent application words P1-2
A list of 10 language-dependent application words was chosen to cover the words that appear in Table 7 but are not included in Table 8. Each speaker pronounces two words with corpus codes P1-2. The set of 10 words is the following:
Secretari
SOS
Fi
GSM
Inici
Nou
Control
Canviar_de_cara
Remarcació
Poble
Table 9- Language dependent application words
3.2 Voice activation keywords A1-2
Some additional command phrases are necessary to activate the recognition system. A short item may go undetected if spoken in noisy conditions or inside a whole phrase. To ensure that the system detects the activation command, a short sentence is used instead of a single command word. These phrases have corpus codes A1-2.
The keywords used for voice activation are listed below, together with the Catalan phrase selected for each:
True Hands-free Telephony Function | Catalan word
Making a phone call (by name or number) | Trucar per telèfon
Terminating a phone call | Acabar la trucada
Dialing by number | Seleccionar un número
Dialing by name | Seleccionar una persona
Answering the incoming call | Contestar la trucada
Table 10- Voice activation keywords
3.3 Isolated digits
3.3.1 Single digits I1-4
Four isolated digits are elicited. Digits are READ. The corpus codes are I1-I4.
3.3.2 Digit string B1
Each speaker pronounces a different digit string. The string contains the 10 digits in random order. The speaker reads the following instruction: "Please, read these digits carefully with pauses among them." Star and hash are not included in the digit string. The corpus code is B1.
3.4 Connected digits
3.4.1 Sheet number C1
Six digits are used to number the sheets. The number is composed of 4 digits that indicate the session number and 2 checksum digits for a Hamming code, which detects and corrects one error.
Format: d1 d2 d3 d4 c1 c2, from 0000xx to 2991xx
d1 d2 d3: numbers from 000 to 299
d4: {0|1}
c1 = (d1 + d2 + d3 + d4) mod 10
c2 = (d1 + 2 d2 + 3 d3 + 4 d4) mod 10
The corpus code is C1
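The two checksum formulas above can be illustrated with a short Python sketch that builds a six-digit sheet number from a four-digit session number and checks an existing one for consistency. The function names are hypothetical, and the sketch only verifies the checksums; it does not attempt error correction.

```python
def sheet_number(session_digits):
    """Append the checksum digits c1 and c2 to a 4-digit session number."""
    valid = (
        len(session_digits) == 4
        and session_digits.isascii()
        and session_digits.isdigit()
    )
    if not valid:
        raise ValueError("expected exactly four digits")
    d = [int(ch) for ch in session_digits]
    c1 = sum(d) % 10                                              # d1+d2+d3+d4
    c2 = sum(w * x for w, x in zip((1, 2, 3, 4), d)) % 10         # d1+2d2+3d3+4d4
    return f"{session_digits}{c1}{c2}"


def is_consistent(sheet):
    """Check that the last two digits match the checksums of the first four."""
    if len(sheet) != 6 or not (sheet.isascii() and sheet.isdigit()):
        return False
    return sheet_number(sheet[:4]) == sheet
```

For session 2381, c1 = (2 + 3 + 8 + 1) mod 10 = 4 and c2 = (2 + 6 + 24 + 4) mod 10 = 6, so the sheet number is 238146.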
3.4.2 Telephone number C2, C5-C7
3.4.2.1 Spontaneous telephone number
Corpus Code C2 contains a spontaneous telephone number. In order to get it, the speaker gives a telephone number known by her/him.
3.4.2.2 Read telephone numbers
This is a 9-11 digit READ telephone number. The number of digits, spacing and presentation reflects typical telephone numbers in Catalan for national numbers including area codes and GSM codes but not local or international numbers e.g.
Fixed telephone numbers:
93 401 64 50
934 016 440
0 934 016 440
00 934 016 440
Mobile phone numbers
600 56 40 98
600 564 098
Each speaker reads 3 telephone numbers. The corpus codes are C5-C7.
3.4.3 Credit card number C3
Corpus code C3 contains a READ 16-digit credit card number. The list of credit card numbers follows:
0500 8824 0710 2777   0490 2332 8674 1248   0510 0620 7667 3063
0480 8636 0800 9882   0520 2592 0833 4444   0470 1191 0364 9173
0530 9627 0888 9258   0460 6894 1601 2055   0540 2622 7895 8262
0455 3633 2402 8994   0555 0744 1721 0154   0440 1681 6295 4013
0560 0773 8387 0250   0430 9686 2602 6254   0570 3367 1735 1512
1501 0330 6973 4096   1491 1765 7405 2082   1511 2712 1274 1075
1481 6604 8812 7025   1521 3892 1201 5817   1471 0873 9933 1177
1531 1611 2666 7589   1461 2722 0290 7171   1541 8314 6395 9249
1456 9912 9725 2063   1556 8287 3613 9650   1441 2682 2366 3160
1561 2672 3583 9796   1431 1371 7635 1754   1571 9779 4095 9865
2502 0890 0300 6881   2492 2632 6096 2266   2512 7386 4186 5995
2482 1671 9896 1997   2522 6624 2093 1166   2472 0600 0864 3990
2532 3623 3185 3140   2462 1183 1811 9853   2542 8397 6634 5138
2457 3423 0670 0089   2557 1781 1365 7073   2442 8369 8616 5198
2562 1665 7368 2262   2432 0644 8695 8697   2572 0165 1591 0008
3503 3393 9315 6763   3493 1581 8668 9356   3513 3373 0400 6984
3483 1331 0764 7082   3523 3334 8802 8983   3473 9607 8626 0064
3533 0610 8714 6744   3463 2612 6195 0283   3543 6385 7875 5345
3458 0264 0590 7268   3558 8324 9325 7998   3443 1135 8304 2171
3563 7673 0320 9560   3433 9963 0234 1029   3573 3383 0220 7985
4504 0390 1291 7853   4494 9796 0420 0055   4514 1311 2322 9472
4484 7196 1211 9026   4524 0380 0110 9833   4474 2312 1301 6991
4534 2372 0664 4630   4464 8876 0680 9870   4544 0280 7396 7065
4459 7097 0823 0176   4394 0410 9705 2153   4384 1791 8685 1713
4294 2292 9923 2043   4284 1691 7296 9059   4194 9298 6775 0026
5505 8098 0844 1052   5495 8834 2184 6460   5515 1391 6594 8152
5485 2302 2692 8220   5525 9835 7188 4156   5475 6874 7286 9048
5535 1381 7615 9078   5465 8224 8297 9188   5545 0273 1421 8169
5683 8795 6614 0075   5693 3293 2392 7677   5783 2275 7776 0144
5793 1701 8278 9454   5889 1281 0370 3264   5893 9803 8865 3174
6506 9675 8376 9555   6496 8724 0754 7884   6516 7335 1235 2121
6486 6404 0034 1080   6526 1631 1265 8057   6476 8606 3276 2992
6536 8845 7277 8261   6466 0700 0310 8146   6559 7694 0790 8062
6564 0720 9225 9062   6434 1035 7303 3557   6574 9398 1755 7160
6424 8785 1645 1048   6584 7723 7974 9158   6414 8202 3193 8083
7507 9944 9407 5358   7497 3792 1801 9340   7517 3782 0780 8040
7487 7313 2382 2079   7527 9305 1745 7807   7477 1125 7605 2163
7537 0100 1321 8660   7467 6794 0134 9982   7565 2582 9696 2141
7435 7375 1401 9147   7575 8197 9198 9174   7425 0654 0190 0241
7585 9388 0900 9086   7415 7323 9617 9030   7595 7713 8704 9267
8508 3403 8855 2002   8498 9203 8189 9849   8518 9786 0182 6089
8488 6285 1411 3077   8528 7703 7684 3680   8478 2412 9825 4083
8538 2422 9669 3890   8468 6187 9288 3081   8566 0210 7625 9584
8436 2192 8778 3081   8576 3593 9099 9164   8426 1774 0124 3469
8586 1621 0580 0836   8416 1114 0854 9315   8596 9377 3413 7579
9509 2891 9715 5985   9499 3283 9213 9360   9519 9115 0810 0016
9489 9101 9813 4943   9529 0630 0091 0043   9479 1092 6374 8172
9539 3094 8212 9133   9469 0200 2702 1263   9567 9279 9901 9649
9437 3603 0690 6068   9577 1711 2223 7376   9427 9954 0734 7654
9587 2282 8406 1065   9417 6784 1221 8076   9597 8975 9976 9998
Table 11- Credit card numbers
3.4.4 PIN code C4
The PIN code is a connected digit string of length 6, similar to the sheet number, but drawn from a set of 150 such numbers. The corpus code is C4. Table 12 shows the list of PIN numbers.
000100   002003   004005   006007   011012   013014
015016   019021   022023   024025   026027   030310
032033   034035   036037   041042   043044   045046
049050   051052   053054   057058   059061   062063
066067   068069   070710   072073   076077   081082
085086   087088   089090   091092   095096   097098
099111   112102   115116   117118   119120   122123
127128   129132   133134   137138   139140   141420
143144   147148   149152   153154   155156   159160
161621   163164   167168   169172   173174   175176
181820   183184   185186   193194   195196   199201
222300   224211   226227   229214   236237   248249
255256   257217   265266   272730   283281   284285
293290   296297   298299   333411   336302   337322
338339   347348   354352   355356   364365   366367
374370   378379   384381   389390   395393   398399
444540   447415   449404   456422   457441   466443
467460   469463   475472   485480   486481   488489
495492   496497   498499   555601   557500   559523
565611   566502   567524   568562   569563   576536
577578   579580   586581   587583   596592   666700
668605   669612   676613   677614   678615   679662
687673   688633   689681   697682   698690   699722
777811   779733   787702   788723   789745   798785
799801   888922   898804   999800   958858   907807
Table 12- PIN numbers
3.5 Dates D1-3
3.5.1 Spontaneous date
Corpus code D1 contains a spontaneous date: the speaker's date of birth.
3.5.2 Prompted date
Corpus code D2 contains prompted dates. The prompted date expression has the form Week-day, dd de Month de yyyy and is presented on the prompt sheet in the conventional Catalan way.
Example: Divendres, 25 de Gener de 1997
Each speaker pronounces a different date.
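The prompted date format can be illustrated with a short formatter. This is a sketch under stated assumptions: the function name prompted_date is hypothetical, the weekday is passed in explicitly rather than derived from the calendar (the source does not say that the prompted weekday must match the calendar), and the elision of de to d' before a month starting with a vowel is assumed from normal Catalan usage.

```python
# Month and weekday names as used in the prompted dates (Catalan).
MONTHS = ["gener", "febrer", "març", "abril", "maig", "juny",
          "juliol", "agost", "setembre", "octubre", "novembre", "desembre"]
WEEKDAYS = ["dilluns", "dimarts", "dimecres", "dijous",
            "divendres", "dissabte", "diumenge"]

def prompted_date(weekday: int, day: int, month: int, year: int) -> str:
    """Format 'Week-day, dd de Month de yyyy'; weekday 0 = dilluns, month 1 = gener."""
    if not 1960 <= year <= 2037:
        raise ValueError("year outside 1960-2037")
    if not 1 <= day <= 31:
        raise ValueError("day outside 1-31")
    name = MONTHS[month - 1]
    # Assumption: 'de' elides to "d'" before a vowel (abril, agost, octubre).
    prep = "d'" if name[0] in "aeiou" else "de "
    return f"{WEEKDAYS[weekday].capitalize()}, {day} {prep}{name.capitalize()} de {year}"

print(prompted_date(4, 25, 1, 1997))  # Divendres, 25 de Gener de 1997
```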
Dates include:
Years from 1960 to 2037
Days from 1 to 31, and the words in the following list
Catalan      English
abril        April
agost        August
desembre     December
febrer       February
gener        January
juliol       July
juny         June
maig         May
març         March
novembre     November
octubre      October
setembre     September
dissabte     Saturday
diumenge     Sunday
divendres    Friday
dijous       Thursday
dimecres     Wednesday
dimarts      Tuesday
dilluns      Monday
i            and
d'           of
de           of
Table 13- Words in prompted dates
3.5.3 Relative and general date expression
Relative and general date expressions (corpus code D3) are typical of what is spoken in real applications. The list of expressions is:
Catalan                            English
abans d'ahir                       The day before yesterday
ahir                               Yesterday
avui                               Today
demà                               Tomorrow
demà passat                        The day after tomorrow
la setmana vinent                  Next week
el mes que ve                      Next month
el mes passat                      Last month
la setmana passada                 Last week
el proper cap de setmana           Next weekend
a mitjans de la setmana passada    Mid last week
la propera setmana                 Next week
Table 14- Sentences in relative dates
3.6 Embedded application word phrases E1-2
E1-E2 are the corpus codes of a set of phrases that contain embedded application words. They provide a basis for word-spotting tests and a source of data that more accurately reflects the spontaneous production of application words.
Examples:
Seleccionar el CD número set
Trucar a una persona
3.7 Spelled names/words L1-7
Spelling is not common in Catalan, and people are not used to spelling words. The text to be spelled was shown to the speaker in capital letters, without accents or other symbols.
The following table shows each letter symbol from a Catalan dictionary, its usual name, its alternative name (if any), the expected counts and the counts at transcription level. Counts at transcription level are higher because spontaneous spellings were not taken into account in the expected counts.
Letter   Name          Alternative name   Expected counts   Counts
A        a             -                  2185              2704
B        be            be alta            683               770
C        ce            -                  969               1091
Ç        ce trencada   -                  403               403
D        de            -                  911               985
E        e             -                  2089              2456
F        efa           -                  488               507
G        ge            -                  727               851
H        hac           -                  404               422
I        i             i llatina          1950              2178
J        jota          -                  414               437
K        ca            -                  378               378
L        ela           ele                1564              1826
M        ema           eme                859               979
N        ena           ene                1426              1593
Ñ        enya          -                  339               341
O        o             -                  1745              2011
P        pe            -                  762               863
Q        cu            -                  395               406
R        erra          erre               1398              1754
S        essa          ese                1643              1846
T        te            -                  1186              1306
U        u             -                  1345              1464
V        ve baixa      uve                561               625
W        ve doble      -                  366               366
X        ics           -                  406               411
Y        i grega       -                  340               353
Z        zeta          -                  316               427
Table 15- Letter symbol, names, expected counts and counts at transcription level. Although Ñ is not a Catalan letter, it is included because some common Catalan surnames come from Spanish and some of them contain it.
3.7.1 Spontaneous name
L1 contains the spelling of a spontaneous name. The speaker was asked for the name of a friend in CC O1 and later spelled it spontaneously.
3.7.2 Prompted name linked to city
L2 contains the spelling of the city pronounced in O3
3.7.3 Real names/words
Corpus codes L3, L4, L5 and L6 contain spellings of words. These words were chosen to be highly varied so that more different letters are covered.
3.7.4 Artificial name
L7 is a spelling composed of letters that are poorly represented in the spellings above. It is used to balance the number of realizations of the recorded letters.
3.8 Money amount M1
M1 contains a money amount. Euros (euro) and cents (cèntims) are included. Example formats are:
sis-cents setanta-un euros i dotze cèntims
set-cents tretze mil setanta-quatre euros
3.9 Natural number N1
N1 contains a read natural number between 100000 and one million.
Format is:
set-cents setanta-sis mil quatre-cents dos
3.10 Directory assistance names O1-7
3.10.1 Spontaneous forename
O1: Forename of a friend
3.10.2 Spontaneous city name
O2: City where the speaker grew up
3.10.3 City name (set of 150)
O3-4 include Catalan cities and other European cities and countries.
Alemanya   Àustria   Bèlgica   Dinamarca   Espanya
Finlàndia   França   Grècia   Irlanda   Itàlia
Luxemburg   Noruega   Països Baixos   Portugal   Regne Unit
Rússia   Suècia   Suïssa   París   Londres
Copenhaguen   Madrid   Hèlsinki   Edimburg   Sevilla
Berlín   Roma   Atenes   Brussel·les   Milà
Munic   Rotterdam   Lisboa   Viena   Estocolm
Ginebra   Dublín   Moscou   Oslo   Lió
Glasgow   Espoo   Marsella   Odense   Tampere
Hamburg   Tessalònica   Patres   Nàpols   Bruges
Arhus   Àlaba   Alacant   Albacete   Almeria
Astúries   Àvila   Badajoz   Barcelona   Biscaia
Burgos   Càceres   Cadis   Cantàbria   Castelló de la Plana
Ciudad Real   Conca   Còrdova   La Corunya   Girona
Granada   Guadalajara   Guipúscoa   Huelva   Illes Balears
Jaén   Lleida   Lleó   Lugo   Màlaga
Múrcia   Navarra   Osca   Ourense   Palència
Las Palmas   Pontevedra   La Rioja   Salamanca   Santa Cruz de Tenerife
Saragossa   Segòvia   Sòria   Tarragona   Terol
Toledo   València   Valladolid   Zamora   Hospitalet de Llobregat
Badalona   Sabadell   Terrassa   Santa Coloma de Gramenet   Mataró
Reus   Cornellà de Llobregat   Sant Boi de Llobregat   Manresa   El Prat de Llobregat
Rubí   Viladecans   Granollers   Cerdanyola del Vallès   Vilanova i la Geltrú
Sant Cugat del Vallès   Esplugues de Llobregat   Mollet del Vallès   Castelldefels   Gavà
Sant Feliu de Llobregat   Sant Adrià de Besòs   Figueres   Igualada   Vic
Tortosa   Ripollet   Vilafranca del Penedès   Blanes   Olot
Montcada i Reixac   Sant Joan Despí   Barberà del Vallès   Premià de Mar   El Masnou
Valls   El Vendrell   Molins de Rei   Sant Pere de Ribes   Sant Andreu de la Barca
Santa Perpètua de Mogoda   Pineda de Mar   Martorell   Sant Feliu de Guíxols   Cambrils
Palafrugell   Manlleu   Sitges   Lloret de Mar   Amposta
Table 16- City names
3.10.4 Company/agency name/street name (set of 150)
O5-6 include a list of brands and company names.
ABC   Abertis
Acesa   ACS
Adidas   AENA
Agència EFE   Aigües de Barcelona
AirEuropa   Alcampo
Alcatel   Aldi
Al-Pi   Altadis
Amadeus   Amena
Antena 3 TV   Apple
Aunacable   Avis
Avui   Banc Sabadell - Atlàntic
Banesto   Bankinter
Bankpyme   Barclays Bank
Bayer   BBVA
Càritas   Cadena 100
Cadena SER   Caixa de Catalunya
Caixa de Girona   Caixa de Manresa
Caixa Laietana   Caixa Penedès
Caixa Popular   Caixa Sabadell
Caixa Tarragona   Caixa Terrassa
Caja Madrid   Canal +
Canon   Carrefour
Cepsa   Chupa-Chups
Cinesa   Citroën
Coca-Cola   Compaq
COPE   Correus
Creu Roja   Danone
Deutsche Bank   Diari de Barcelona
Duracell   EDreams
El Corte Inglés   Electronic Arts
El Mundo   El Mundo Deportivo
El País   El Periódico
El Punt   Endesa
Epson   Ericsson
Europa Press   Fecsa-Endesa
Fiat   Fibanc
Fnac   Ford
Gallina Blanca   Gas Natural
Grup Balañá   Eroski
Ferrovial   Inditex
Nutrexpa   Puig
Roca   Unilever
PRISA   Halcon Viajes
Honda   HP
Ibèria   Iberdrola
Iberojet   IBM
Ikea   Indra Sistemes
ING Direct   Intel
Jazztel   Kodak
La Caixa   La Razón
Lauren Films   La Vanguardia
Mango   Menta
Metro Barcelona   Microsoft
Motorola   MoviStar
Núñez y Navarro   Nescafé
Nestlé   NH Hotels
Nissan   Nokia
Once   ONO
Opel   Pepsico
Petrocat   Peugeot
Philips   Ràdio Barcelona
Radio Club 25   Renault
Renfe   Repsol YPF
Retevisión   Samsung
San Miguel   Sanyo
Seat   Siemens
Sol Meliá   Sony
Spanair   Sport
TeleCinco   Telefónica
Telepizza   Terra
Toshiba   Trasmediterránea
Uni2   Uno-e
Viatges Marsans   VilaWeb
Vodafone   Wanadoo
Volkswagen   Zara
Table 17- Company names
3.10.5 Forename & surname (set of 150)
Each speaker pronounces a forename + surname with corpus code O7. The complete list consists of 150 items and is shown below.
Aida VerdenyAntoni GasaEloi Benaiges
Joan Josep FeixasFtima SacauJana Barcons
Martina PuigdemasaMaria Merce VivetPau Gallart
Jaume SarrocaYasmina SolanaMar Burgues
Francesc Josep PortaGemma MirabetIvet Escales
Sandra PuenteGeorgina BautistaEdgar Borrull
Hug SuriolMaria BernadetsBiel Padilla
Sofia JoveAlba NollaJudith Turmo
Josep Mara TohaManel CastellvMiguel Jimnez
Susana SalgadoBeatriu Foresngel Sol
Daniela PortellaCarlota PedarrosOriol Goma
Abril PetitGabriel BetriuLluc Arnalot
Tania CastellCrstian ArtigasSergi Tubau
Samuel BertranManela FalcoIvan Gangolells
Gloria MolinJlia EspasaXavier Mercad
Guillem AltarribaAitana BelletAlicia Barrera
Lara MasesDaniel ViloAina Rota
Pol LladNora RibaDiego Salvans
Adam EspaolNdia LloveraIns Monne
scar AmpostaHctor ReLorena Moix
Abel AbellMariano Perpiric Segarra
Anna SegusAlexandra BruguesMnica Descarrega
Neus MasdeuRubn RoyoAnna Maria Sabata
Mriam LorenzoMar FolqueCristina Cabau
Helena CugatToni VilaltaDavid Auguets
Sebasti CasellesRuth AlegretNolia Torra
Ona SamperCarolina FierroAdri Vilarrasa
Sonia Guimerangela BlasiAlxia Altes
Domingo FitAdriana EstanyJuli Puigdevall
Aitor SirventSara CarlesMiquel Arro
Estebe RavetllatEmlio ToldrMaria Del Carme Orriols
Joan Antoni FustMireia EsquerreElena Vilagines
Blanca PegueraMarina SentisAinara Badosa
Albert PeruchoArnau LlopartIsmael Marqus
Carles TrillaMariona CastellarnauXnia Izquierdo
Lidia RosellGisela FrancoPere Anguera
Laia PuigvertRicard BolduPatricia Espot
Josep Antoni MargalefBerta PerisMaria Del Pilar Virgili
Josep Lluis MitjanaIngrid RuanaJan Verdaguer
Meritxell CerquedaAinhoa PepiolRamn Benavent
Elsa SanglasVicen AlaaIsaac Moreso
Josep Manel RicartSantiago FarrMax Prats
Nayara ViasToms SaforcadaBernat Espuny
Mara Jos MuntCludia RafolsAriadna Morros
Noa GalitoAaron NogueraGens Batalla
Ferran AltadillEsther PouIris Raich
Ddac PubillNoem BoixVctor Mota
Raquel SilvaClia CunillEmma Panisello
Rafel RodaNatlia CasesMario Trepat
Ignasi GodiaAleix PeretRosario Boixader
Table 18- Forename & surname
3.11 Phonetically rich sentences S1-9
Each speaker pronounces nine sentences (CC S1-9) from a set of phonetically rich sentences. Every set of nine sentences was designed to contain each phone at least once. The next table shows phone counts at prompt and transcription level.
Allophone   Counts (prompt)   Counts (transc)
@           24249             24099
a           9689              9695
B           2183              2170
b           3556              3241
D           1677              1679
d           4944              4920
e           5176              5208
E           3743              3708
f           2009              1997
G           1045              1043
g           1274              1253
i           11430             11334
j           637               640
J           637               635
k           7319              7270
L           1181              1170
l           9049              9013
m           6388              6360
N           658               655
n           10208             10151
O           3219              3184
o           3432              3535
p           5335              5456
r           6350              6322
rr          3738              3734
S           1115              781
s           13086             13023
t           9457              9053
u           9674              9538
w           2050              2021
Z           1514              1485
z           1496              1464
Table 19- Phone frequencies of the phonetically rich sentences at prompt and transcription level of the close-talk microphone
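The design constraint that every set of nine sentences contains each phone at least once can be checked mechanically. The sketch below assumes that each sentence is available as a list of phone symbols; the function name missing_phones and this input format are illustrative assumptions, while the inventory is the list of allophone symbols from Table 19.

```python
# Phone inventory taken from the allophone column of Table 19.
PHONES = ("@ a B b D d e E f G g i j J k L l m N n "
          "O o p r rr S s t u w Z z").split()

def missing_phones(sentences):
    """Return the phones of the inventory that occur in none of the sentences.

    `sentences` is a list of sentences, each a list of phone symbols
    (an assumed input format).
    """
    seen = set()
    for phones in sentences:
        seen.update(phones)
    return sorted(set(PHONES) - seen)

# A set of sentences satisfies the constraint when nothing is missing.
toy_set = [["k", "a", "z", "@"], ["rr", "i", "w"]]
print(missing_phones(toy_set))
```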
3.12 Times T1-2
3.12.1 Spontaneous time
Each speaker says the current time (CC T1).
3.12.2 Read time phrase
CC T2 contains a read time phrase
Format examples:
Hora1: {l'|les} {1-12} {i|menys} {deu, quart, 20, 25, mitja} {del matí, de la tarda, del migdia, del vespre, de la mitjanit, de la nit}
Hora2: {un quart|dos quarts|tres quarts} {i mig} {d'|de} {1-12} {del matí, de la tarda, del migdia, del vespre, de la mitjanit, de la nit}
Hora3: {l'|les} {1-12} en punt {del matí, de la tarda, del migdia, del vespre, de la mitjanit, de la nit}
{es|son} {aprox|exac|gairebé} {hora1|hora2|hora3} {avui, demà, ahir} a {hora1|hora2|hora3} {aprox|exac|gairebé} a {hora1|hora2|hora3} {d'ahir, d'avui, de demà} {2-12} hores i {2-59} minuts
Examples:
a l'una i vint-i-cinc de la nit d'ahir
exactament a les tres menys cinc de la matinada de demà
Catalan word      English word     counts T2   counts TL
l'                the              49          41
matí              morning          67          70
tarda             afternoon        69          81
menys             to               70          123
migdia            noon             70          70
mitjanit          midnight         70          69
nit               night            70          70
en                in               73          79
punt              o'clock          73          79
mitja             half             73          107
gairebé           almost           75          75
vespre            evening          76          76
matinada          dawn             78          78
són               it's             86          214
aproximadament    approximately    87          86
exactament        exactly          89          89
hores             hours            100         101
minuts            minutes          100         162
quart             quarter          101         146
quarts            quarters         118         213
avui              today            119         119
demà              tomorrow         121         121
ahir              yesterday        148         148
mig               half             149         153
d'                of               198         223
del               of               213         219
les               the              270         670
la                the              287         319
a                 at               388         388
i                 and              400         707
de                of               552         664
Table 20- Words in Time phrases. Catalan word, English word, counts at corpus level of the T2 set and counts at transcription level of T1 and T2. Digits and numbers are not included in the table.
3.13 Phonetically rich words W1-4
Each speaker utters four words (CC W1-4) from a set of 2400 phonetically rich words. The following table shows their phone frequencies at prompt and transcription level for the close-talk microphone.
Allophone   Counts (prompt)   Counts (transc)
@           2155              2117
a           831               834
B           301               293
b           305               228
D           300               299
d           335               332
e           323               319
E           624               606
f           251               246
G           300               297
g           303               297
i           1331              1314
j           298               294
J           300               296
k           743               734
L           307               299
l           633               621
m           605               592
N           301               297
n           835               814
O           265               259
o           300               300
p           568               594
r           919               910
rr          445               435
S           300               284
s           1134              1121
t           798               779
u           1613              1591
w           310               302
Z           299               292
z           302               297
Table 21- phone frequencies at prompt and transcription level of the close-talk microphone
3.14 Spontaneous sentences Z0-9
Ten spontaneous sentences, with corpus codes Z0-9, were recorded in 200 sessions.
For each of the 10 sentences to be recorded, the operator asks the speaker to make up a sentence on a topic given by the operator. The topics are described below. Items in () represent the type of data wanted, items in [] separated by | are alternatives, items in {} are optional, and items in <> are parameters that have to be set.
After each topic, the content of the LBR: label in the label files is shown.
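The bracket notation used in the topic descriptions can be expanded mechanically into concrete operator prompts. The following is a minimal sketch, not the tool used for the collection: the function name expand and its helpers are hypothetical, only the [a | b] alternatives and {x} optional items are handled, and parenthesised data types are left untouched.

```python
def expand(template):
    """Return every concrete prompt described by a topic template."""
    variants, _ = _sequence(template, 0, "")
    return [" ".join(v.split()) for v in variants]  # normalise spaces

def _sequence(text, pos, stop):
    """Parse until a character in `stop` (or the end); return (variants, pos)."""
    variants = [""]
    while pos < len(text) and text[pos] not in stop:
        ch = text[pos]
        if ch == "[":                      # alternatives: [a | b | ...]
            options, pos = _alternatives(text, pos + 1)
        elif ch == "{":                    # optional item: {x}
            inner, pos = _sequence(text, pos + 1, "}")
            pos += 1                       # skip the closing "}"
            options = [""] + inner
        else:                              # ordinary character
            options, pos = [ch], pos + 1
        variants = [v + o for v in variants for o in options]
    return variants, pos

def _alternatives(text, pos):
    """Parse 'a | b | c]' starting just after '['; return (options, pos)."""
    options = []
    while True:
        branch, pos = _sequence(text, pos, "|]")
        options.extend(branch)
        if pos >= len(text) or text[pos] == "]":
            return options, pos + 1        # skip the closing "]"
        pos += 1                           # skip the "|"

topic = ("Call a [book store | music shop] to enquire about the "
         "availability, price {and edition} of a [book | CD]")
for prompt in expand(topic):
    print(prompt)
```

Nested alternatives are handled by the recursion, so a template such as "x [a [b | c] | d]" expands to three prompts.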
A: Teleservices
01) Voice-mail message to a friend (phone number, reason of call, etc.)
02) Phone number information service (name, town and address)
03) Interaction with an operator (calling telephone exchange)
04) Name retrieval by phone number
05) Call a travel agency and book a [train ticket | flight] (destination, date, time, type of train)
06) Call your bank [for account information | to transfer money to an account]
07) Call the theatre and ask for the seats available for a performance
08) Call a [book store | music shop] to enquire about the availability, price {and edition} of a [book | CD]
09) Call your own answering machine to check the messages
10) Dictate a short business letter via the mobile phone.
11) Tell the hotel that you will arrive very late but that you definitely want to take the reserved room
12) Ask the airport information for the latest flights to
13) Tell your speech-savvy mobile phone [to read [a fax | an email] | that you want to check your mailbox | to schedule an appointment with | organize a conference call with [ | your office] and a client]
B: Navigation
14) Describe the [current | a recent] traffic situation
15) Give [the police | rescue] a description of how to get to your current location
16) Give the navigation system the coordinates of your favorite restaurant (name, address, city)