11742 neon hausa transcription guidelines (hau_asr002)-v10-20150904_0651

11742 Neon Hausa Transcription Guidelines (HAU_ASR002)

Author: Bushra Zawaydeh
Date: 04-Sep-2015 06:51
URL: https://wiki.appen.com/pages/viewpage.action?pageId=40820112

Table of Contents

1 Writing
1.1 Punctuation
1.2 Capital letters
1.3 Numbers
1.4 Abbreviations
1.5 Acronyms
1.6 Initialisms
1.7 Mixed Initialisms
1.8 Email and website addresses
1.9 Fragments
1.10 Interjections
2 Span Tags (highlighting)
3 Tags
3.1 Fillers
3.2 Foreign words
3.3 Unintelligible Speech
3.4 No Speech
3.5 Pause
3.6 Speaker noises
3.7 Other noises
3.8 Truncations
4 Less common tags
4.1 Overlapping speech
4.2 Speaker change (male < - > female)
4.3 Prompt
4.4 Untranscribable

The audio you will be listening to consists of recorded conversations in Hausa. This means that you will

hear the same speaker through more than one utterance and these utterances will be in sequence so that

the conversation makes sense.

Carefully read the guidelines below. Contact your supervisor if you have any questions about these

guidelines, as it is most important that you understand them and are able to use them correctly in your work.

1 Writing

1.1 Punctuation

Do not use any sentence punctuation (e.g. full stops, commas, question marks).

Use punctuation only where it is part of a word's spelling (e.g. the apostrophe or hyphen).

Hyphens appear mainly in English words to which a Hausa suffix is added:

America-wa

Guardiola-n

aunty-n

company-nunukan

lecture-cin

lecturer-in

The apostrophe is used for the glottal stop sound, which appears in words such as:

wa'azi

ta'azi

sana'a

sa'a

The apostrophe is also used for the letter <'y>, as in:

'ya'ya

'ya'yan

'yan

wa'yannan

wa'yansu
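
The punctuation rule above can be checked mechanically. The following is only a minimal sketch and is not part of the project tools: the function name and the exact set of flagged characters are assumptions chosen for illustration. Apostrophes and hyphens are not flagged because they can belong to a word's spelling.

```python
import re

# Assumed set of sentence punctuation to flag. Apostrophes and hyphens are
# deliberately absent because they can be part of a word's spelling.
SENTENCE_PUNCTUATION = re.compile(r"[.,?!;:]")


def find_sentence_punctuation(transcript: str) -> list[str]:
    """Return every sentence punctuation character found in the transcript."""
    return SENTENCE_PUNCTUATION.findall(transcript)


print(find_sentence_punctuation("wa'azi sana'a America-wa"))  # []
print(find_sentence_punctuation("wa'azi, sana'a?"))           # [',', '?']
```

An empty result means the line contains none of the flagged characters. It does not confirm that every apostrophe or hyphen is used correctly.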

1.2 Capital letters

Named entities (e.g. person names, place names, some time words) should be spelled with a capital letter as per the usual writing conventions for Hausa.

Examples:

Correct             Incorrect
Champions League    champions league
christian           Christian
Bompai              bompai

If a business name is spelled with a capital letter in the middle of the word, this is okay.

Example

eBay

iPhone

YouTube

Do not use a capital letter if the only reason is that the word is at the start of a sentence.

Example of a sentence that does not start with a capital letter. Do not capitalize the first word of a sentence unless it is a proper noun.

TRANSCRIPTION: yaya jiya yaya labari jiya ka je wurin to kua #um

In these examples, the first word is capitalized because it is a proper name:

TRANSCRIPTION: Allah ya jiya

TRANSCRIPTION: Bashir fa ya dan ne min zani na Bahijja

Use the name tag to highlight all names (see Span Tags (highlighting) below).

1.3 Numbers

Do not use any digits (e.g. 1 2 3 4 5 ...). All numbers must be spelled out as full words in the way they were pronounced.

Example - the number '2012' may be pronounced in many different ways:

2012 ==> dubu biyu da goma sha biyu

2012 ==> alif dubu biyu da goma sha biyu (the Arabized ‘alif’, found especially in year dates, is rarely used)
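
A quick way to catch leftover digits before submitting is sketched below. This is an illustration only: the function name is hypothetical, and the check cannot tell you how the number was pronounced, so you still need to listen and spell it out yourself.

```python
import re


def find_digit_tokens(transcript: str) -> list[str]:
    """Return the whitespace-separated tokens that contain at least one digit."""
    return [token for token in transcript.split() if re.search(r"\d", token)]


print(find_digit_tokens("dubu biyu da goma sha biyu"))  # []
print(find_digit_tokens("M_P_3 iPhone 4S"))             # ['M_P_3', '4S']
```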

1.4 Abbreviations

Do not use any abbreviations. Words must be spelled out in full.

Example from English:

Correct             Incorrect        Note
Saturday            Sat              Do not write "Sat" or "Sat."; write "Saturday", assuming the speaker said the full word.
Elizabeth Street    Elizabeth St.    Speakers will probably say "street", not "st", so write the full word.

The only exception is if someone pronounces the word as an abbreviation.

Example

Appen Butler Hill Inc ==> Appen Butler Hill Inc (if the person pronounced 'Inc' as 'Inc', not

'Incorporated')

1.5 Acronyms

An acronym is a word made up of the first letters of other words that is spoken as a word (e.g. NASA, FIFA).

Acronyms are spelled using capital letters joined with no space.

Example

NASA

FIFA

1.6 Initialisms

An initialism is an abbreviation made up of the first letters of other words where each letter is pronounced

separately (e.g. IBM, CPU, ADHD). Initialisms are spelled using capital letters joined by underscores.

Example

I_B_M

C_P_U

A_D_H_D
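
The underscore convention for initialisms can be sketched as follows. This is an illustrative helper with a hypothetical name, not a project tool. It cannot decide whether something is an initialism or an acronym, because that depends on how the speaker pronounced it.

```python
def format_initialism(letters: str) -> str:
    """Join the letters of an initialism with underscores, in capitals.

    Non-letter characters such as dots or spaces are dropped.
    """
    return "_".join(ch.upper() for ch in letters if ch.isalpha())


print(format_initialism("IBM"))     # I_B_M
print(format_initialism("cpu"))     # C_P_U
print(format_initialism("A.D.H.D."))  # A_D_H_D
```

Acronyms such as NASA and FIFA are spoken as words, so they are left joined and must not be passed through a helper like this.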

1.7 Mixed Initialisms

Mixed initialisms involve combinations of words, letters, and numbers. When a single concept is expressed,

all parts are written together with an underscore. Models like 4S (below) are written separately from the

brand name. Numbers in a proper name are capitalised when written out.

iPhone four_S

Seven_Eleven

A_K_forty_seven

M_P_three

1.8 Email and website addresses

If you need to transcribe an email address or website address, part of it may be a 'nonsense' word that does

not mean anything. To identify the nonsense word, add an underscore at the start of the word.

Example

www.pjojeou.com ==> W_W_W dot _pjojeou dot com

jsmith@pjojeou.com ==> J_Smith at _pjojeou dot com
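
The website example above can be sketched in code. This is a minimal illustration under stated assumptions: the function name is hypothetical, and the caller must say which parts are spoken letter by letter (`initialisms`) and which are nonsense words (`nonsense`), since neither can be detected automatically. It covers website addresses only.

```python
def spell_out_address(address: str, initialisms: set[str], nonsense: set[str]) -> str:
    """Spell out a website address in the style of the example above."""
    spoken = []
    for index, part in enumerate(address.split(".")):
        if index > 0:
            spoken.append("dot")  # each "." is written as the word "dot"
        if part in initialisms:
            # Letters pronounced separately: capitals joined by underscores.
            spoken.append("_".join(part.upper()))
        elif part in nonsense:
            # Nonsense words are marked with a leading underscore.
            spoken.append("_" + part)
        else:
            spoken.append(part)
    return " ".join(spoken)


print(spell_out_address("www.pjojeou.com", initialisms={"www"}, nonsense={"pjojeou"}))
# W_W_W dot _pjojeou dot com
```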

1.9 Fragments

When a speaker pronounces only part of a word, write that part of the word and attach a hyphen to it. We

call this a fragment.

Example - someone begins to say 'motorcycle' but stops after 'moto'

she came to work today by moto- I mean car

Example: someone begins to say 'onions' but stops after 'on-' and then repeats the word in full

my eyes hurt when I cut on- onions

Make sure there is a space after the hyphen.

If it is not clear what the full word was going to be, do not transcribe the word and instead use the unintelligible tag (see Unintelligible Speech below).

1.10 Interjections

Interjections are very common in spoken Hausa, but strictly speaking they are not 'words' and would be

unlikely to show up in a dictionary or a newspaper article. You should write all interjections and spell them

as per the table below.

Description          Sounds like ...
Agreement (yes)      eee, mm, mhm, ooo
Disagreement (no)    a-a, m-m

2 Span Tags (highlighting)

There are two types of tags: span tags (colored) and event tags (grey). Look for these in the screenshot below.

Event tags are inserted between words, while span tags are used to highlight words.

Span Tag How to use it

Use this to highlight any foreign words you can understand, but are completely unknown to you. This tag should not be used for foreign names (places, businesses, personal names).

See Foreign words below.

English loanwords are words borrowed from English.

They are NOT considered foreign words for the purposes of this project and they should not receive a foreign span tag NOR should they be replaced with a foreign tag. Please spell all English words in English.

Example:

Correct Spelling    Incorrect Spelling
captain             kyaftin


Arabic loanwords are words borrowed from Arabic.

They are NOT considered foreign words for the purposes of this project and they should not receive a foreign span tag NOR should they be replaced with a foreign tag. Please spell all Arabic words in Latin script, using guidelines written in the Spelling Guidelines document.

Example:

Correct Spelling    Incorrect Spelling
sallallahu          s`allallahu
subhan allahi       subhanallahi

Use this to highlight any words that are classified as interjections.

Interjections are words that express emotions and reactions and are

very common in spoken Hausa, but are unlikely to show up in a

dictionary or newspaper article.

For example in English if someone is surprised they may say "ooh".

Some Hausa examples are provided above (see Interjections).

Use this to highlight any words that were accidentally mispronounced

by the person speaking.

Spell the word in the normal (correct) way, then highlight it.

There is no need to use this if someone has an accent - it should

only be used when the person accidentally said something the wrong

way.

When in doubt ask yourself "would this person pronounce the word

differently if I asked them to repeat themselves?"

If they would, it can be classified as a mispronunciation.

Use this to highlight any words that you are not sure how to spell.

This should not be used often, because you have spelling guidelines

and you can search Google or Bing for the names of people and

places.


Use this to highlight person names, place names, and brand / product / organization / business / movie names.

Not all words that are capitalized are names.

You do NOT need to tag adjectives of nationality, holidays, days of the week, months of the year, etc., since these words do not denote a person, place or company name.

3 Tags

3.1 Fillers

Fillers are the sounds people make while they are thinking of what to say next:

Choose the tag which most closely resembles the sound the speaker makes when hesitating.

Example: speaker says "ee" after some noise:

gun budurwa zai je man

3.2 Foreign words

You may hear someone speaking in a foreign language. If you cannot understand the foreign speech, just

place a "foreign" tag in place of the words you cannot understand.

Example: "ariyo nono achiel ariyo" ==> you would replace this with the "foreign" tag

If someone uses just the occasional foreign word and you know how to spell it, write out the word and then

highlight it using the "foreign word" highlighting tag. See Span Tags (highlighting) above.

Example: the words "ich spreche" should be highlighted because they are not Hausa

ich spreche Hausa

Note, foreign names (people's names, place names, festival names, etc.) do NOT constitute foreign words

and should be spelled. If you are unsure of the spelling, you can make your best guess and highlight it. If a

foreign name is particularly difficult to spell, you could search for it in Google to find the most common variant of spelling.

Similarly, you must consider whether the 'foreign' word is in fact a 'loanword', meaning that it could be considered part of the Hausa language now. If a word of foreign origin is commonly used and/or understood by speakers (or a community of speakers) of the Hausa you are transcribing, it should be transcribed.

It is very important that we are consistent in the treatment of loanwords, so when in doubt, choose to spell the word and highlight it with the loanword tag (English or Arabic) rather than using the 'foreign' tag to replace it (see Span Tags above).

3.3 Unintelligible Speech

If you come across a word or several words that are not clear because there is interference, audio

problems, or because the person is not talking clearly, enter this tag in place of the unintelligible speech.

Of course you should try your best to listen and determine what was said, but in natural speech there will be

unintelligible words often. As a guide you should try at least three times to understand what was being said.

If it is not clear, insert the tag and move on.

Example - speaker mumbles something after "tun" and then continues speaking. The

is some unintelligible speech.

oke ba shi ke nan tun mu bare haka zuwa an jima zai yi wannan magana zai dinga

fita sosai mu ci gaba haka in ka na ji na dai mu ci gaba haka

3.4 No Speech

If an entire utterance contains no speech (e.g. there is only silence or noises), insert the 'no speech' tag only and move on. Do not tag the noises in such utterances.

Unintelligible speech, fillers and interjections ARE considered speech.

All other noises, human and non-human (i.e. lipsmack, laugh, breath, cough, click, ring, dtmf, short_noise and long_noise), are NOT considered speech.
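
The decision described in this section can be sketched as follows. This is an illustration only: the function name and the way an utterance is represented (a list of written words plus a list of event tag names) are assumptions, as are the tag names "unintelligible" and "filler" used in the usage lines. The noise names are the ones listed above.

```python
# Noise names taken from the list above; none of these counts as speech.
NON_SPEECH_NOISES = {
    "lipsmack", "laugh", "breath", "cough",
    "click", "ring", "dtmf", "short_noise", "long_noise",
}


def is_no_speech(words: list[str], event_tags: list[str]) -> bool:
    """Return True when the utterance should receive only the 'no speech' tag.

    Any written word (including interjections) counts as speech, and so does
    any event tag that is not one of the listed noises.
    """
    if words:
        return False
    return all(tag in NON_SPEECH_NOISES for tag in event_tags)


print(is_no_speech([], ["breath", "click"]))   # True
print(is_no_speech([], ["unintelligible"]))    # False
print(is_no_speech(["mhm"], []))               # False
```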

3.5 Pause

Whenever there is a pause in speech for a period of 1 second or more, insert this tag.

Example - speaker takes a two second pause between "yi" and "yanzu" in the sentence below.

You would insert the pause tag.

yaya a ka yi yanzu ya back wannan magana sa dai nan dai ba bai sake yin irin

Use the tag for pauses of 1 second or more within speech (between words) and also for silence of 1 second or more before the person commences speaking or after they finish.

If noises such as lipsmack, laugh, breath, cough, click, ring, dtmf and short_noise occur in the foreground during pauses of 1 second or more within speech, do not tag these noises - simply put only a pause tag.

If there is no speech at all within an utterance, use the 'no speech' tag (see No Speech above).
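
The 1-second rule can be illustrated with a short sketch. It assumes you have start and end times for each word, which the guidelines do not promise; the `TimedWord` class, the function name and the timing values are all hypothetical. The sketch reports where a pause tag belongs: between words, before the first word, or after the last word.

```python
from dataclasses import dataclass

PAUSE_THRESHOLD_SECONDS = 1.0  # from the guideline: "1 second or more"


@dataclass
class TimedWord:
    text: str
    start: float  # seconds from the start of the utterance
    end: float


def pause_positions(words: list[TimedWord], utterance_end: float) -> list[int]:
    """Return insertion indices where a pause tag belongs.

    Index i means "before words[i]"; len(words) means "after the last word".
    An utterance with no words returns [] (the 'no speech' tag applies instead).
    """
    positions = []
    previous_end = 0.0
    for index, word in enumerate(words):
        if word.start - previous_end >= PAUSE_THRESHOLD_SECONDS:
            positions.append(index)
        previous_end = word.end
    if words and utterance_end - previous_end >= PAUSE_THRESHOLD_SECONDS:
        positions.append(len(words))
    return positions


# A two second gap between "yi" and "yanzu", as in the example above.
words = [TimedWord("yi", 0.2, 0.5), TimedWord("yanzu", 2.5, 3.0)]
print(pause_positions(words, utterance_end=3.2))  # [1]
```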

3.6 Speaker noises

All noises made by the main speaker must be marked with one of the tags below.

Insert the tag exactly where the noise first occurs.

If it occurs at the same time as a word, put the tag BEFORE the word.

If the noise occurs more than once in sequence, you only need a single tag.

Tag         When to use it
lipsmack    lip smacks, tongue clicks
breath      loud inhalation and exhalation between words, yawning
cough       coughing, throat clearing, sneezing
laugh       laughing, chuckling

3.7 Other noises

Insert the relevant tag when you hear a noise that is not made by the speaker and which is at a comparable

volume to the speech.

Insert the tag exactly where the noise first occurs.

If it occurs at the same time as a word, put the tag BEFORE the word.

If the noise occurs more than once in sequence, you only need a single tag.

Tag            When to use it
click          Any interference from the phone line (e.g. crackling sounds) or click.
ring           The sound of a phone ringing.
dtmf           The sound made by pressing the telephone keypad (DTMF stands for Dual Tone Multi-Frequency).
short_noise    Any other short noises that do not continue over several words (generally lasting less than one second), for example: door slams, a loud cough by a person in the background, car horns.
long_noise     Any other long noises that continue over longer periods of time and perhaps multiple words (generally lasting more than one second), for example: wind, rain, background speech or music. This tag is used when the noise begins. The point at which the long noise ends is not marked. Low level background sounds are expected and do not need to be tagged.

3.8 Truncations

If a word gets cut off at the end of an utterance because the computer has not cut up the audio correctly, this is called a truncation. This is different from a fragment (where the person stops talking part way through a word). In a truncation, the recording has cut someone off while they were saying a word. Therefore, truncations only occur at the start or end of an utterance.

When you hear a truncation at the end of an utterance, write out the truncated word in full followed by the truncation tag.

Then, when you hear the remainder of the word in the second utterance, insert the truncation tag ONLY.

Example - the sentence below is split across two separate utterances, and the word 'kawai' got

cut off at the end of the first utterance and at the start of the second

kar ka damu kawai

mu je dai a hakan

So in the example above, we add the truncation tag after "kawai", although the audio had just "kaw".

Then in the other audio, which has the remaining part, you add the truncation tag.

If you can tell that a word was truncated but you don't know what the word is, simply insert the unintelligible tag in place of the word and the truncation tag at the end of the first utterance, and the truncation tag at the start of the second utterance.

Example - the sentence below is split across two separate utterances, and the word 'kawai' got cut off at the end of the first utterance, but you couldn't understand what the truncated word was:

kar ka damu

mu je dai a hakan

4 Less common tags

You are not likely to need the tags below because you are listening to half of a telephone

conversation. However, you need to learn how to use them in case they are needed.

4.1 Overlapping speech

When two people are talking at the same time and at a similar volume this is called overlapping speech. If

you hear this, do NOT transcribe the speech that is overlapped, and instead insert the overlap tag in place of those words.

Someone talking in the background (quieter than the main speaker) is not an overlap and should

be treated as noise.

Noises overlapping with speech do not constitute an overlap. Only mark overlaps when two people

are speaking into the phone at once.

4.2 Speaker change (male < - > female)

Insert the relevant tag if a new person starts talking part way through a call, and they are of a different

gender.

Example - a female speaker has been talking then a male speaker starts talking

Example - a male speaker has been talking then a female speaker starts talking

If the speaker changes after a section of overlapping speech, this tag should be inserted AFTER

the overlap tag if the gender of the speaker has now changed.

4.3 Prompt

Use this tag if you hear any speech coming from a computer or a background recording, rather than from a

real person. For example:

computer generated voice

pre-recorded voicemail message

a call centre prompt

You should NOT transcribe the words. Insert the tag in place of the words.

Example - you can hear a computer generated voice prompting the speaker to press '1' to speak with an operator.

YOU TRANSCRIBE:

4.4 Untranscribable

If an entire utterance contains persistent or overwhelming distortion, static or background noise which

makes it impossible to be transcribed, insert the tag and move on to the next utterance. In

reality this tag is not likely to be necessary.