26 synthesis i

Upload: ravi-kumar-sinha

Post on 10-Apr-2018


  • 8/8/2019 26 Synthesis I

    1/25

    11-682: Intro to IR, NLP, MT, Speech: Speech Synthesis I

    Speech Synthesis I

    11-682: Introduction to Human Language Technologies


    Speech Synthesis Processes

    Text Analysis

    From strings of characters to words

    Linguistic Analysis

    From words to pronunciations and prosody

    Waveform Synthesis

    from pronunciations to waveforms


    Text Analysis

    This is a pen.

    My cat who lives dangerously has nine lives.

    He stole $100 from the bank.

    He stole 1996 cattle on 25 Nov 1996.

    He stole $100 million from the bank.

    It's 13 St. Andrew St. near the bank.

    Its a PIII 650Mhz, 128MB RAM, 13.6Gb SCSI,24x cdrom and 19" monitor.

    My home pgae is http://www.geocities.com/awb/.


    Text Analysis

    bunch of hacky rules

    But here, based on the NSW Project from JHU Workshop '99 (NSW = Normalization of Non-Standard Words)

    Splitter:

    domain independent tokenizer

    token type classifier:

    number (ordinal/cardinal/year) etc

    homograph disambiguator

    abbrev/letter sequence identifier

    token type expander: mostly deterministic

    but abbreviation expander interesting

    language model:

    (optionally) choose best expansion
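The classifier and expander stages listed above can be sketched roughly as follows. This is a minimal illustration and not the NSW project's implementation: the token types, the regular expressions, the "previous token is a month name" rule for years, and the function names `cardinal`, `year`, `classify` and `expand` are all assumptions made for the example.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
MONTHS = {"jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"}


def cardinal(n):
    """Spell out 0 <= n <= 9999 as a cardinal number."""
    if n < 20:
        return ONES[n]
    if n < 100:
        return TENS[n // 10] + ("" if n % 10 == 0 else " " + ONES[n % 10])
    if n < 1000:
        rest = "" if n % 100 == 0 else " " + cardinal(n % 100)
        return ONES[n // 100] + " hundred" + rest
    rest = "" if n % 1000 == 0 else " " + cardinal(n % 1000)
    return cardinal(n // 1000) + " thousand" + rest


def year(n):
    """Spell out a year as two pairs of digits (assumes 1100 <= n <= 1999)."""
    hi, lo = divmod(n, 100)
    if lo == 0:
        return cardinal(hi) + " hundred"
    if lo < 10:
        return cardinal(hi) + " oh " + ONES[lo]
    return cardinal(hi) + " " + cardinal(lo)


def classify(tokens, i):
    """Token type classifier: uses the token itself and the previous token."""
    tok = tokens[i]
    prev = tokens[i - 1].lower().rstrip(".") if i > 0 else ""
    if re.fullmatch(r"\$\d+", tok):
        return "money"
    if re.fullmatch(r"\d{4}", tok) and prev in MONTHS:
        return "year"
    if re.fullmatch(r"\d+", tok):
        return "cardinal"
    return "word"


def expand(tokens):
    """Token type expander: deterministic once the type is known."""
    out = []
    for i, tok in enumerate(tokens):
        kind = classify(tokens, i)
        if kind == "money":
            out.append(cardinal(int(tok[1:])) + " dollars")
        elif kind == "year":
            out.append(year(int(tok)))
        elif kind == "cardinal":
            out.append(cardinal(int(tok)))
        else:
            out.append(tok)
    return " ".join(out)


print(expand("He stole 1996 cattle on 25 Nov 1996".split()))
```

The same digit string "1996" comes out as a cardinal in one position and as a year in the other. The sketch would still mishandle "$100 million", since it expands "$100" without looking ahead, which is the kind of case that pushes hand-written rules toward trained classifiers.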


    NSW model for new domains

    Models for specific domains

    Standard text analyzers fail

    Can build models from labeled data

    57 ST E/1st & 2nd Ave Huge drmn 1 BR 750+ sf, lots of sunclsts. Sundeck & lndry facils. Askg $187K, maint $868, utils incld. Call Bkr Peter 914-428-9054.


    Synthesis Processes

    Text Analysis

    From strings of characters to words

    Linguistic Analysis

    From words to pronunciations and prosody

    Waveform Synthesis

    from pronunciations to waveforms


    Lexicons and letter to sound rules

    Need big list of words: often hardest requirement

    Sharing between dialects:

    CSTR English dialect lexicon

    (Centre for Speech Technology Research, Edinburgh)

    But the lexicon is never complete:

    need out of vocabulary pronouncer


    Letter to sound rules: method

    Find alignments:

    Using epsilon scattering and EM algorithm

    or hand specify possible letter-phone combinations

    Build CART models:

    predict phone (or epsilon)

    letter context (plus POS ...)

    c h e c k e d

    ch - eh - k - t
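One way to picture how an aligned word becomes training data for the CART models is sketched below. It is only an illustration: the placement of the epsilons ("_") in the alignment of "checked", the window width of three letters, and the function name `training_examples` are assumptions, not details given on the slide.

```python
def training_examples(letters, phones, width=3):
    """Turn one aligned word into (features, phone) pairs, one per letter.

    letters and phones must have equal length; "_" marks epsilon
    (the letter produces no phone) and "#" pads the word boundaries.
    """
    if len(letters) != len(phones):
        raise ValueError("alignment must give one phone (or '_') per letter")
    padded = ["#"] * width + list(letters) + ["#"] * width
    examples = []
    for i, phone in enumerate(phones):
        centre = i + width
        features = {"name": padded[centre]}
        for k in range(1, width + 1):
            features["p" * k] = padded[centre - k]  # p, pp, ppp: previous letters
            features["n" * k] = padded[centre + k]  # n, nn, nnn: next letters
        examples.append((features, phone))
    return examples


# Assumed alignment: c->ch, h->_, e->eh, c->k, k->_, e->_, d->t
examples = training_examples("checked", ["ch", "_", "eh", "k", "_", "_", "t"])
for features, phone in examples:
    print(features["p"], features["name"], features["n"], "->", phone)
```

Each pair is one training instance: the tree learner sees the letter and its neighbours and predicts a phone or epsilon. Extra features such as part of speech could be added to the same dictionary.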


    An example tree

    For letter V (e.g. revving vs. Romanov):

    if (n.name is v)
        return _
    if (n.name is #)
        if (p.p.name is n)
            return f
        return v
    if (n.name is s)
        if (p.p.p.name is n)
            return f
        return v
    return v
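The example tree can be written as an ordinary function. The sketch below assumes the flattened slide text nests as shown in the code, that `n` means the next letter, that `p.p` and `p.p.p` mean the letters two and three positions back, and that `#` is the word boundary. The name `predict_v` is made up for the illustration.

```python
def predict_v(word, i):
    """Predict the phone for the letter 'v' at position i of word."""

    def letter(j):
        # Positions outside the word count as the boundary symbol.
        return word[j] if 0 <= j < len(word) else "#"

    nxt = letter(i + 1)
    if nxt == "v":
        return "_"  # first v of a double v is silent (epsilon)
    if nxt == "#":
        return "f" if letter(i - 2) == "n" else "v"
    if nxt == "s":
        return "f" if letter(i - 3) == "n" else "v"
    return "v"


print(predict_v("revving", 2), predict_v("revving", 3), predict_v("romanov", 6))
```

For "revving" the first v maps to epsilon and the second to the phone v; for "romanov" the final v maps to f because the letter two positions back is n.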


    Prosody

    Phrasing

    chunking in speech

    Intonation

    Accents and F0 generation

    Duration:

    length of segments

    Power:

    loudness

    Each has underlying default, and marked forms



    Intonation Marking

    Large pitch range (female)

    Authoritative, since pitch goes down at the end

    News reader

    Emphasis for Finance (H*)

    Final has a rise: more information to come

    Female American newsreader from WBUR

    (Boston University Radio)


    Intonation Examples

    Fixed durations, flat F0

    Declining F0

    hat accents on stressed syllables

    accents and end tones

    statistically trained
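The first few steps in this list (flat F0, declining F0, accents on stressed syllables) can be illustrated with a toy contour generator. Everything numeric here is an assumption chosen for the example: the 130 Hz and 100 Hz endpoints, the 20 Hz accent height, one target per syllable, and the function name `f0_targets`. The "hat" accent is reduced to a simple raised target.

```python
def f0_targets(stresses, start_hz=130.0, end_hz=100.0, accent_hz=20.0):
    """Return one F0 target (Hz) per syllable.

    stresses is a list of booleans, True for a stressed syllable.
    The baseline falls linearly from start_hz to end_hz (declination),
    and each stressed syllable is raised by accent_hz.
    """
    n = len(stresses)
    targets = []
    for i, stressed in enumerate(stresses):
        frac = i / (n - 1) if n > 1 else 0.0
        baseline = start_hz + frac * (end_hz - start_hz)
        targets.append(baseline + (accent_hz if stressed else 0.0))
    return targets


flat = f0_targets([False, False, False], start_hz=120.0, end_hz=120.0)
declined = f0_targets([False, False, False])
accented = f0_targets([False, True, False])
print(flat, declined, accented)
```

A statistically trained model would replace these fixed numbers with values predicted from data.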


    Synthesis Processes

    Text Analysis

    From strings of characters to words

    Linguistic Analysis

    From words to pronunciations and prosody

    Waveform Synthesis

    from pronunciations to waveforms


    Waveform Synthesis

    Concatenative synthesis:

    Random word/phrase concatenation

    Phone concatenation

    Diphone concatenation

    Sub-word unit selection

    Cluster based unit selection


    Waveform Synthesis: diphone

    All phone-phone transitions in language: one occurrence of each

    assume two-phone context is sufficient

    may also include common consonant clusters

    Mid-phone is more stable so easier to join

    Inventory size: phones^2

    A safe way to build a synthesizer
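To make the diphone idea concrete, the sketch below converts a phone sequence into the diphones needed to synthesize it and computes the upper bound on inventory size. The silence symbol `pau` and the function names are assumptions for the example.

```python
def to_diphones(phones, silence="pau"):
    """List the phone-to-phone transitions needed for a phone sequence.

    The utterance is padded with silence on both sides so that the
    first and last phones also get a transition.
    """
    padded = [silence] + list(phones) + [silence]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def inventory_size(num_phones):
    """Upper bound on the inventory: every ordered pair of phones."""
    return num_phones ** 2


print(to_diphones(["t", "aa", "b"]))
print(inventory_size(40))
```

With an assumed 40-phone set the bound is 1600 units, which is small enough to record one occurrence of each. Not every pair occurs in a given language, and added consonant clusters push the count the other way.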


    Diphone example: (US English)

    Diphone schema 1348 words (1618 diphones):

    includes consonant clusters and open vowels

    Prompts are:

    kal_0001 ("b-aa" "aa-b") (t aa b aa b aa)

    kal_0002 ("p-aa" "aa-p") (t aa p aa p aa)

    Automatic aligning alone:

    around 15 words wrong

    better than human labeler (first pass)

    takes 30 minutes (vs 2 weeks)

    KAL voice created in 2 days
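The two prompts shown follow a regular pattern: a carrier phone "t", then the vowel and consonant alternating so that both the consonant-vowel and vowel-consonant diphones appear in the middle of the nonsense word. A sketch that generates prompts in that pattern is below. The function name `cv_prompt` and its parameters are hypothetical, and the real schema also covers clusters and other diphone types not handled here.

```python
def cv_prompt(index, consonant, vowel, voice="kal", carrier="t"):
    """Build one nonsense-word prompt for a consonant/vowel pair.

    Returns (prompt id, diphones to extract, phones to be spoken).
    """
    prompt_id = f"{voice}_{index:04d}"
    diphones = [f"{consonant}-{vowel}", f"{vowel}-{consonant}"]
    phones = [carrier, vowel, consonant, vowel, consonant, vowel]
    return prompt_id, diphones, phones


for i, consonant in enumerate(["b", "p"], start=1):
    print(cv_prompt(i, consonant, "aa"))
```

Because each prompt's phone string is known in advance, an automatic aligner only has to find the boundaries, which is part of why labeling can be done in minutes.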


    Unit selection synthesis

    Large representative database

    Multiple examples of each unit type

    Select appropriate units:

    phonetically and prosodically

    Various unit selection algorithms:

    target, concatenation cost

    Clustering, phonological structure matching

    famed for being excellent

    famed for being incomprehensible
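The target cost and concatenation cost formulation can be sketched as a dynamic-programming search over candidate units. This is a generic Viterbi-style illustration under simplifying assumptions, not any particular system's algorithm: units are reduced to a single pitch value, both cost functions are absolute pitch differences, and all names are made up for the example.

```python
def select_units(targets, candidates, target_cost, join_cost):
    """Pick one candidate per target, minimising total target + join cost.

    targets: list of target specifications (must be non-empty).
    candidates: list of the same length; each entry is a non-empty list
    of candidate units for that target.
    Returns (chosen units, total cost).
    """
    # best[i][j] = (cheapest cost of a path ending in candidates[i][j], backpointer)
    best = [[(target_cost(targets[0], u), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(prev, u), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(targets[i], u), back))
        best.append(row)
    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1], total


def pitch_distance(a, b):
    """Toy cost: absolute difference between two pitch values in Hz."""
    return abs(a - b)


units, cost = select_units(
    targets=[100, 100],
    candidates=[[90, 120], [100, 125]],
    target_cost=pitch_distance,
    join_cost=pitch_distance,
)
print(units, cost)
```

Real systems use richer unit descriptions (phonetic context, duration, spectral shape at the join), but the trade-off is the same: a unit that matches the target well may still lose if it joins badly to its neighbours.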


    Synthesis in limited domains

    Unit selection can be good or very very bad

    Generally a problem due to open-endedness

    Unseen contexts will be synthesized poorly

    Restricting the domain provides set of known contexts

    Every context to be produced can be in the recordings

    Results in better-sounding synthesis

    for the things it can say

    Examples

    Weather reports

    Bus information (place names)
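A limited-domain system can check in advance whether a sentence stays within what was recorded. The sketch below does this at the word level only, which is a simplifying assumption (a real voice would care about phonetic and prosodic context too). The function name and the example prompts are invented for illustration.

```python
def out_of_domain_words(utterance, recorded_prompts):
    """Return the words of utterance that never occur in the recordings."""
    vocab = {w.lower() for prompt in recorded_prompts for w in prompt.split()}
    return [w for w in utterance.split() if w.lower() not in vocab]


prompts = ["sunny in Boston", "rain in Pittsburgh"]
print(out_of_domain_words("rain in Boston", prompts))
print(out_of_domain_words("snow in Boston", prompts))
```

An empty result means every word was seen in the recordings. Any word returned would have to be synthesized from less suitable units, which is where quality drops.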


    Making synthesis better

    Basic reading of clean text in neutral form works

    few applications require that though

    We need synthesis to be:

    more flexible:

    not just text to speech

    more natural

    so it doesn't sound like a synthesizer

    more efficient

    easy to build new voices

    synthesizes quickly on small machines


    The boy saw the girl in the park with the telescope.

    The boy saw the girl in the park with the telescope.

    Some English first and then some Spanish.

    Hola amigos.

    Namaste

    Good morning My name is Stuart, which is spelled

    stuart

    though some people pronounce it

    stuart. My telephone number is 2787.

    I used to work in Buccleuch Place, but no one can pronounce that.

    By the way, my telephone number is actually


    SABLE: for marking emphasis

    What will the weather be like today in Boston?

    It will be rainy today in Boston.

    When will it rain in Boston?

    It will be rainy today in Boston.

    Where will it rain today?

    It will be rainy today in Boston.
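The three answers use identical words, so only markup can tell the synthesizer which word carries the new information. The sketch below wraps the focus word in SABLE's `EMPH` element. The helper name `emphasize` is hypothetical, and the choice of focus word for each question (rainy, today, Boston) is an inference from the questions, since the transcript does not preserve the original markup.

```python
from xml.sax.saxutils import escape


def emphasize(sentence, focus):
    """Wrap the first occurrence of the focus word in an EMPH element."""
    marked = []
    done = False
    for word in sentence.split():
        core = word.rstrip(".,?!")      # keep trailing punctuation outside the tag
        tail = word[len(core):]
        if not done and core.lower() == focus.lower():
            marked.append(f"<EMPH>{escape(core)}</EMPH>{escape(tail)}")
            done = True
        else:
            marked.append(escape(word))
    return " ".join(marked)


answer = "It will be rainy today in Boston."
for focus in ["rainy", "today", "Boston"]:
    print(emphasize(answer, focus))
```

A dialog system that knows which part of its answer is new could apply this step before handing the text to the synthesizer.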


    Making it efficient

    Adding new voices:

    Need new voices and languages

    tools to record, label, segment, etc.

    Synthesis must be fast

    Dialog systems must respond in time:

    recognition, dialog plus synthesis in under 1 sec.

    And run on a handheld device or mobile phone


    Summary

    Synthesis can be broken down into three parts:

    Text analysis: expanding homographs, symbols, numbers etc.

    Linguistic analysis: pronunciation, letter to sound rules

    prosody: phrasing, intonation and duration

    Waveform synthesis:

    diphone voices

    unit selection voices

    limited domain

    Not just text-to-speech: markup