
MUSICAL EVIDENCE FOR SYLLABIFICATION OF HIGHLY MORAIC STRUCTURES IN ENGLISH

by

Jenica Jessen

A Senior Honors Thesis Submitted to the Faculty of The University of Utah

In Partial Fulfillment of the Requirements for the

Honors Degree in Bachelor of Arts

In

Linguistics

Approved:

______________________________ Abby Kaplan Thesis Faculty Supervisor

_____________________________ Patricia Hanna Acting Chair, Department of Linguistics

_______________________________ Aaron Kaplan Honors Faculty Advisor

_____________________________ Sylvia D. Torti Dean, Honors College

April 2017

Copyright © 2017

All Rights Reserved


ABSTRACT

This study uses musical data as evidence for the syllabification patterns of native English speakers. Drawing on the pitches sung in songs by American singer-songwriters, our research seeks evidence that syllables with a diphthong followed by a liquid in their rime are split into two syllables at a different rate from other syllables. The study concludes that individuals vary: some speakers contrast these extremely heavy syllables with other syllables, while others do not. The study also addresses the influence of part of speech on vowel production, and thus on syllabification. It concludes that certain diphthongs are reduced to monophthongs in function words but not in content words.


TABLE OF CONTENTS

ABSTRACT

INTRODUCTION

LITERATURE REVIEW

METHODS

ANALYSIS

CONCLUSIONS

REFERENCES

INTRODUCTION

In most cases, both linguists and speakers demonstrate strong instincts about how words should be syllabified, but not always. Words with a diphthong followed by a liquid present a unique challenge. Many speakers judge words such as “fire” or “smile” to have one syllable. Many others judge them to have two, and still others have trouble making a judgement at all. Additionally, speakers do not always produce the number of syllables that they judge an utterance to contain. Further complicating the matter is the difficulty of identifying the articulatory properties of syllables. There is strong evidence that syllables are a cognitive unit of organization, but little indication that they can be identified in physical phenomena.

One exception, however, is the production of music. Generally speaking, song closely mimics the rhythms of speech, so syllable count may be reflected in musical patterns. To determine how many syllables a word like “fire” contains, our study therefore investigates how many pitches are used to sing it in music created by American singer-songwriters. We present evidence that this methodology can be used to determine syllable counts accurately.

Our analysis indicates strong variation from individual to individual: some people consistently produce “fire” with one syllable, while others consistently produce it with two. We also investigate the influence of part of speech on the production of these words. We conclude that in some cases vowel pronunciation, and thus syllabification, may be affected by a lexical item’s status as a function word (such as a conjunction or pronoun) or a content word (such as a noun or verb).


LITERATURE REVIEW

English syllables consisting of a diphthong followed by a liquid have been analyzed in a variety of ways. Lavoie and Cohn (1999) interpret these structures as “sesquisyllables”, structures containing three moras. The diphthong contains two moras (one for each vowel), and the liquid contains a third. They further argue that three-mora syllables are dispreferred in English, which makes this configuration inherently unstable. These words are therefore represented either as two separate syllables or as one “superheavy” syllable, depending on the situation.

Further research by Cohn and Tilsen (2015) indicates that speakers who tend to produce these words with two syllables also perceive them as having two syllables. Those who produce them with only one syllable perceive them as having only one. This suggests that the variation in pronunciation is structurally conditioned; that is, it is due to the underlying representation of the word. For some people, the underlying representation of a word like “fire” has one syllable, while for others it has two.

Our research sought more information on this phenomenon through analysis of song lyrics, text setting, and musical pitches. A previous study by Palmer and Kelly (1992) indicates that the stress and prosodic structure of spoken words are tightly correlated with rhythm and meter when the text is set to music. This conclusion is supported by the work of Dell and Halle (2005), Rodriguez-Vasquez (2010), and Sui (2013). Syllables are the basic unit that carries stress in speech, and notes are the basic unit that carries meter in music. A general assumption for text setting in many languages is therefore that each note carries one syllable.

Since by default one note carries one syllable, we might expect a word like “fire” to be sung across two notes if the speaker represents it with two syllables. We would likewise expect it to be sung with only one note if the speaker represents it with a single syllable. However, actual text setting is not quite so simple. As described in Schellenberg (2016), English does not always obey the “one note per syllable” principle. Many languages (such as German or Russian) strongly prefer to limit each syllable to a single note, but English allows melisma, the spreading of a single syllable across multiple notes. A word like “all” might therefore be drawn out across several pitches even if it is underlyingly represented as a single syllable.

Another important phenomenon to take into account is that the natural rhythm of speech may be distorted for artistic purposes when set to music. Schellenberg (2012) uses evidence from Chinese to argue that language does not always determine musical behavior. If a songwriter wishes to violate a property of the chosen words (for example, by mapping a word containing a high tone onto a note with a low pitch), they are free to do so. This suggests that when the text and the music come into conflict, the music will often win. That further complicates the use of musical works to study linguistic properties.

Although these issues make it difficult to draw simple conclusions, they can be mitigated, allowing evidence to be extracted from musical data. One important strategy is to gather large amounts of data in order to reduce the potential influence of artistic license. A singer with a one-syllable underlying representation of a certain word may choose to draw that word out across multiple notes, but it is highly unlikely that such behavior will be repeated across every song the singer creates. Additionally, statistical analysis must take repeated words into account. A line that is repeated over and over in the chorus may skew the data if the words it contains are counted as independent tokens each time. Finally, obvious distortions for artistic effect should be ignored. For example, if the word “all” is spread out across ten different pitches, this is almost certainly evidence of creative license rather than of a ten-syllable underlying representation.

METHODS

Data for this project was gathered from the works of 12 American singer-songwriters, defined as people who compose the music for, write the lyrics of, and perform their own songs. This group was chosen to ensure that the target words would be the product of one person’s intuitions. We wanted to eliminate the possibility of another person’s underlying representation influencing the outcome.


The chosen artists, listed in Figure 1, had a variety of backgrounds. Their birth years range from 1941 to 1980, their birthplaces are in nine different states, and their work spans several different genres of music. However, the chosen artists were not highly diverse, since the set of eligible artists with sufficiently large bodies of work was somewhat limited. All but two were male, and all but one were white.

Figure 1: Artists Researched

Artist Name | Birth Year | Location | Gender | Race | Main Genres
Bob Dylan | 1941 | Minnesota | Male | White | Folk, Blues, Country
James Taylor | 1948 | North Carolina | Male | White | Folk Rock, Country
Bruce Springsteen | 1949 | New Jersey | Male | White | Rock
Billy Joel | 1949 | New York | Male | White | Soft Rock, Pop
Stevie Wonder | 1950 | Michigan | Male | African-American | Soul, Jazz, Funk, R&B
Suzanne Vega | 1959 | California | Female | White | Alternative Rock, Folk
Ben Folds | 1966 | North Carolina | Male | White | Alternative Rock, Pop
Beck Hansen | 1970 | California | Male | White | Alternative Rock
John Mayer | 1977 | Connecticut | Male | White | Pop, Rock, Blues, Folk
Ryan Tedder | 1979 | Oklahoma | Male | White | Alternative Rock, Pop
Ingrid Michaelson | 1979 | New York | Female | White | Folk, Indie
Conor Oberst | 1980 | Nebraska | Male | White | Folk, Indie, Pop


The data gathered consisted of words that occurred within the songs of these artists. The total syllable count of each word was not taken into account. Every target word had primary stress on its final syllable, and that syllable had one of the following rimes:

• [aiɹ] (as in “fire”)

• [ail] (as in “file”)

• [ain] (as in “fine”)

• [aim] (as in “time”)

• [aɹ] (as in “far”)

• [iɹ] (as in “fear”)

• [al] (as in “fall”)

• [il] (as in “feel”)

The rimes [aiɹ] and [ail] were the primary targets of study, since they contain the diphthong [ai] followed by a liquid. [ain] and [aim] were included as a control group because nasals are also highly sonorous consonants. [aɹ], [iɹ], [al], and [il] were included as a second control group because each contains a component vowel of the target diphthong and one of the target liquids. Despite the presence of sonorous consonants and constituent vowels, however, we expected all words in the control groups to be pronounced with one syllable. (The researchers all agreed on this point, although Lavoie and Cohn (1999) argued that words such as “feel”, “fear”, and “fine” are also trimoraic and thus should be as difficult to syllabify as “fire”. Our intuitions strongly contradict this analysis. Additionally, as we will see, the results suggest that [aiɹ] may have multiple syllables for some speakers, while [il], [iɹ], and [ain] do not show this effect.)

The lyrics of each song were searched for target words. Members of the four-person research group then listened to each target word in the song to determine how many pitches the artist sang. Each researcher coded the entire works of three artists, as well as a randomly selected subset of the tokens from the other nine artists. As a result, each token was coded by two researchers to ensure accuracy: the one who specialized in that artist and a randomly selected second researcher. If their judgements did not match, the entire research group listened to and discussed the token in order to reach a consensus. Of our 6,498 total tokens, 1,373 (21%) required discussion.
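The double-coding procedure above amounts to flagging every token whose two pitch counts disagree. The sketch below is a hypothetical illustration of that bookkeeping, not the research group's actual tooling. The `CodedToken` structure, its field names, and the sample codings are invented for the example; only the final ratio uses the totals reported above.

```python
from dataclasses import dataclass


@dataclass
class CodedToken:
    """One target word as heard by two coders (hypothetical structure)."""
    token_id: str
    specialist_pitches: int  # count from the researcher assigned to this artist
    second_pitches: int      # count from the randomly assigned second researcher


def needs_discussion(token: CodedToken) -> bool:
    """A token goes to the whole group whenever the two counts differ."""
    return token.specialist_pitches != token.second_pitches


def discussion_rate(tokens: list[CodedToken]) -> float:
    """Share of tokens whose two codings disagreed."""
    flagged = sum(needs_discussion(t) for t in tokens)
    return flagged / len(tokens)


# Hypothetical codings: only the second token is flagged for discussion.
sample = [
    CodedToken("fire-01", 2, 2),
    CodedToken("fire-02", 2, 3),
    CodedToken("smile-01", 1, 1),
    CodedToken("fine-01", 1, 1),
]
print(discussion_rate(sample))  # 0.25

# The same ratio for the reported totals.
print(f"{1373 / 6498:.1%}")     # 21.1%
```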

Polymorphemic words that ended in the target rimes (such as “higher” or “I’ll”) or in any of the control rimes (such as “we’re” or “we’ll”) were analyzed independently. They were placed in a separate dataset to control for the potential influence of the morpheme boundary. To investigate vowel distortion and part of speech, tokens of the word “while” were also coded for pronunciation and lexical category. Tokens of “I’ll” from the polymorphemic dataset were likewise coded for pronunciation. Due to time constraints, the tokens in this dataset were coded by only a single researcher (the author).

When the data was analyzed, all tokens with more than four pitches were top-coded, that is, coded as having four pitches. This was done for two reasons. First, we assumed that tokens with many pitches were obvious products of artistic license rather than of bizarre underlying representations. Since these tokens reflect artistic choices rather than linguistic intuitions, we reasoned that there is ultimately no meaningful difference between 6 pitches and 7, 7 pitches and 8, or even higher numbers. Second, it was difficult for the research team to judge these tokens accurately, because reaching a perfect consensus on the number of notes in a highly melismatic token could be quite challenging. In total, 108 tokens in our dataset (1.7%) were coded as having four or more pitches.
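Top-coding is simply a cap on the pitch count. The following is a minimal Python sketch, assuming pitch counts are stored as integers per token. The pitch counts in the example are hypothetical; the final ratio uses the totals reported above.

```python
from collections import Counter

TOP_CODE = 4  # counts above this are treated as melismatic artistic license


def top_code(pitches: int, cap: int = TOP_CODE) -> int:
    """Collapse any count above `cap` to `cap` (so 6, 7, or 10 all become 4)."""
    return min(pitches, cap)


# Hypothetical pitch counts for a handful of tokens.
raw = [1, 1, 2, 3, 10, 6, 1]
coded = [top_code(p) for p in raw]
print(coded)                     # [1, 1, 2, 3, 4, 4, 1]
print(Counter(coded)[TOP_CODE])  # 2 tokens now sit at the cap

# Share of the full dataset coded as four or more pitches.
print(f"{108 / 6498:.1%}")       # 1.7%
```

Capping rather than dropping keeps melismatic tokens in the analysis as clear multi-pitch productions while limiting how far they can pull an artist's average.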


To handle repetitions, we considered several options. One possibility was to include every repetition of a token, treating each instance as an independent production with the same validity as all other productions. This approach, however, would severely skew the dataset whenever a repeated token was produced with an unusual effect (such as shortening or melisma). An example of the problem is Bruce Springsteen’s song “Streets of Fire”, in which the word “fire” is repeated eleven times in the chorus with approximately three to four pitches per token.

Another option was to keep only the first instance of a token, treating it as the original and assuming that all others were merely imitations. However, we noticed that in some cases a pitch change would appear or disappear in a particular repetition over the course of a song. This made it difficult to treat each token as a perfect copy of the previous one. (In the “Streets of Fire” example, one instance of the chorus had “fire” produced with only two pitches, while another had it produced with ten.)

A third option was to take the average number of pitches across all appearances of a token within a song, which could potentially correct for both issues. In some cases, however, averaging would represent a token by a fraction of a pitch, and a single strangely produced token could skew the entire set.

Our eventual choice was to take a random token from each set of repetitions and discard the others. This did not account for the possibility that a token’s production shifts over the course of a song, but it did avoid the skewing that averaging can introduce. Moreover, an analysis with randomly selected tokens did not differ greatly from a preliminary analysis that averaged across each set of repetitions. We therefore concluded that either approach would be acceptable.
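The selection rule can be sketched as grouping tokens by repetition set and drawing one token at random from each group. This is an illustration under stated assumptions:

- The repetition key (a song plus a lyric line) is hypothetical.
- The pitch counts are hypothetical.
- A fixed seed is added so that the draw is reproducible.

The mean of the same chorus is printed for comparison. It shows the fractional value that motivated rejecting averaging.

```python
import random
from collections import defaultdict
from statistics import mean


def sample_one_per_repetition(tokens, seed=0):
    """Keep one randomly chosen pitch count from each set of repetitions.

    `tokens` is a list of (repetition_key, pitch_count) pairs; the key
    identifies a repeated lyric, e.g. "song title / line" (hypothetical).
    """
    rng = random.Random(seed)  # seeded for a reproducible selection
    groups = defaultdict(list)
    for key, pitches in tokens:
        groups[key].append(pitches)
    return {key: rng.choice(counts) for key, counts in groups.items()}


# Hypothetical song in which "fire" recurs in the chorus with varying pitch counts.
chorus = [3, 4, 2, 10, 3]
tokens = [("song A / chorus", n) for n in chorus]
tokens.append(("song A / verse 2", 1))

kept = sample_one_per_repetition(tokens)
print(len(kept))     # 2: one token per repetition set
print(mean(chorus))  # 4.4: averaging yields a fractional pitch count
```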


ANALYSIS

Syllabification of [aiL]

Our analysis consistently supported the hypothesis that some people clearly pronounce highly moraic words such as “fire” with one syllable, while others clearly pronounce them with two. The graphs in Figure 1 show the average number of pitches each individual singer uses for four groups of rimes: the target rime [aiɹ], the target rime [ail], vowel-liquid combinations (VL), and [ai] combined with nasals. Impressionistically, [aiɹ] seems to behave differently from the other rimes.

Some singers show a clear distinction. Stevie Wonder sings [aiɹ] with approximately two pitches on average, while words in the control group use slightly more than one. (Many artists have averages slightly above one because of melisma.) John Mayer sings an average of three pitches per [aiɹ] rime and slightly over one pitch for control words. This suggests that his underlying representation has more syllables for [aiɹ] than for related words. However, some singers (such as James Taylor) use only a single pitch for every word studied, and others (such as Bob Dylan and Ben Folds) have less clear patterns.

These graphs do not, however, take into account melisma or higher-than-expected pitch counts in the control group. Ryan Tedder, for instance, seems to sing every word with an average of one and a half pitches. Nor do the graphs address reliability. For example, Stevie Wonder has 53 tokens of [aiɹ] while Billy Joel has only 15, so we can draw conclusions from this graph with far more confidence for Stevie Wonder than for Billy Joel.


Figure 1: Average Number of Pitches Per Rime


Further analysis corrects for these influences by fitting a separate linear regression model for each artist. The model predicts the number of pitches in each token from fixed effects of rime type and year of composition, along with a random effect to control for stylistic variation among songs. Figure 2 shows each artist’s results relative to a baseline set by the number of pitches used for the [VL] and [aiN] control words. For instance, Billy Joel (WMJ) consistently sings about one more pitch on the target [aiɹ] words than on the control words. James Taylor (JVT), by contrast, sings exactly the same number of pitches on target words as on control words.

Impressionistically, seven artists appear to have a clear distinction between [aiɹ] words and control words (arranged to the left of the dotted line), while five do not (arranged to the right of the dotted line). Interestingly, only one artist, Beck Hansen, showed a similar distinction for [ail] words, suggesting that the two liquids are treated differently. (This might be because [ɹ] is more sonorous than [l].)

Figure 2: Baseline Analysis

*Ingrid Michaelson did not have enough [ail] tokens to be included in this analysis.
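The per-artist model described above is a linear regression with fixed effects of rime type and year and a random effect for songs. Fitting such a mixed model requires a statistics package. The sketch below is deliberately simplified and is not the model that was actually fitted. Dropping the year term and the random effect leaves an ordinary least-squares regression on a single dummy-coded predictor (1 for [aiɹ] targets, 0 for controls). The slope of that regression equals the difference between the two group means: the number of extra pitches a singer uses on target rimes relative to the control baseline. The pitch counts are hypothetical.

```python
from statistics import mean


def ols_slope(x: list[float], y: list[float]) -> float:
    """Ordinary least-squares slope of y regressed on a single predictor x."""
    mx, my = mean(x), mean(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den


def target_effect(target: list[int], control: list[int]) -> float:
    """Extra pitches on target rimes relative to the control baseline.

    Rime type is dummy-coded (1 = target, 0 = control); with no other
    predictors, the OLS slope equals mean(target) - mean(control).
    """
    x = [1.0] * len(target) + [0.0] * len(control)
    y = [float(p) for p in target + control]
    return ols_slope(x, y)


# Hypothetical artist who usually gives [aiɹ] two pitches.
target = [2, 2, 1, 2, 3, 2]
control = [1, 1, 1, 2, 1, 1, 1, 1]
print(round(target_effect(target, control), 3))  # 0.875
```

A slope near 0 corresponds to an artist like James Taylor, with no extra pitches on targets. A slope near 1 corresponds to the roughly one-pitch difference described for Billy Joel.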


The next graph reports odds ratios from a logistic regression model fitted for each artist. The model predicts whether a rime will be sung with a single pitch or with multiple pitches. A binary outcome was used because, once an artist decides to assign more than one pitch to a syllable, it is not clear whether it matters that they use four pitches rather than three, or three rather than two. Despite the coarser binary variable, this graph bears several similarities to the previous one. The agreement strengthens the evidence that some singers have a clear distinction between [aiɹ] and other rimes.

Figure 3: Odds Ratio Correction

This analysis strongly suggests that seven artists have an underlying representation of two syllables for [aiɹ] rimes: Ben Folds, Billy Joel, John Mayer, Beck Hansen, Ryan Tedder, Suzanne Vega, and Ingrid Michaelson. The other five have an underlying representation of one syllable: Conor Oberst, Stevie Wonder, Bruce Springsteen, James Taylor, and Bob Dylan.
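The odds ratios in Figure 3 come from per-artist logistic regressions. In the special case where rime type is the only predictor and no cell is empty, exponentiating the logistic-regression coefficient gives exactly the odds ratio of a 2×2 table: target vs. control rimes, crossed with multiple vs. single pitches. The sketch below computes that table-based odds ratio directly under several assumptions:

- It omits any other terms in the fitted model.
- The counts are hypothetical.
- A 0.5 correction is applied when any cell is zero (the Haldane-Anscombe correction). This is added here to keep the ratio finite; it is not part of the fitted model.

```python
import math


def odds_ratio(target_multi: int, target_single: int,
               control_multi: int, control_single: int,
               correction: float = 0.5) -> float:
    """Odds of a multi-pitch rendition on target rimes over the odds on controls.

    If any cell is zero, `correction` is added to every cell
    (Haldane-Anscombe correction) so the ratio stays finite.
    """
    cells = [target_multi, target_single, control_multi, control_single]
    if 0 in cells:
        cells = [c + correction for c in cells]
    tm, ts, cm, cs = cells
    return (tm / ts) / (cm / cs)


# Hypothetical counts for one artist.
ratio = odds_ratio(target_multi=30, target_single=10,
                   control_multi=20, control_single=180)
print(round(ratio, 1), round(math.log(ratio), 2))  # 27.0 3.3
```

A ratio near 1 (log-odds near 0) means that target and control rimes are about equally likely to receive extra pitches. Large ratios indicate an [aiɹ] contrast.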


The study had only twelve subjects, which makes it difficult to draw broad conclusions about population-wide patterns. Even so, some impressionistic trends might be fruitful for future research.

For example, age might make a difference in whether these words are realized with one syllable or two. As Figure 4 shows, all seven artists with a contrast between [aiɹ] and other words were born in 1949 or later. Four of the five artists without a contrast for [aiɹ] were born in 1950 or earlier. It is possible that language change over time has made the two-syllable realization of “fire” more widespread.

There may also be a dialectal component. The three artists from the Midwest (Bob Dylan, Stevie Wonder, and Conor Oberst, born in Minnesota, Michigan, and Nebraska respectively) did not have a contrast for [aiɹ]. In contrast, both artists from California (Suzanne Vega and Beck Hansen) did contrast [aiɹ] with the control group. So did three of the four artists from the Northeast (Ingrid Michaelson, John Mayer, and Billy Joel, born in New York, Connecticut, and New York respectively). A possible counterexample, however, is Bruce Springsteen, who was born in New Jersey in 1949 but does not display a contrast.


Figure 4: Artists, Birth Years, Birthplace, and [aiɹ] Contrast

Artist Name | Birth Year | Location | [aiɹ] Contrast
Bob Dylan | 1941 | Duluth, MN | None
James Taylor | 1948 | Chapel Hill, NC | None
Bruce Springsteen | 1949 | Long Branch, NJ | None
Billy Joel | 1949 | Hicksville, NY | [aiɹ] contrast
Stevie Wonder | 1950 | Saginaw, MI | None
Suzanne Vega | 1959 | Santa Monica, CA | [aiɹ] contrast
Ben Folds | 1966 | Winston-Salem, NC | [aiɹ] contrast
Beck Hansen | 1970 | Los Angeles, CA | [aiɹ] contrast
John Mayer | 1977 | Bridgeport, CT | [aiɹ] contrast
Ryan Tedder | 1979 | Tulsa, OK | [aiɹ] contrast
Ingrid Michaelson | 1979 | New York City, NY | [aiɹ] contrast
Conor Oberst | 1980 | Omaha, NE | None
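The birth-year pattern can be made concrete with a small cross-tabulation of the Figure 4 data, splitting the artists at 1950. This sketch is descriptive only. With twelve artists no significance test is implied, and the 1950 cutoff simply mirrors the comparison in the text.

```python
# (artist, birth year, has an [aiɹ] contrast), transcribed from Figure 4.
ARTISTS = [
    ("Bob Dylan", 1941, False), ("James Taylor", 1948, False),
    ("Bruce Springsteen", 1949, False), ("Billy Joel", 1949, True),
    ("Stevie Wonder", 1950, False), ("Suzanne Vega", 1959, True),
    ("Ben Folds", 1966, True), ("Beck Hansen", 1970, True),
    ("John Mayer", 1977, True), ("Ryan Tedder", 1979, True),
    ("Ingrid Michaelson", 1979, True), ("Conor Oberst", 1980, False),
]


def crosstab(artists, cutoff):
    """Count artists by (born after cutoff?, has contrast?)."""
    table = {(later, contrast): 0
             for later in (False, True) for contrast in (False, True)}
    for _name, year, contrast in artists:
        table[(year > cutoff, contrast)] += 1
    return table


table = crosstab(ARTISTS, cutoff=1950)
# Born 1950 or earlier: no contrast vs. contrast (Billy Joel).
print(table[(False, False)], table[(False, True)])  # 4 1
# Born after 1950: no contrast (Conor Oberst) vs. contrast.
print(table[(True, False)], table[(True, True)])    # 1 6
```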

“While”, Contractions, and Parts of Speech

Further investigation was conducted into the target word “while”. The researchers’ intuitions suggested that the pronunciation of “while” varies with part of speech. For example, it might be realized as [wail] in a sentence such as “I haven’t seen you in a while” but as [wæl] in a sentence such as “I saw you while I was at the store”. We hypothesized that the diphthong [ai] is used when “while” is a noun, but is relaxed to [æ] when “while” is a conjunction.

Each token of “while” in the dataset was coded for pronunciation and part of speech, in addition to the data previously gathered. Figure 5 shows the number of tokens retrieved for each pronunciation and part of speech. (A small handful of tokens with other pronunciations, such as [wɑl] or [wɛl], were excluded.) The table shows that instances of “while” as a noun were overwhelmingly pronounced [wail] (57 of 59 tokens, or 97%). Instances of “while” as a conjunction were predominantly pronounced [wæl] (66 of 74 tokens, or 89%).

Figure 5: Tokens of “While” by Pronunciation and Part of Speech

Pronunciation | Noun | Conjunction
[wail] | 57 | 8
[wæl] | 2 | 66
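Percentages from a two-way table depend on which total they are computed over. The sketch below uses the Figure 5 counts to compute both readings:

- the share of each part of speech realized with a given pronunciation (dividing by the column total);
- the share of each pronunciation that belongs to a given part of speech (dividing by the row total).

The function names are illustrative.

```python
# Token counts from Figure 5: pronunciation -> part of speech -> count.
COUNTS = {
    "[wail]": {"noun": 57, "conjunction": 8},
    "[wæl]": {"noun": 2, "conjunction": 66},
}


def share_within_pos(counts, pronunciation, pos):
    """Fraction of `pos` tokens realized with `pronunciation` (column share)."""
    total = sum(row[pos] for row in counts.values())
    return counts[pronunciation][pos] / total


def share_within_pronunciation(counts, pronunciation, pos):
    """Fraction of `pronunciation` tokens that are `pos` (row share)."""
    row = counts[pronunciation]
    return row[pos] / sum(row.values())


print(f"{share_within_pos(COUNTS, '[wail]', 'noun'):.0%}")                  # 97%
print(f"{share_within_pos(COUNTS, '[wæl]', 'conjunction'):.0%}")            # 89%
print(f"{share_within_pronunciation(COUNTS, '[wail]', 'noun'):.0%}")        # 88%
print(f"{share_within_pronunciation(COUNTS, '[wæl]', 'conjunction'):.0%}")  # 97%
```

The column shares answer the question of how nouns and conjunctions are pronounced. The row shares answer which parts of speech use a given pronunciation.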

Furthermore, the data supported the hypothesis that the noun “while” is produced with the same number of pitches that a speaker uses for target words like “file”. The conjunction “while”, by contrast, is produced with the same number of pitches that a speaker uses for control words like “fall”. Figure 6 shows, for selected artists, four productions: all [ail] rimes, [ail] pronunciations of “while”, all [Vl] rimes, and [æl] pronunciations of “while”. (Not all artists are represented, since not all of them produced enough tokens of “while”; Ryan Tedder produced none at all. The figure includes every artist who produced ten or more tokens of “while”.)


Figure 6: Average Number of Syllables for Various Rime Types


To investigate the phenomenon further, a few artists were also analyzed for their use of polymorphemic words with the target codas. The vast majority of these were contractions such as “I’ll”, “I’m”, or “we’ll”, with a handful of suffixed words such as “liar” or “higher”. These tokens were coded with the same procedure as the rest of the dataset. Due to time constraints, however, each token was coded only once, by a single researcher. It is thus somewhat more difficult to draw firm conclusions from these tokens than from the rest of the dataset.

Figure 7: Contractions and Polymorphemic Words

Artist | Average number of pitches | Percentage of targets sung with more than one pitch
Ben Folds (BSF) | 1.03 | 1.8%
Bruce Springsteen (BJS) | 1.16 | 19.4%
Ryan Tedder (RTD) | 1.05 | 5.2%
Suzanne Vega (SNV) | 1.02 | 2.4%
Billy Joel (WMJ) | 1.08 | 9.2%

Interestingly, almost all of these tokens were sung with only a single pitch, despite the morpheme boundary. This was true even for tokens of “I’ll” sung by artists with a two-syllable representation for [aiɹ] words (Ben Folds, Billy Joel, Ryan Tedder, and Suzanne Vega). Analysis of the pronunciation of “I’ll” revealed that it was pronounced as [æl] rather than as [ail] the vast majority of the time, as shown in Figure 8.


Figure 8: Number of “I’ll” Tokens by Pronunciation

Artist | [æl] | [ail]
Billy Joel (WMJ) | 12 | 5
Bruce Springsteen (BJS) | 27 | 1
Ryan Tedder (RTD) | 12 | 2
Ben Folds (BSF) | 14 | 3
Suzanne Vega (SNV) | 36 | 1
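The Figure 8 counts can be summarized as the share of “I’ll” tokens reduced to [æl], per artist and pooled. The following is a short sketch; the artist codes follow Figure 7, so Ben Folds appears as BSF.

```python
# Figure 8 counts: artist code -> ([æl] tokens, [ail] tokens).
ILL_COUNTS = {
    "WMJ": (12, 5),
    "BJS": (27, 1),
    "RTD": (12, 2),
    "BSF": (14, 3),
    "SNV": (36, 1),
}


def reduced_share(reduced: int, full: int) -> float:
    """Fraction of "I'll" tokens realized with the monophthong [æl]."""
    return reduced / (reduced + full)


for artist, (reduced, full) in ILL_COUNTS.items():
    print(f"{artist}: {reduced_share(reduced, full):.0%} reduced")

total_reduced = sum(r for r, _ in ILL_COUNTS.values())
total = sum(r + f for r, f in ILL_COUNTS.values())
print(f"pooled: {total_reduced}/{total} = {total_reduced / total:.0%}")  # pooled: 101/113 = 89%
```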

The prevalence of [æl] for “I’ll” provides further support for the hypothesis that speakers are likely to reduce complex rimes like [ail] in function words, such as conjunctions and pronouns, but not in content words, such as nouns and verbs. English has almost no function words ending in the rime [aiɹ], with the possible and highly arguable exception of “why’re”. This might also suggest that the pattern is dispreferred, particularly if [ɹ] does behave differently from (and is more sonorous than) [l]. It is possible that [aiɹ] function words are unacceptable and that [ail] function words push the boundaries of acceptability. If so, speakers would reduce the diphthong to a monophthong so that the word is realized as a single syllable.


CONCLUSIONS

Although speaker judgements for words like “fire” and “smile” can be unreliable, music can be used to investigate individual intuitions. This type of analysis is difficult to use for investigating broad patterns, since it can only be applied to a small set of subjects. It can, however, provide strong evidence for the syllable judgements of particular individuals.

We conclude that some speakers have a clear distinction between [aiɹ] rimes and other syllables, while others do not. For example, we determined that artists like Billy Joel and John Mayer exhibit such a contrast, while artists like James Taylor and Bob Dylan clearly lack one.

An open question is which factors are related to this contrast. Our data suggests that birth year might be correlated with it, but firm conclusions are impossible from such a small sample. Other possible factors include dialectal or regional differences. Further research is necessary to determine what factors might predict whether a given person has this contrast.

We have also determined that part of speech is strongly correlated with the pronunciation of certain rimes. Diphthongs in complex rimes are often reduced to monophthongs when they occur in function words, but are preserved when they occur in content words. This behavior appears to be widespread across speakers. Further research in this area might investigate other diphthongs and codas to determine whether, and under what circumstances, they are reduced.


REFERENCES

Cohn, A. C. (2003). Phonological Structure and Phonetic Duration: The Role of the Mora. Working Papers of the Cornell Phonetics Laboratory, 15, 69-100.

Cohn, A. C., & Tilsen, S. (2015). Relation between syllable count judgments and durations of English liquid rimes. Cornell University.

Dell, F., & Halle, J. (2005). Comparing Musical Textsetting in French and in English Songs. Typology of Poetic Forms, Paris.

Lavoie, L. M., & Cohn, A. C. (1999). Sesquisyllables of English: The Structure of Vowel-Liquid Syllables. International Congress of Phonetic Sciences, San Francisco. University of California.

Palmer, C., & Kelly, M. H. (1992). Linguistic Prosody and Musical Meter in Song. Journal of Memory and Language, 31, 525-542.

Rodriguez-Vasquez, R. (2010). Text-setting Constraints: A Comparative Perspective. Australian Journal of Linguistics, 30(1), 19-34.

Schellenberg, M. (2012). Does Language Determine Music in Tone Languages? Ethnomusicology, 56(2).

Schellenberg, M. (2016). Influence of Syllable Structure on Musical Text Setting. Poster session presented at the 15th Conference of Laboratory Phonology, Cornell University.

Sui, Y. (2013). Phonological and Phonetic Evidence for Trochaic Metrical Structure in Standard Chinese (Doctoral dissertation). University of Pennsylvania.


Name of Candidate: Jenica Jessen

Birth date: December 5th, 1994

Birth place: Burley, Idaho

Address: 2108 W. Marblewood Dr. Riverton, UT, 84065
