
The grammatical variation in French: the difference between the use of “à” and “sur” Name Name student number

EL25LINC

Sociolinguistics L3

LLCE English

2016/2017

Supervisor: Dylan Glynn


Table of contents

1. Introduction
1.1 Temporality related to the age
1.2 Formality
1.3 Demography related to the introduction of a big city
2. Data and method
2.1 Collecting the data
2.2 Method of analysis
3. Results and interpretation
3.1 Results of the analysis of the different hypotheses
3.2 Results of Temporality related to the age hypothesis’ test
3.3 Results of the use of sur and à in formal language hypothesis’ test
3.4 Results of the demography and the use of sur to introduce big cities relation hypothesis’ test
4. Discussion
4.1 Temporality related to the age
4.2 Formality
4.3 Demography compared to the introduction of big cities
4.4 Conclusion


1. Introduction

The difference between the two French prepositions à and sur is not clearly defined in everyone’s mind. For native speakers, the choice between à and sur is instinctive: they tell the two apart when they express themselves by relying on grammaticality. In linguistics, grammaticality is what is right or wrong in a sentence according to the native speaker’s judgment. For non-native speakers of French, choosing between the two is difficult because they cannot rely on the same instinct.

For a long time, native speakers of French used the preposition à to indicate a location (“j’habite à Paris”; I live in Paris). Indeed, a French grammar book lists five prepositions that introduce a location in a sentence: à, vers, en, dans and chez. These prepositions can be translated as to, in, at, etc. Over the years, native speakers have slowly started to use the preposition sur instead of à. Sur is usually translated as “on” or “on top of”, but in French it is now common to use it to introduce a location. As a result, native speakers do not all agree on whether à or sur should introduce a location: “à Paris” or “sur Paris”, for instance. The debate over which one is more grammatical continues among some native speakers of French. However, despite the increasing use of sur, we do not know why people choose either à or sur to introduce a location. As native speakers, we decided to focus on this subject and tried to understand what the differences between à and sur are.

To address this problem, we identified several features that could play a role in the distinction between these two prepositions. These features allowed us to formulate semantic and social hypotheses. We tested the features with a questionnaire and analyzed those that might explain the differences between the two prepositions. For this report, I chose several of these hypotheses to analyze.


1.1 Temporality related to the age

In French, there are more tenses than in English. We have therefore split the temporality feature into two values, present and general, where general covers all the other tenses. Age may play a role in the acceptance of sur in the present tense. In this report, we examine which age category finds it more grammatical to use sur to introduce a location in the present tense. Here, we focus only on native speakers of French.

1.2 Formality

In French, formality is not only a matter of choosing between tu and vous but also a matter of register. Register refers to the formality of an expression, a grammatical structure, a word, a gesture, etc. Formal French is both written and spoken. It is a polite language used to show respect, to establish distance, or when the speaker does not know the other person. Here, we examine whether French speakers consider à or sur more grammatical in formal language. We analyze only native speakers of French who did not guess the topic of our study.

1.3 Demography related to the introduction of a big city

This last hypothesis is based on demography, that is, the type of area (urban, suburban or rural) that people come from or live in. Since not everyone comes from the same place, it is worth asking whether demography plays a role in the use of sur to introduce big cities. Here, we focus only on native speakers of French.


2. Data and Method

2.1 Collecting the data

During our linguistics classes, we identified several social and semantic features that could explain why this grammatical variation occurs and in which situations people are most likely to use “à” or “sur”. We turned these features into questions in order to create a questionnaire that covers both our semantic and social features. The questionnaire is composed of two sections. The first one gathers information on the respondent, which lets us collect the social features needed to test our social hypotheses. The second one focuses on the respondent’s judgment. It is based on 16 sentences: 8 instances of “à” and 8 instances of “sur”. For each sentence, respondents rate on a scale of 1 to 9 how grammatical it seems to them, with 1 being completely unnatural and 9 completely natural. To collect the data as accurately as possible, we used a nine-level Likert scale. A five-level Likert scale might have distorted the answers for this kind of questionnaire: people tend to avoid the extreme answer on a five-level scale, whereas they are more likely to choose 8 on a nine-level scale. Furthermore, 8 on the nine-level scale becomes 5 on the five-level scale once the first scale has been converted to the second (2.2).

The questionnaire is divided into two distinct sets of features. The first set represents the social features. It has eight categories: AGE (15-25, 26-35, 36-45, 46-55, 56 and more), LEVEL OF EDUCATION (middle school, high school, college), DEMOGRAPHY OF ORIGIN (urban, suburb, rural), CURRENT DEMOGRAPHY (urban, suburb, rural), COUNTRY OF ORIGIN, MOTHER TONGUE, L2 and L3. Four of these categories have predefined answers (age, education, demography of origin and current demography) and the other four are free answers. The second set represents the semantic features. These features are divided into four categories: REGISTER (formal and informal), SIZE (big and small), FAMILIARITY (familiar and non-familiar), and TEMPORALITY (present tense and other tenses). For this set, the sentences used are the following:

(1) J’ai été retenu pour un entretien d’embauche à Paris.

Le diner d’affaires sur Paris auquel nous sommes invités est

primordial.

(2) On était sur Amsterdam pour se marrer.

Mes potes et moi on était à Lille pour se faire un ciné.

(3) Sur Paris, les gens sont plus désagréables qu’en province.

Les Jeux Olympiques d’hiver se dérouleront à Seoul en 2018.

(4) Ma grand-mère habitait sur Dame-Marie il y a 20 ans.

J’ai acheté du pain à Everly ce matin.

(5) Ma tante passe ses vacances sur Marseille cet été.

Ce soir, je dors à Bruxelles chez mon copain.

(6) Hier, sur Manchester, je me suis fait un foot.

À Flawinne, les restaurants sont réputés.

(7) Ecoute, moi je suis sur Paris cet après-midi.

Mes amis passent le week-end à Strasbourg.

(8) Depuis son enfance, il a vécu sur Bordeaux avec sa mère.

Et tu sais ? À Lyon la semaine dernière, j’ai croisé Hervé.


In the above examples, Register is defined by the level of formality: formal (1) and informal (2). Size classifies the cities as either big (3) or small (4) according to the size of the agglomeration. Familiarity concerns the speaker’s knowledge of the city: whether one is familiar (5) or non-familiar (6) with the city mentioned. The last feature, Temporality, indicates the tense of the sentence: present tense (7) or another tense (8).

To collect the data, we first distributed the questionnaire to the people around us, but this was not enough. Most respondents were people of our age or our parents, so the answers were not numerous enough in every category. Therefore, to obtain more diverse results, we created an online questionnaire with Google Forms and shared it on the Internet. In this way, we collected 870 responses, including 713 from native speakers of French.

2.2 Method of analysis

To test and analyze our hypotheses, I extracted the relevant data from the Excel file in which all our data were combined. In this file, I created a separate sheet for each hypothesis and built a pivot table in it to obtain the results as counts. In each sheet, I then made a smaller table and copied it into a plain-text file (Notepad) for later use.

Then I used the software R and the R command files provided by our professor, which contain the commands required for this analysis. Using the xtab command, I imported one of the text files into R. To obtain the p-value, I used the CHI command, which tests the hypothesis. If p<0.05, the results are significant; if p>0.05, they are not. If the p-value is significant, we can proceed to analyze the correlations and anti-correlations with the RES command, which computes the residuals: positive numbers indicate correlations and negative numbers anti-correlations. I then did the same for the other text files.

In the Excel file, I decided to work only with the native speakers of French. We focus on grammaticality, and grammaticality is based on the native speaker’s judgment, so restricting the analysis to native speakers is more useful. This leaves 713 entries to analyze instead of 870. Furthermore, we introduced a granularity system. The questionnaire used a nine-level Likert scale, which the Excel file converts into both a five-level and a three-level Likert scale. Some features can therefore be analyzed at either medium or coarse granularity. Medium granularity is the 1 to 5 scale: 1 (very unnatural), 2 (unnatural), 3 (uncertain), 4 (natural), 5 (very natural). Coarse granularity is the 1 to 3 scale: 1 (ungrammatical), 2 (uncertain), 3 (grammatical). (Table 1)

Moreover, Demography of origin and Current demography were merged into Combined demography, which is also available at medium and coarse granularity. Some features, such as L2 and L3, have been deleted from the file.

Table 1. Method of conversion of Likert scales

Large granularity    1 2 3 4 5 6 7 8 9
Medium granularity   1 2 3 4 5
Coarse granularity   1 2 3


3. Results and interpretation

The results in the tables are analysed in this section. We will see whether our hypotheses are correct and whether they allow us to determine which of the two prepositions is more grammatical, and in which situation.

3.1 Results of the analysis of the different hypotheses

Assuming they are correct, the results in our tables show that respondents of all ages tend to accept the use of sur to introduce a location in the present tense. The count is higher for very natural than for very unnatural in all three age categories: 219 very natural against 71 very unnatural for young people, 65 against 20 for medium-age people, and 24 against 21 for older people. While the Young and Medium categories show a clear gap between the two extremes, the gap for older people is only 3. In all categories, the differences between natural, uncertain and unnatural are also small (Table 1). This is why I decided to study the coarse granularity of the temporality feature (Table 2).

Table 1 – Medium grammaticality of “sur” in present tense compared to the age

PRESENT SUR (MEDIUM GRANULARITY)   YOUNG   MEDIUM   OLD
VERY NATURAL                       219     65       24
NATURAL                            102     32       22
UNCERTAIN                          35      19       11
UNNATURAL                          35      21       15
VERY UNNATURAL                     71      20       21


In Table 2, all the age categories show a bigger gap between the two extremes, grammatical and ungrammatical. Furthermore, very few people are uncertain, which gives a clearer view of the situation. In the old age category, the difference between grammatical and ungrammatical is still only 10, but this is larger than in Table 1. Therefore, for this hypothesis I decided to consider only the coarse granularity.

Table 2 – Coarse grammaticality of “sur” in present tense compared to the age

PRESENT SUR (COARSE GRANULARITY)   YOUNG   MEDIUM   OLD
GRAMMATICAL                        321     97       46
UNCERTAIN                          35      19       11
UNGRAMMATICAL                      106     41       36

Table 3 shows the use of à and sur to introduce a location in polite language. For the preposition à, the counts broadly decrease: 247 native speakers of French think it is very natural to use à, against only 37 who think it is very unnatural, and the numbers fall as we move away from very natural. The same does not occur with sur, whose counts are more evenly spread across the categories: 111 for very natural, 138 for natural, 75 for uncertain, 69 for unnatural and 83 for very unnatural. Since this column shows no clear trend, I decided to analyze the coarse granularity as well.


Table 3 – Medium grammaticality of “à” and “sur” in formal language

                 A     SUR
VERY NATURAL     247   111
NATURAL          102   138
UNCERTAIN        41    75
UNNATURAL        49    69
VERY UNNATURAL   37    83

The coarse granularity shows a gap between the two extremes that is a little bigger than at medium granularity. For sur, grammatical is the sum of very natural and natural, and ungrammatical is the sum of unnatural and very unnatural, so the results are more distinct at the two extremes: 249 grammatical and 152 ungrammatical. The coarse granularity is therefore better to examine (Table 4).

Table 4 – Coarse grammaticality of “à” and “sur” in formal language

                A     SUR
GRAMMATICAL     349   249
UNCERTAIN       41    75
UNGRAMMATICAL   86    152
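The step from medium to coarse granularity can be sketched in a few lines. The report did this conversion in Excel; the Python below is only an illustration, and the names `MEDIUM_LEVELS`, `TO_COARSE` and `collapse` are hypothetical. The counts are those of Table 3, and the mapping follows the rule stated above (grammatical = very natural + natural, ungrammatical = unnatural + very unnatural).

```python
# Medium-granularity counts copied from Table 3 (formal register).
MEDIUM_LEVELS = ["VERY NATURAL", "NATURAL", "UNCERTAIN", "UNNATURAL", "VERY UNNATURAL"]
medium_counts = {
    "A": [247, 102, 41, 49, 37],
    "SUR": [111, 138, 75, 69, 83],
}

# Medium level -> coarse level, following the rule described in the text.
TO_COARSE = {
    "VERY NATURAL": "GRAMMATICAL",
    "NATURAL": "GRAMMATICAL",
    "UNCERTAIN": "UNCERTAIN",
    "UNNATURAL": "UNGRAMMATICAL",
    "VERY UNNATURAL": "UNGRAMMATICAL",
}


def collapse(counts):
    """Sum the five medium-level counts into the three coarse levels."""
    coarse = {"GRAMMATICAL": 0, "UNCERTAIN": 0, "UNGRAMMATICAL": 0}
    for level, n in zip(MEDIUM_LEVELS, counts):
        coarse[TO_COARSE[level]] += n
    return coarse


coarse_counts = {prep: collapse(counts) for prep, counts in medium_counts.items()}
for prep, counts in coarse_counts.items():
    print(prep, counts)
```

Collapsing the extremes in this way trades detail for larger cell counts, which is why the coarse tables show clearer contrasts than the medium ones.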

The last table shows the influence of demography on the use of sur to locate a big city. In every category, few people find it uncertain or ungrammatical. Most people find it grammatical to use sur to introduce big cities: 111 for mixed, 113 for rural and 311 for urban (Table 5).


Table 5 – Coarse grammaticality of the influence of the demography on the use of sur to locate big cities

                MIXED   RURAL   URBAN
GRAMMATICAL     111     113     311
UNCERTAIN       13      13      31
UNGRAMMATICAL   29      27      65

3.2 Results of Temporality related to the age hypothesis’ test

Simply by looking at Figure 1 below, we can say that more people find sur in the present tense grammatical than ungrammatical, and this applies to every age category: young, medium and old. There are 321 grammatical judgments for young, 97 for medium and 46 for old, against 106 ungrammatical judgments for young, 41 for medium and 36 for old, and 35 uncertain judgments for young, 19 for medium and 11 for old. We can therefore assume that every age category tends to find sur in the present tense grammatical, but we cannot base our conclusion on a single figure.


Figure 1 – Grammaticality of sur in present tense compared to the age
[Figure 1: stacked bars from 0% to 100% for YOUNG, MEDIUM and OLD, split into GRAMMATICAL, UNCERTAIN and UNGRAMMATICAL; the counts are those of Table 2]

This hypothesis was tested in R with Pearson’s Chi-squared test. Applied to the results of Table 2, the test gave us the following p-value:

p-value = 0.002891

To be significant, the p-value must be below 0.05. Ours is, since p<0.003. The result for this hypothesis is therefore significant.

Since this value is significant, we can continue with Pearson’s residuals, which help us determine which correlations or anti-correlations are responsible for the significant difference between the sets.
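As a cross-check of the test on Table 2, Pearson’s Chi-squared statistic can be recomputed by hand. The report ran this test in R; the Python below is only a sketch under stated assumptions, not the commands we used. The function names are hypothetical, and the p-value uses a closed form of the chi-squared upper tail that holds only for an even number of degrees of freedom (here 4, for a 3 by 3 table).

```python
import math

# Counts copied from Table 2 (rows: grammatical, uncertain, ungrammatical;
# columns: young, medium, old).
table_2 = [
    [321, 97, 46],
    [35, 19, 11],
    [106, 41, 36],
]


def chi_squared_statistic(table):
    """Return Pearson's chi-squared statistic and its degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if age and judgment were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


def chi_squared_p_value(statistic, df):
    """Upper-tail probability; this closed form is valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = statistic / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


statistic, df = chi_squared_statistic(table_2)
p_value = chi_squared_p_value(statistic, df)
print(f"chi-squared = {statistic:.3f}, df = {df}, p-value = {p_value:.6f}")
```

With the counts of Table 2, this sketch should agree with the p-value reported above to the precision shown.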


Table 7 – Pearson’s residuals

                YOUNG       MEDIUM       OLD
GRAMMATICAL     1.148097    -0.5254148   -1.8762594
UNCERTAIN       -1.105104   1.2327751    0.8613634
UNGRAMMATICAL   -1.169532   0.1019261    2.4742719
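The values in Table 7 can be recomputed from Table 2 as (observed − expected) / √expected for each cell. The Python below is a sketch of that calculation, not the R command we used, and its names are hypothetical. The threshold of 2 at the end is a common rule of thumb for spotting the cells that contribute most to a significant chi-squared result; it is an assumption added for illustration and is not a criterion used in this report.

```python
import math

ROWS = ["GRAMMATICAL", "UNCERTAIN", "UNGRAMMATICAL"]
COLS = ["YOUNG", "MEDIUM", "OLD"]

# Counts copied from Table 2.
table_2 = [
    [321, 97, 46],
    [35, 19, 11],
    [106, 41, 36],
]


def pearson_residuals(table):
    """Return (observed - expected) / sqrt(expected) for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        residual_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            residual_row.append((observed - expected) / math.sqrt(expected))
        residuals.append(residual_row)
    return residuals


residuals = pearson_residuals(table_2)

# Assumed rule of thumb (not from the report): |residual| > 2 marks a cell
# that deviates notably from what independence would predict.
THRESHOLD = 2.0
flagged = [
    (ROWS[i], COLS[j])
    for i, row in enumerate(residuals)
    for j, value in enumerate(row)
    if abs(value) > THRESHOLD
]

for label, row in zip(ROWS, residuals):
    print(label, [round(value, 4) for value in row])
print("cells beyond the assumed threshold:", flagged)
```

A positive residual means more answers than expected in that cell, and a negative one means fewer, which corresponds to what we call correlations and anti-correlations.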

Starting with the young age category, a correlation of +1.15 means that there is a positive relation between grammaticality and the young generation (Table 7). The contrary also occurs: two anti-correlations (-1.105; -1.169) show a negative relation between uncertain/ungrammatical and the young category. These values fall outside the Pearson’s range of -1 to 1, but they are clearly away from 0 and close to 1 and -1 for the young category at every grammatical level. Therefore, there is a relation between the young age category and the grammatical level of sur.

When a correlation or anti-correlation is close to 0, the relation between the two variables, whether positive or negative, is too weak: if one of them changes, the other would not be affected. For the medium age category, Table 7 shows a correlation of +0.10 for ungrammatical, which is very weak. The ungrammatical and medium categories do have a relation, but it is too weak to be taken into consideration. A correlation between 0.5 and 1.0, or an anti-correlation between -1.0 and -0.5, means that there is a strong relation. Table 7 shows an anti-correlation of -0.53 between grammatical and medium, which lies between -1.0 and -0.5 and is thus a strong negative relation. For the uncertain level, the correlation (+1.23) is outside the Pearson’s range but still between -2 and 2, meaning that there is a relation between uncertain and medium, though not as strong as the negative grammatical/medium relation.

Regarding the old category, a correlation of +0.86 for the uncertain level means that there is a positive and very strong relation between the two categories tested. The old age category is therefore more uncertain about the use of sur in the present tense. The grammatical and ungrammatical values are outside the Pearson’s range, but they are distant from 0 and close to -2 and 2: an anti-correlation of -1.8762594 occurs between grammatical and old, and a correlation of 2.4742719 between ungrammatical and old. A relation between them therefore exists, but it is not as strong as the old/uncertain relation.

3.3 Results of the use of sur and à in formal language hypothesis’ test

For the formal register, Figure 2 partially supports the idea that à is used more than sur. With 349 grammatical judgments for à against 249 for sur and, conversely, 86 ungrammatical judgments for à against 152 for sur, it is likely that à is considered more grammatical, and is used more, than sur in polite language.

Figure 2 – Grammaticality of the prepositions à and sur in formal register
[Figure 2: stacked bars from 0% to 100% for FORM A and FORM SUR; the counts are those of Table 4]

To test the hypothesis on the use of à and sur in the formal register, we use the same tools as for the previous hypothesis to find the p-value: the R software, Pearson’s Chi-squared test and the results in Table 4.

p-value = 1.7e-10

To be significant, the p-value must be below 0.05. Ours is far smaller, so it is significant. We can therefore proceed with Pearson’s residuals to determine the correlations and anti-correlations for our hypothesis.

Table 8 – Pearson’s residuals of the à and sur in formal register

                FORMAL A    FORMAL SUR
GRAMMATICAL     2.891575    -2.891575
UNCERTAIN       -2.232209   2.232209
UNGRAMMATICAL   -3.025105   3.025105

Even if all these numbers are outside the Pearson’s range (-1/1), they are away from 0 and lie roughly between -3 and 3, which means that a relation between the sets exists, but not as strong as we could have expected. Nevertheless, we can still analyze the residuals.

For formal à, there is a correlation with the grammatical level. This correlation of 2.891575 shows that people tend to find à grammatical in a formal register. The two other numbers in the column are both anti-correlations: they indicate a negative relation between the level of grammaticality and the use of à in a formal utterance. Uncertain and formal à have an anti-correlation of -2.232209, and ungrammatical and formal à have an anti-correlation of -3.025105. A negative relation therefore exists between ungrammatical/formal à and uncertain/formal à (Table 8).

For the second column, formal sur, the contrary occurs. There is an anti-correlation of -2.891575 for grammatical, while uncertain has a correlation of 2.232209 and ungrammatical a correlation of 3.025105, which means that uncertain and ungrammatical have a positive relation with formal sur.

These (anti-)correlations reinforce the previous results: wherever one column has an anti-correlation, the other has a correlation, and vice versa. This indicates that native speakers find it grammatical to use à in a formal register, and uncertain or ungrammatical to use sur in the same register.
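The mirror pattern between the two columns of Table 8 follows from the layout of Table 4: both prepositions received the same number of judgments (476 each), so each row has the same expected count in both columns, and the two residuals in a row are equal in size and opposite in sign. The Python below is a hedged sketch of this check, with hypothetical names; it is not the R command we used.

```python
import math

ROWS = ["GRAMMATICAL", "UNCERTAIN", "UNGRAMMATICAL"]

# Counts copied from Table 4 (columns: formal à, formal sur).
table_4 = [
    [349, 249],
    [41, 75],
    [86, 152],
]


def pearson_residuals(table):
    """Return (observed - expected) / sqrt(expected) for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [
        [
            (observed - row_totals[i] * col_totals[j] / grand_total)
            / math.sqrt(row_totals[i] * col_totals[j] / grand_total)
            for j, observed in enumerate(row)
        ]
        for i, row in enumerate(table)
    ]


col_totals = [sum(col) for col in zip(*table_4)]
residuals = pearson_residuals(table_4)

print("column totals:", col_totals)
for label, (res_a, res_sur) in zip(ROWS, residuals):
    # With equal column totals, res_a + res_sur is zero in every row.
    print(f"{label}: à = {res_a:.6f}, sur = {res_sur:.6f}, sum = {res_a + res_sur:.1e}")
```

In other words, with only two columns of equal size, the second column adds no independent information: it is the first column with the signs reversed.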

3.4 Results of the demography and the use of sur to introduce big cities relation hypothesis’ test

In every demography category, most people seem to think that using sur to introduce a big city is grammatical. For urban demography, we have 311 entries for grammatical, against 31 for uncertain and 65 for ungrammatical. For rural people, we have 113 instances of grammatical, 13 of uncertain and 27 of ungrammatical. Finally, for mixed demography, we have 111 instances of grammatical, 13 of uncertain and 29 of ungrammatical. In all categories, most people find sur grammatical, and few consider it uncertain or ungrammatical. Since almost everybody agrees, it does not seem necessary to run the full Pearson’s tests, but I will do so in order to confirm our supposition.


Figure 3 – Grammaticality of sur to introduce big cities compared to the people demography
[Figure 3: stacked bars from 0% to 100% for MIXED, RURAL and URBAN; the counts are those of Table 5]

To test our hypothesis, we use exactly the same tools as above (the R software, Pearson’s Chi-squared test and the results in Table 5) in order to find the p-value and see whether we can continue with Pearson’s residuals.

p-value = 0.9002

To be significant, the p-value must be below 0.05. The one we found is higher than 0.05, which means that our results are not significant.
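One way to see why this test is far from significant is to compare the share of grammatical judgments in each demography category. The Python below is an illustrative sketch with hypothetical names, using the counts of Table 5; it is not part of the R analysis.

```python
# Counts copied from Table 5.
table_5 = {
    "MIXED": {"GRAMMATICAL": 111, "UNCERTAIN": 13, "UNGRAMMATICAL": 29},
    "RURAL": {"GRAMMATICAL": 113, "UNCERTAIN": 13, "UNGRAMMATICAL": 27},
    "URBAN": {"GRAMMATICAL": 311, "UNCERTAIN": 31, "UNGRAMMATICAL": 65},
}


def grammatical_share(counts):
    """Proportion of respondents in a group who judged sur grammatical."""
    return counts["GRAMMATICAL"] / sum(counts.values())


shares = {group: grammatical_share(counts) for group, counts in table_5.items()}
spread = max(shares.values()) - min(shares.values())

for group, share in shares.items():
    print(f"{group}: {share:.1%} grammatical")
print(f"largest gap between groups: {spread:.1%}")
```

The three shares lie between roughly 73% and 76%, so the groups answer almost identically, which is consistent with a p-value close to 1.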

4. Discussion

In this report, we have analyzed three hypotheses that could explain the origins of the variation between the two French prepositions à and sur. Variation is a characteristic common to all languages: there are several ways to say the same thing. In our case, it is the slow shift from à to sur. Our three hypotheses concerned temporality in relation to the age of the speakers, the use of à or sur in a formal register, and the relation between demography and the use of sur to introduce big cities (such as Paris, Amsterdam, etc.).

4.1 Temporality related to the age

For this hypothesis, we assumed that young people are more likely to use sur in the present tense than the medium or old age categories. The analysis has shown that young people do tend to think it is grammatical to use sur. Some of the numbers in Table 7 are outside the Pearson’s range, but there is a correlation for grammatical and anti-correlations for uncertain and ungrammatical. This confirms that the young generation uses sur to introduce a location in the present tense, or at least finds it grammatical.

For the medium age category, sur is ungrammatical, but this could shift to uncertain in a few years or with more entries for this age category. Indeed, the correlation for uncertain and the anti-correlation for grammatical are both close to 1 and -1, which indicates that the result could vary with more entries in this category or in a few years.

For the old category, we assumed that this generation was totally against sur. The correlation for ungrammatical in Table 7 seems to support this, but in the end older speakers appear to be more uncertain about it. There is a strong positive relation between uncertain and the old age category (+0.86), which means that, for them, the use of sur instead of à in the present tense is uncertain rather than ungrammatical. Since our study does not include as many people in the old generation as in the young generation, we cannot give a definite answer.

We therefore expected the old category to be against using sur in the present tense, but this is not what we found. It is actually the medium age category that finds it ungrammatical.

Yet we can still say that age has a big influence on the use of sur in the present tense and therefore on this variation in French. We can assume that in a few years the variation will be more visible, since the make-up of the age categories will have changed. For the whole hypothesis, the correlations are supportive, but most of them are outside the Pearson’s range. Two things may explain this. The first is the lack of homogeneity in the number of entries across the age categories. The second is the presence of people who guessed what our study was about. It would be interesting to test this hypothesis again, focusing only on the people who did not guess the topic of our project.

4.2 Formality

As with our previous hypothesis, all the numbers in Table 8 are outside the Pearson’s range (1/-1), but they can still be analyzed since they are far from 0.

For this hypothesis, we expected formal à to be more grammatical than formal sur, and this is what our analysis has shown. Grammatical and formal à have a correlation, meaning that one goes with the other, while grammatical and formal sur have an anti-correlation, a negative relation implying that if one grows the other decreases. Therefore, for native speakers of French who did not guess the topic of our report, formal à is grammatical and formal sur is not.

4.3 Demography compared to the introduction of big cities

This analysis and Pearson’s Chi-squared test have determined that the hypothesis of an influence of demography on the use of sur to introduce big cities was incorrect: the p-value was not significant. This suggests that the use of sur to talk about big cities is not influenced by where a person comes from or lives.

We often hear that urban and rural people express themselves differently, but this hypothesis shows that this is not the case when it comes to differentiating à and sur.

4.4 Conclusion

This report helped us with our problem, the difference between à and sur in French, but it has not given us a full answer. To understand the differences between à and sur, one must be able to understand all the variations that occur as the years pass. Still, these analyses of different hypotheses allow us to keep or eliminate some social or semantic features.

Furthermore, more data might have helped this report, since a larger number of examples would have allowed the analyses to go further. Two of our hypotheses have been confirmed, but this is not enough to know how to distinguish the use of à and sur. Many hypotheses about this variation in French grammar remain untested.
