The grammatical variation in French: the difference between the use of “à” and “sur”
Name Name, student number
EL25LINC
Sociolinguistics L3
LLCE English
2016/2017
Supervisor: Dylan Glynn
Table of contents
1. Introduction
1.1 Temporality related to the age
1.2 Formality
1.3 Demography related to the introduction of a big city
2. Data and method
2.1 Collecting the data
2.2 Method of analysis
3. Results and interpretation
3.1 Results of the analysis of the different hypotheses
3.2 Results of Temporality related to the age hypothesis’ test
3.3 Results of the use of sur and à in formal language hypothesis’ test
3.4 Results of the demography and the use of sur to introduce big cities relation hypothesis’ test
4. Discussion
4.1 Temporality related to the age
4.2 Formality
4.3 Demography compared to the introduction of big cities
4.4 Conclusion
1. Introduction
The difference between the two French prepositions à and sur is not clearly
defined in everyone’s mind. For native speakers, the choice between à and sur
is instinctive: they rely on their sense of grammaticality. In linguistics,
grammaticality is what is right or wrong in a sentence according to the native
speaker’s judgment. Non-native speakers of French find it harder to choose
between the two as instinctively as native speakers do.
For a long time, native speakers of French used the preposition à to indicate
a location (“j’habite à Paris”; I live in Paris). Indeed, a French grammar
book lists five prepositions that introduce a location in a sentence: à, vers,
en, dans and chez. These prepositions can be translated as to, in, at, etc.
Over the years, native speakers have slowly started to use the preposition sur
instead of à. Sur is usually translated as “on” or “on top of”, but in French
it is now common to use it to introduce a location.
Therefore, native speakers do not all agree on whether à or sur should
introduce a location: “à Paris” or “sur Paris”, for instance. A debate over
which one is more grammatical continues among some native speakers of French.
However, despite the increasing use of sur, we do not know why people choose
either à or sur to introduce a location. As native speakers, we decided to
focus on this subject and tried to understand what the differences between à
and sur are.
To answer this question, we identified several features that could play a role
in the distinction between these two prepositions. These features allowed us
to formulate semantic and social hypotheses. We tested the features with a
questionnaire and analyzed those which might explain the differences between
the two prepositions. For this report, I chose three hypotheses to analyze.
1.1 Temporality related to the age
French has more tenses than English. Therefore, we have split the temporality
feature into two values: present and general, where general covers all the
other tenses. Age may play a role in the use of sur in the present tense. For
this report, we look at which age category finds it most grammatical to use
sur to introduce a location in the present tense. Here, we focus only on
French native speakers.
1.2 Formality
In French, formality is not only a matter of choosing between tu and vous but
also a matter of register. Register refers to the formality of an expression,
a grammatical structure, a word, a gesture, etc. Formal French is both written
and spoken. It is a polite language used to show respect, to establish a
distance, or to address someone the speaker does not know. Here, we examine
whether French speakers find à or sur more grammatical in formal language. For
this hypothesis, we only analyze French native speakers who did not guess the
topic of our study.
1.3 Demography related to the introduction of a big city
This last hypothesis is based on demography. Since not everyone comes from the
same kind of place, it is worth asking whether demography plays a role in the
use of sur to introduce big cities. Here, we only focus on French native
speakers.
2. Data and Method
2.1 Collecting the data
During our linguistics classes, we identified several social and semantic
features that could explain why this grammatical variation occurs and in which
situations people are most likely to use “à” or “sur”. We turned these
features into questions in order to create a questionnaire covering both our
semantic and social features. This questionnaire is composed of two sections.
The first one gathers information on the respondent, which gives us the social
features needed to test our social hypotheses. The second one focuses on the
respondent’s judgement. It is based on 16 sentences: 8 instances of “à” and 8
instances of “sur”. Respondents rate on a scale of 1 to 9 how grammatical each
sentence seems to them, with 1 being completely unnatural and 9 completely
natural. To collect the data as accurately as possible, we used a nine-level
Likert scale. A five-level Likert scale might have been subject to distortions
for this kind of questionnaire: people tend to avoid the extreme answers on
such a scale, whereas they are more likely to choose 8 on a nine-level scale.
Furthermore, 8 on the nine-level scale becomes 5 on the five-level scale once
the first scale has been converted to the second (2.2).
The first section of the questionnaire covers the social features. It has
eight categories: AGE (15-25, 26-35, 36-45, 46-55, 56 and more), EDUCATION
LEVEL (middle school, high school, college), DEMOGRAPHY OF ORIGIN (urban,
suburb, rural), CURRENT DEMOGRAPHY (urban, suburb, rural), COUNTRY OF ORIGIN,
MOTHER TONGUE, L2 and L3. Four of these categories have predefined answers
(age, education level, demography of origin and current demography) and the
other four are free answers. The second section covers the semantic features.
These features are divided into four categories: REGISTER (formal and
informal), SIZE (big and small), FAMILIARITY (familiar and non-familiar), and
TEMPORALITY (present tense and other tenses). For this section, the sentences
used are the following:
(1) J’ai été retenu pour un entretien d’embauche à Paris.
Le diner d’affaires sur Paris auquel nous sommes invités est
primordial.
(2) On était sur Amsterdam pour se marrer.
Mes potes et moi on était à Lille pour se faire un ciné.
(3) Sur Paris, les gens sont plus désagréables qu’en province.
Les Jeux Olympiques d’hiver se dérouleront à Seoul en 2018.
(4) Ma grand-mère habitait sur Dame-Marie il y a 20 ans.
J’ai acheté du pain à Everly ce matin.
(5) Ma tante passe ses vacances sur Marseille cet été.
Ce soir, je dors à Bruxelles chez mon copain.
(6) Hier, sur Manchester, je me suis fait un foot.
À Flawinne, les restaurants sont réputés.
(7) Ecoute, moi je suis sur Paris cet après-midi.
Mes amis passent le week-end à Strasbourg.
(8) Depuis son enfance, il a vécu sur Bordeaux avec sa mère.
Et tu sais ? À Lyon la semaine dernière, j’ai croisé Hervé.
In the above examples, Register is defined by the level of formality: formal
(1) and informal (2). The Size of the cities is either big (3) or small (4),
according to the size of the agglomeration. Familiarity concerns how well the
speakers know the city mentioned: familiar (5) or non-familiar (6). The last
feature, Temporality, indicates the tense of the sentence: present tense (7)
and the other tenses (8).
To collect the data, we first distributed the questionnaire to the people
around us. But this was not enough: most respondents were of our age or our
parents’ age, and not every category had enough answers. Therefore, in order
to obtain more diverse results, we created an online questionnaire with Google
Forms and shared it on the Internet. In this way, we collected 870 responses,
including 713 from native speakers of French.
2.2 Method of analysis
To test and analyze my hypotheses, I extracted the relevant data from the
Excel file in which all our data were combined. In this file, I created one
sheet per hypothesis and built a pivot table in each to obtain the counts. In
the same sheet, I made a smaller table and copied it into a plain-text
(Notepad) file for later use.
Then, I used the R software and the R command files provided by our professor,
which contain the commands required for this analysis. Using the xtab command,
I imported one of the text files into R. To obtain the p-value, I used the CHI
command, which runs Pearson’s chi-squared test on the table and thus tests the
hypothesis. If the p-value is below 0.05, the result is significant; if it is
above 0.05, it is not. If the result is significant, we can proceed to analyze
the correlations and anti-correlations with the RES command, which returns
Pearson’s residuals: a positive residual (a correlation) means that a cell
contains more answers than expected if the two variables were independent, and
a negative residual (an anti-correlation) means that it contains fewer. I then
did the same for the other text files.
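The quantities behind these commands can be written out as follows. This is a sketch that assumes the CHI and RES commands wrap the standard Pearson chi-squared test; the symbols (O, E, R, C, N, n_r, n_c) are introduced here for illustration and do not come from the command files.

```latex
% O_{ij}: observed count in row i, column j
% R_i, C_j: row and column totals; N: grand total
% n_r, n_c: number of rows and columns of the table
E_{ij} = \frac{R_i \, C_j}{N}
\qquad
\chi^2 = \sum_{i=1}^{n_r} \sum_{j=1}^{n_c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
\qquad
df = (n_r - 1)(n_c - 1)
\qquad
\mathrm{res}_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}}}
```

The chi-squared statistic is the sum of the squared residuals, which is why the cells with the largest residuals are the ones that contribute most to a significant p-value.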
Regarding the Excel files, I decided to work only with the French native
speakers. Indeed, we focus here on grammaticality, which is based on the
native speaker’s judgement, so using only native speakers is more appropriate.
This leaves us with 713 entries to analyze instead of 870. Furthermore, we
introduced a granularity system. In the questionnaire the scale was a
nine-level Likert scale; in the Excel files it is converted into both a
five-level and a three-level Likert scale. Therefore, some features can be
analyzed through either medium or coarse granularity. Medium granularity is
the 1 to 5 scale: 1 (very unnatural), 2 (unnatural), 3 (uncertain), 4
(natural), 5 (very natural). Coarse granularity is the 1 to 3 scale: 1
(ungrammatical), 2 (uncertain), 3 (grammatical) (Table 1).
Moreover, Demography of origin and Current demography were merged into
Combined demography, also available in medium and coarse granularity. Some
features, such as L2 and L3, were deleted from the file.
Table 1. Method of conversion of Likert scales
Large granularity    1 2 3 4 5 6 7 8 9
Medium granularity   1 2 3 4 5
Coarse granularity   1 2 3
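The conversion between the three scales can be sketched as two lookup tables. This is only an illustration in Python, not the Excel procedure we used. The report states that 8 on the nine-level scale becomes 5, and that the coarse levels merge the two natural and the two unnatural levels; the remaining nine-level bins below are an assumption, and the helper names are hypothetical.

```python
# ASSUMPTION: only the step 8 -> 5 is stated in the report; the other
# nine-level bins are a plausible symmetric choice, not the confirmed one.
NINE_TO_FIVE = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 4, 7: 4, 8: 5, 9: 5}

# Stated in the report: unnatural levels merge into ungrammatical,
# natural levels merge into grammatical, uncertain stays uncertain.
FIVE_TO_THREE = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3}

MEDIUM_LABELS = {1: "very unnatural", 2: "unnatural", 3: "uncertain",
                 4: "natural", 5: "very natural"}
COARSE_LABELS = {1: "ungrammatical", 2: "uncertain", 3: "grammatical"}


def to_medium(rating: int) -> int:
    """Convert a nine-level rating (1-9) to medium granularity (1-5)."""
    return NINE_TO_FIVE[rating]


def to_coarse(rating: int) -> int:
    """Convert a nine-level rating (1-9) to coarse granularity (1-3)."""
    return FIVE_TO_THREE[to_medium(rating)]


# Print the full conversion, one nine-level rating per line.
for rating in range(1, 10):
    medium, coarse = to_medium(rating), to_coarse(rating)
    print(rating, MEDIUM_LABELS[medium], COARSE_LABELS[coarse])
```

Keeping the two steps separate makes it easy to replace the assumed nine-level bins without touching the medium-to-coarse merge.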
3. Results and interpretation
The results in the tables are analyzed in this section. We will see whether
our hypotheses are correct and whether they allow us to determine which of the
two prepositions is more grammatical, and in which situation.
3.1 Results of the analysis of the different hypotheses
Our tables show that respondents of all ages tend to accept the use of sur to
introduce a location in the present tense. The count is higher for very
natural than for very unnatural in all three age categories: 219 very natural
against 71 very unnatural for young people, 65 against 20 for medium-age
people, and 24 against 21 for older people. While the Young and Medium
categories show a clear gap between the two extremes, the gap for older people
is only 3. In all categories, the counts for the intermediate levels,
especially uncertain and unnatural, are also close to one another (Table 1).
This is why I decided to study the coarse granularity of the temporality
feature (Table 2).
Table 1 – Medium grammaticality of “sur” in present tense compared to the age
PRESENT SUR - MEDIUM   YOUNG   MEDIUM   OLD
VERY NATURAL             219       65    24
NATURAL                  102       32    22
UNCERTAIN                 35       19    11
UNNATURAL                 35       21    15
VERY UNNATURAL            71       20    21
In Table 2, all the age categories show a bigger gap between the two extremes,
grammatical and ungrammatical. Furthermore, very few people are uncertain,
which gives us a clearer view of the situation. Yet the difference between
grammatical and ungrammatical in the old age category is only 10 (still bigger
than in Table 1). Therefore, for this hypothesis I decided to consider only
the coarse granularity.
Table 2 – Coarse grammaticality of “sur” in present tense compared to the age
PRESENT SUR - COARSE   YOUNG   MEDIUM   OLD
GRAMMATICAL              321       97    46
UNCERTAIN                 35       19    11
UNGRAMMATICAL            106       41    36
Table 3 represents the use of à and sur to introduce a location in polite
language. For the preposition à, the counts broadly decrease: 247 native
French speakers think it is very natural to use à against only 37 who think it
is very unnatural, and the numbers fall as we move towards very unnatural. The
same does not occur with sur, whose counts are more evenly spread across the
categories: 111 for very natural, 138 for natural, 75 for uncertain, 69 for
unnatural and 83 for very unnatural. Since this column shows no distinctive
variation, I decided to analyze the coarse granularity as well.
Table 3 – Medium grammaticality of “à” and “sur” in formal language
                   A   SUR
VERY NATURAL     247   111
NATURAL          102   138
UNCERTAIN         41    75
UNNATURAL         49    69
VERY UNNATURAL    37    83
The coarse granularity shows a slightly bigger gap between the two extremes
than the medium granularity. For sur, since grammatical is the sum of very
natural and natural, and ungrammatical is the sum of unnatural and very
unnatural, the results are more distinct at the two extremes: 249 grammatical
and 152 ungrammatical. Therefore, the coarse granularity is the better one to
examine (Table 4).
Table 4 – Coarse grammaticality of “à” and “sur” in formal language
                  A   SUR
GRAMMATICAL     349   249
UNCERTAIN        41    75
UNGRAMMATICAL    86   152
The last table represents the influence of demography on the use of sur to
introduce a big city. In all categories, very few people find it uncertain or
ungrammatical. Most people find it grammatical to use sur to introduce big
cities: 111 for mixed, 113 for rural and 311 for urban (Table 5).
Table 5 – Coarse grammaticality of the influence of the demography on the use
of sur to locate big cities
                MIXED   RURAL   URBAN
GRAMMATICAL       111     113     311
UNCERTAIN          13      13      31
UNGRAMMATICAL      29      27      65
3.2 Results of Temporality related to the age hypothesis’ test
Figure 1, shown below, indicates that more people find sur in the present
tense grammatical than ungrammatical, and this applies to every age category:
young, medium and old. There are 321 grammatical answers for young, 97 for
medium and 46 for old, against 106 ungrammatical answers for young, 41 for
medium and 36 for old, and 35 uncertain answers for young, 19 for medium and
11 for old. We can therefore assume that every age category tends to find the
use of sur in the present tense grammatical, but we cannot base our conclusion
on a single figure.
Figure 1 – Grammaticality of sur in present tense compared to the age
[Figure 1: stacked bar chart (0% to 100%) of the GRAMMATICAL, UNCERTAIN and
UNGRAMMATICAL answers for the YOUNG, MEDIUM and OLD categories; the counts are
those of Table 2.]
This hypothesis was tested in R with Pearson’s chi-squared test. Applied to
the results of Table 2, the test gave us the following p-value:
p-value = 0.002891
To be significant, the p-value must be below 0.05. Ours is, since it is below
0.003. Therefore, the result for this hypothesis is significant.
Since the result is significant, we can continue with Pearson’s residuals,
which help us determine which correlations or anti-correlations are
responsible for the significant difference between the sets.
Table 7 – Pearson’s residuals
                YOUNG       MEDIUM       OLD
GRAMMATICAL      1.148097   -0.5254148   -1.8762594
UNCERTAIN       -1.105104    1.2327751    0.8613634
UNGRAMMATICAL   -1.169532    0.1019261    2.4742719
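The p-value and the residuals of Table 7 can be recomputed from the counts of Table 2. The following is a minimal Python sketch, not the R commands used for this report; the function names are hypothetical, and the p-value helper relies on a closed form that only holds when the degrees of freedom are even (here 4).

```python
import math

# Counts from Table 2 (columns: YOUNG, MEDIUM, OLD).
observed = {
    "GRAMMATICAL":   [321, 97, 46],
    "UNCERTAIN":     [35, 19, 11],
    "UNGRAMMATICAL": [106, 41, 36],
}


def expected_counts(rows):
    """Expected counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def pearson_residuals(rows):
    """Pearson residual of each cell: (observed - expected) / sqrt(expected)."""
    expected = expected_counts(rows)
    return [[(o - e) / math.sqrt(e) for o, e in zip(row, exp_row)]
            for row, exp_row in zip(rows, expected)]


def chi2_p_value(statistic, df):
    """Upper-tail chi-squared probability (closed form, even df only)."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch only handles even, positive df")
    half = statistic / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))


rows = list(observed.values())
residuals = pearson_residuals(rows)
# The chi-squared statistic is the sum of the squared Pearson residuals.
statistic = sum(r ** 2 for row in residuals for r in row)
df = (len(rows) - 1) * (len(rows[0]) - 1)
p_value = chi2_p_value(statistic, df)

print(f"chi-squared = {statistic:.3f}, df = {df}, p-value = {p_value:.6f}")
for level, row in zip(observed, residuals):
    print(f"{level:<14}" + "".join(f"{r:>10.4f}" for r in row))
```

Computing the statistic from the residuals shows directly how much each cell of the table weighs in the test.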
Pearson’s residuals are not limited to the range from -1 to 1, unlike a
correlation coefficient. A residual close to 0 means that the cell contains
about as many answers as expected if age and grammaticality were unrelated.
The further a residual is from 0, the more the cell contributes to the
significant result; as a common rule of thumb, a residual beyond about 2 or -2
marks a notable correlation or anti-correlation.
Starting with the young age category, a correlation of +1.15 means that there
is a positive relation between grammaticality and the young generation (Table
7). The contrary also occurs: there are two anti-correlations (-1.11 and
-1.17), between the young category and uncertain and ungrammatical
respectively. None of these residuals reaches 2 or -2, but all three are
clearly away from 0 and point in the same direction. Therefore, there is a
moderate relation between the young category and the grammaticality of sur.
In the medium age category, the residual for ungrammatical is +0.10, which is
very close to 0: the medium category finds sur ungrammatical about as often as
expected. The anti-correlation with grammatical (-0.53) is also weak. The
largest residual in this column is the correlation with uncertain (+1.23),
which stays below 2. The medium category therefore deviates little from the
overall pattern, with at most a slight tendency towards uncertain.
Regarding the old generation, the correlation with uncertain (+0.86) is weak.
The two other residuals are the largest of the table: an anti-correlation of
-1.8762594 between grammatical and old and a correlation of 2.4742719 between
ungrammatical and old, the only residual beyond 2. This means that the old
category judges sur in the present tense ungrammatical more often, and
grammatical less often, than expected. The old category is thus the main
source of the significant difference between the sets.
3.3 Results of the use of sur and à in formal language hypothesis’ test
For the formal register, Figure 2 suggests that à is more accepted than sur.
With 349 grammatical answers for à against 249 for sur and, conversely, 86
ungrammatical answers for à against 152 for sur, à appears to be more
grammatical than sur in polite language.
Figure 2 – Grammaticality of the prepositions à and sur in formal register
[Figure 2: stacked bar chart (0% to 100%) of the GRAMMATICAL, UNCERTAIN and
UNGRAMMATICAL answers for FORM A and FORM SUR; the counts are those of Table
4.]
To test the hypothesis on the use of à and sur in formal register, we use the
same tools as for the previous hypothesis: the R software, Pearson’s
chi-squared test and the results found in Table 4. The test gave us the
following p-value:
p-value = 1.7e-10
To be significant, the p-value must be below 0.05. Ours is far below 0.05,
thus it is significant. Therefore, we can proceed with Pearson’s residuals to
determine the correlations and anti-correlations for this hypothesis.
Table 8 – Pearson’s residuals of the à and sur in formal register
                FORMAL A    FORMAL SUR
GRAMMATICAL      2.891575   -2.891575
UNCERTAIN       -2.232209    2.232209
UNGRAMMATICAL   -3.025105    3.025105
All these residuals are beyond 2 or -2 and thus far from 0, which means that
every cell of the table contributes to the significant relation between the
sets. We can therefore analyze each of them.
Regarding the formal à category, there is a correlation with grammatical: the
residual of 2.891575 shows that people tend to find à grammatical in a formal
register. The two other residuals of the column are both anti-correlations,
indicating a negative relation between these levels of grammaticality and the
use of à in a formal utterance: -2.232209 for uncertain and -3.025105 for
ungrammatical (Table 8).
For the second column, formal sur, the contrary occurs. There is an
anti-correlation of -2.891575 for grammatical, while uncertain has a
correlation of 2.232209 and ungrammatical a correlation of 3.025105, which
means that uncertain and ungrammatical both have a positive relation with
formal sur.
The two columns mirror each other: wherever one shows a correlation, the other
shows an anti-correlation of the same size. This follows from the two columns
having the same total number of answers (476 each in Table 4), so the same
information is read from both sides. It points out that native speakers find
it grammatical to use à in a formal register, and uncertain or ungrammatical
to use sur in this same register.
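The mirrored residuals of Table 8 can be checked from the counts of Table 4. This Python sketch is an illustration, not the R commands used for this report, and its variable names are hypothetical. Because both columns have the same total, each expected count is half of its row total, so the two deviations in a row are equal in size and opposite in sign.

```python
import math

# Counts from Table 4 (columns: A, SUR).
table4 = {
    "GRAMMATICAL":   (349, 249),
    "UNCERTAIN":     (41, 75),
    "UNGRAMMATICAL": (86, 152),
}

col_totals = [sum(col) for col in zip(*table4.values())]
n = sum(col_totals)

# Pearson residual of each cell: (observed - expected) / sqrt(expected),
# with expected = row total * column total / N.
residuals = {}
for level, counts in table4.items():
    row_total = sum(counts)
    residuals[level] = tuple(
        (obs - row_total * ct / n) / math.sqrt(row_total * ct / n)
        for obs, ct in zip(counts, col_totals)
    )

print("column totals:", col_totals)
for level, (res_a, res_sur) in residuals.items():
    print(f"{level:<14} A: {res_a:+.6f}   SUR: {res_sur:+.6f}")
```

With unequal column totals the two residuals of a row would still have opposite signs, but they would no longer have the same size.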
3.4 Results of the demography and the use of sur to introduce big cities relation hypothesis’ test
Whatever their demography, respondents seem to find it grammatical to use sur
to introduce a big city. For urban demography, we have 311 grammatical answers
against 31 uncertain and 65 ungrammatical. For rural people, we have 113
grammatical, 13 uncertain and 27 ungrammatical. Finally, for mixed demography,
we have 111 grammatical, 13 uncertain and 29 ungrammatical. In all categories,
most people find sur grammatical, and few of them consider it uncertain or
ungrammatical. Since almost everybody agrees, it does not seem necessary to
run the full set of Pearson’s tests, but I will do so in order to confirm our
supposition.
Figure 3 – Grammaticality of sur to introduce big cities compared to the
people demography
[Figure 3: stacked bar chart (0% to 100%) of the GRAMMATICAL, UNCERTAIN and
UNGRAMMATICAL answers for the MIXED, RURAL and URBAN categories; the counts
are those of Table 5.]
To test our hypothesis, we use the same tools as above: the R software,
Pearson’s chi-squared test and the results found in Table 5, in order to find
the p-value and see whether we can continue with Pearson’s residuals.
p-value = 0.9002
This is the p-value found after the chi-squared test. To be significant, our
p-value must be below 0.05. The one we found is higher than 0.05, which means
that our results are not significant, so there is no reason to examine the
residuals.
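The non-significant result is easier to see when the counts of Table 5 are turned into proportions. The Python sketch below is an illustration, not the R commands used for this report; its variable names are hypothetical, and the p-value line uses a closed form that holds only for 4 degrees of freedom, which is the case for a 3 by 3 table.

```python
import math

# Counts from Table 5 (grammatical, uncertain, ungrammatical per group).
table5 = {
    "MIXED": (111, 13, 29),
    "RURAL": (113, 13, 27),
    "URBAN": (311, 31, 65),
}

# Share of "grammatical" answers within each demographic group.
shares = {group: counts[0] / sum(counts) for group, counts in table5.items()}

# Pearson chi-squared statistic for the whole table.
level_totals = [sum(level) for level in zip(*table5.values())]
n = sum(level_totals)
statistic = 0.0
for counts in table5.values():
    group_total = sum(counts)
    for obs, level_total in zip(counts, level_totals):
        expected = group_total * level_total / n
        statistic += (obs - expected) ** 2 / expected

# df = (3 - 1) * (3 - 1) = 4; for df = 4 the upper-tail probability
# of the chi-squared distribution is exp(-x/2) * (1 + x/2).
p_value = math.exp(-statistic / 2) * (1 + statistic / 2)

for group, share in shares.items():
    print(f"{group}: {share:.1%} grammatical")
print(f"chi-squared = {statistic:.3f}, p-value = {p_value:.4f}")
```

The three groups accept sur at very similar rates, which is what a p-value this close to 1 reflects.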
4. Discussion
In this report, we have analyzed three hypotheses which could explain the
origins of the variation between the two French prepositions à and sur.
Variation is a characteristic common to all languages: there are several ways
to say the same thing. In our case, the variation is the slow shift from à to
sur. Our three hypotheses concerned temporality in relation to the age of the
speakers, the use of à or sur in a formal register, and the relation between
demography and the use of sur to introduce big cities (such as Paris,
Amsterdam, etc.).
4.1 Temporality related to the age
For this hypothesis, we assumed that young people are more likely than the
medium or old categories to use sur in the present tense. The analysis shows
that young people indeed tend to find it grammatical to use sur: in Table 7,
the young category has a correlation with grammatical and anti-correlations
with uncertain and ungrammatical, even though none of these residuals reaches
2 or -2. This supports the idea that the young generation uses sur to
introduce a location in the present tense, or at least finds it grammatical.
The medium age category stays close to the overall pattern. Its residual for
ungrammatical is almost 0 (+0.10) and its largest residual is the correlation
with uncertain (+1.23), so this category may shift in a few years or with more
entries.
Regarding the old category, we assumed that this generation was totally
against sur. Table 7 partly supports this: the old category has the strongest
correlation with ungrammatical (+2.47) and an anti-correlation with
grammatical (-1.88), while its relation with uncertain is weak (+0.86).
However, even in this category more people find sur grammatical (46) than
ungrammatical (36), so the old generation is less favourable to sur rather
than totally against it. Since our study does not include as many people in
the old generation as in the young generation, we cannot give a definite
answer.
We can still say that age has an influence on the judgement of sur in the
present tense and therefore on this French variation. We can also assume that
in a few years this variation will be more visible, since the people in each
age category will have changed. Two things limit these results. The first is
the lack of balance in the number of entries across the age categories. The
second is that some respondents guessed what our study was about. It would be
interesting to test this hypothesis again using only the people who did not
guess the topic of our project.
4.2 Formality
As in Table 7, Pearson’s residuals in Table 8 are not limited to the range
from -1 to 1; here they are all beyond 2 or -2, that is, far from 0, and can
therefore be analyzed.
For this hypothesis, we expected formal à to be more grammatical than formal
sur, and this is what our analysis has shown. Grammatical and formal à have a
correlation, meaning that they go together, while grammatical and formal sur
have an anti-correlation, meaning that sur is judged grammatical less often
than expected. Therefore, for French native speakers who did not guess the
topic of our report, formal à is grammatical and formal sur is less so.
4.3 Demography compared to the introduction of big cities
This analysis and Pearson’s chi-squared test have shown that the hypothesis of
an influence of demography on the use of sur to introduce big cities is not
supported: the p-value was not significant. Our data therefore give no
evidence that the use of sur for big cities depends on where a person comes
from or lives.
We often hear that urban and rural people express themselves differently, but
this test suggests that this is not the case when it comes to choosing between
à and sur.
4.4 Conclusion
This report has helped us with our question, the difference between à and sur
in French, but it has not given a full answer. Indeed, to understand the
differences between à and sur, one must be able to understand all the
variations that occur as the years go by. Still, these analyses of different
hypotheses allow us to keep or eliminate some social or semantic features.
Furthermore, more data would probably have helped, since a larger number of
examples would allow the analyses to go further. Two of our hypotheses have
been confirmed, but this is not enough to know how to distinguish the use of à
and sur. Many hypotheses about this variation in French grammar remain
untested.