The Effects of Economic Status on International Soccer Success

BY

BAILEY MORTON

THESIS

This thesis is submitted to the faculty of the Economics Department

of the University of Florida in partial fulfillment of the requirements

for the Bachelor of Arts degree

Gainesville, Florida

Approved by:

______________________

Dr. Michelle Phillips

Thesis Advisor

I. Introduction

European and South American countries have historically dominated international soccer, and this remains the case in the 21st century. Only European and South American countries hold World Cup titles; Europe holds the most of any continent, 12, and has come out on top in each of the last four tournaments (World of Soccer). Per the ELO ratings for world soccer, most countries in the top 50 are South American or European. The European countries among them are typically more developed and thus wealthier. Richer, more developed countries can usually provide better resources and facilities for their citizens. If so, it seems reasonable to assume that the greater a country’s economic strength, the better it performs on the international stage, since richer countries have access to more facilities, coaches, and technologies to enhance player performance. Yet in the 2018 ELO ratings the U.S., one of the richest countries in the world, sits only at 25th, with developing countries such as Senegal only 6 places behind it. Furthermore, many of the South American powerhouses are classified as developing countries and are relatively poorer, but have consistently ranked far above the U.S.

Thus, this paper explores the effects of real GDP per capita and other factors, namely income inequality, per capita education spending, urbanization, popularity, and corruption, on a country’s success on the international soccer stage.

II. Sample

The sample for this study consists of the countries, out of the 238 present in the ELO ratings system, that have played at least 100 matches, observed over the years 2002-2016, resulting in 2658 observations. More specifically, we use yearly observations from that time frame, so the number of teams that had played 100 matches was larger in some years than in others. The sample includes all countries that have participated in both friendly and competitive matches played under the major soccer confederations. Countries were selected on this basis because “ratings tend to converge to a team’s true strength…after about 30 matches” (ELO) and because having played 100 matches usually indicates that a team has existed long enough to play in a greater number of competitive matches. These matches can include World Cup qualifiers or regional championships like the Copa America, the African Cup of Nations, or the Gold Cup in North America.
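The selection rule described above can be sketched in a few lines of Python. This is only an illustrative sketch: the record layout, the toy rows, and the reading of the 100-match threshold as a cumulative count reached by a given year are assumptions, not the procedure actually used to assemble the sample.

```python
from collections import defaultdict

# Hypothetical record layout: (country, year, cumulative_matches, elo_rating).
# Field names and toy rows are assumptions made for illustration only.
MIN_MATCHES = 100
FIRST_YEAR, LAST_YEAR = 2002, 2016


def select_sample(records):
    """Keep country-year rows inside the window with at least MIN_MATCHES played."""
    return [
        (country, year, matches, rating)
        for country, year, matches, rating in records
        if FIRST_YEAR <= year <= LAST_YEAR and matches >= MIN_MATCHES
    ]


def teams_per_year(sample):
    """Count how many distinct teams qualify in each year."""
    teams = defaultdict(set)
    for country, year, _matches, _rating in sample:
        teams[year].add(country)
    return {year: len(names) for year, names in sorted(teams.items())}


toy_records = [
    ("A", 2001, 150, 1800),  # outside the 2002-2016 window
    ("A", 2002, 160, 1810),
    ("B", 2002, 95, 1400),   # has not yet reached 100 matches
    ("B", 2003, 104, 1420),  # crosses the threshold in 2003
    ("A", 2003, 171, 1795),
]
sample = select_sample(toy_records)
print(teams_per_year(sample))  # {2002: 1, 2003: 2}
```

Under this reading, a team enters the panel in the first year it reaches 100 matches, which is why the number of teams differs from year to year.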

III. Dependent Variable

The dependent variable is a country’s international soccer success, measured by the ELO rating system for the years 2002-2016. These years were chosen to study the game during the 21st-century World Cup era, a time of a more globalized and ever-changing world. Since much of the economic data was unavailable for 2018, the range was reduced to 2002-2016. An ELO rating is calculated by “adding a weighting for the kind of match, an adjustment for the home team advantage, and an adjustment for goal difference in the match result” (ELO). This rating system is thus slightly more comprehensive than the traditional FIFA ranking and reduces the bias that might occur if a team has played fewer matches, participated in fewer major tournaments, performed better against weaker teams, etc. It also displays the point totals for each team, which the FIFA rankings do not, so that the spread between teams is more visible. For the full explanation of the ELO rating calculation and the FIFA ranking calculation, visit https://www.eloratings.net/about and https://www.fifa.com/fifa-world-ranking/procedure/men.html .
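For concreteness, the three adjustments quoted above can be sketched as a single rating update. The constants below (the 400-point scale, the goal-difference multipliers, and the example match weight of 20 and home bonus of 100) are recalled from the description on eloratings.net and should be treated as assumptions to check against that page; the function names are hypothetical.

```python
def expected_result(rating, opponent_rating, home_bonus=0.0):
    """Expected score (between 0 and 1) implied by the rating difference."""
    diff = rating + home_bonus - opponent_rating
    return 1.0 / (10 ** (-diff / 400.0) + 1.0)


def goal_difference_multiplier(goal_diff):
    """Scale the update for larger winning margins (assumed constants)."""
    margin = abs(goal_diff)
    if margin <= 1:
        return 1.0
    if margin == 2:
        return 1.5
    return (11 + margin) / 8.0


def updated_rating(rating, opponent_rating, goals_for, goals_against,
                   match_weight, home_bonus=0.0):
    """New rating = old rating + K * G * (W - We)."""
    if goals_for > goals_against:
        result = 1.0
    elif goals_for == goals_against:
        result = 0.5
    else:
        result = 0.0
    g = goal_difference_multiplier(goals_for - goals_against)
    w_e = expected_result(rating, opponent_rating, home_bonus)
    return rating + match_weight * g * (result - w_e)


# Two equally rated teams on neutral ground, assumed weight of 20 (a friendly).
print(updated_rating(1800, 1800, 1, 0, match_weight=20))  # 1810.0
print(updated_rating(1800, 1800, 3, 0, match_weight=20))  # 1817.5
# A home bonus raises the expected result, so a home win earns fewer points.
print(updated_rating(1800, 1800, 1, 0, match_weight=20, home_bonus=100) < 1810)  # True
```

The sketch shows why the point totals carry more information than a rank: the size of each update depends on the match type, the venue, and the margin, not only on who won.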

IV. Independent Variables

Real GDP Per Capita

This study will make use of a country’s yearly real GDP per capita for the years 2002-2018 from the World Bank. It seems that richer, more developed countries would perform better on the international stage, since they can fund the construction of more stadiums and training facilities, hire better coaches, and utilize modern statistical technologies to study player performance. Countries with larger populations also have a larger pool of players to draw from, so using GDP per capita instead of aggregate GDP allows for a better, more relative comparison of income. However, countries like Paraguay and Senegal, which are typically ranked in the high 20s and the low to mid 30s in the ELO rating system, are usually ranked in the low to mid 100s by GDP, as evidenced by the CIA World Factbook and the World Bank. There seems to be some positive correlation between ranking and GDP, in that nearly half of the top 50 ELO-ranked countries are also among the top 50 GDP-ranked countries. But the spread in wealth, on average, increases significantly among countries ranked between 50 and 100 by ELO rating. For example, developed countries like Canada and Slovenia have ranked below developing countries like Honduras and Iraq. Thus, it appears that higher GDP does not necessarily guarantee international soccer success.

Income Inequality

The level of income inequality can vary significantly among both developing and developed countries. The CIA World Factbook reports Gini coefficients on a scale from 0 to 100, with 0 referring to perfect equality and 100 to perfect inequality. On this scale, a developed country like Hong Kong has one of the highest levels of inequality at 53.9, which places it 9th on the list. Similarly, the U.S. and Singapore, two other developed countries, are indexed at 45.9 and 45.0, placing them in the top 40 for income inequality. While most developed countries are clustered at the bottom of the list and the least developed at the top, several countries at each level of development stray from their respective trends. However, since Gini coefficients are not available for every year from 2002-2016, we will use as a proxy the share of income held by the top 1% in each country, measured as a percentage of the country’s total income. This data will be provided by the Top Income Index database. Furthermore, among the top ELO-ranked countries, which are European and South American, many South American countries like Brazil, Chile, and Colombia are among the 20 countries with the highest income inequality, while strong European teams like France, Germany, and Croatia show up in the bottom 20. The tighter inequality gaps in most developed, high ELO-ranked European countries may allow more people at all income levels to invest time in soccer. And since Europe boasts the strongest club leagues, and thus is home to a pool of powerful investors, its countries have the money to support stadium construction, focus on player development, and grow and secure local talent. In South America, on the other hand, where income inequality is greater, countries may have equally good access to talent because opportunities for good employment are slimmer and playing soccer can become the only path to financial success and sustainability. There is therefore potential for a greater number of high-level players who wish to escape poverty and a lack of opportunities.

Education Spending Per Capita

It is worth considering a country’s level of education spending, as it reflects the country’s commitment to the growth of its citizens’ abilities, skills, and contributions to the economy. Developed countries typically invest more in education because they can afford to better fund their students and keep their economies productive, so that education further develops a country and vice versa. It is also possible that countries with better public-school funding can boast stronger soccer programs. This funding could give players a better opportunity to develop and to be exposed to the sport at a young age. Further, they might then have an easier time being scouted at the professional level if they can make a name for themselves within local school leagues. Thus, this study will make use of the percent of government expenditures on education for the years 2002-2018. These percentages will be multiplied by the real GDP per capita amounts and divided by the population to convert this metric into per capita terms. A brief insight can be gained from a small but notable sample of countries. Among the OECD countries, the more developed, richer countries spend more on primary to non-secondary education per capita. This includes highly ELO-rated countries such as the dominant European countries (France, Belgium, Germany, and the Netherlands) as well as the United States, South Korea, etc. (OECD). However, on average, South American powerhouses with high ELO ratings, like Argentina, Colombia, Brazil, and Chile, rank toward the bottom of this group (OECD).

Urbanization

The level of urbanization of a country, i.e. the share of people living in urbanized areas, is indicative of the country’s developmental stage. For this study, the yearly percentage of people living in urban areas, as estimated by the World Bank for the years 2002-2018, will be analyzed. This variable is included because of the far-reaching effects urbanization has on a country. Historically, urban areas have allowed for better access to health care, nutrition, goods and services, and jobs, as well as facilities for entertainment (Lore Central). It is thus of interest to see whether rising levels of urbanization, which traditionally lead to better access to health care and nutrition, employment, and facilities for sports teams, allow countries to better develop players for international success.

Popularity

An important aspect of success may be the amount of media coverage soccer receives in a country, as well as its relative popularity, in that “popularity of a sport depends on its broader significance within a nation's culture” (Hoffman et al.). It is reasonable to assume that this significance can arise from the accessibility and awareness of the sport. Further, popularity can be a result of intense media coverage of a sport relative to other sports, news, and ideas. An example is the U.S., where the popularity of soccer pales in comparison to that of football and basketball, due to intense coverage of those sports by networks such as ESPN, in addition to their historical popularity in American culture. It is thus possible that an oversaturation of soccer-related media content can influence more people to pursue professional careers in the sport, due to its significance in their country. To measure popularity, the Google Trends Index for searches of “FIFA”, a language-neutral word, from 2004-2016 will be used to indicate the number of people potentially exposed to soccer-related media such as matches or internet highlights, since FIFA owns most of these videos, photos, etc. This analysis will be included in the appendix.

Corruption

The level of corruption in a country is indicative of that country’s ability to maintain the rule of law and to properly fund and maintain infrastructure, the educational system, and the economy as a whole. Corruption increases the difficulty of performing transactions, uncertainty regarding employment, inefficient allocation of resources, etc. (Investopedia). In international soccer, the main governing body, FIFA, has faced multiple allegations of bribery and corruption. In 2015, multiple high-ranking FIFA executives were arrested and banned from soccer on charges relating to “money laundering, racketeering, wire fraud” (BBC). Additionally, there is speculation that these same officials have been involved in bribery scandals regarding the selection of host countries for the World Cup. It is thus reasonable to examine whether corruption may have played a part in allowing certain countries to play easier matches and have better chances of advancing in major tournaments. Further, corruption may have weakened a country’s ability to properly regulate its national team and compete at the international level. Another facet of corruption is that an inefficient use of funds may have prevented certain national teams from forming or growing earlier, because of a lack of access to stadiums, better training, and nutrition. This study will make use of the corruption index for the years 2002-2018 created by Transparency International, where from 2002 to 2009 a score of 10 translates to “very clean” and 0 to “very corrupt”, and from 2009 onward 100 translates to “very clean” and 0 to “very corrupt”. To standardize the values, the index will be scaled by a factor of one-tenth for the years 2009 onward, to allow comparison with the earlier years of the index. As with the Gini coefficient rankings, on average, many highly ELO-rated European countries populate the top 40 spots, whereas many highly rated South American teams start to show up around the 100-rank mark. The corruption level is significantly different, with these South American countries sitting 25-40 points (2.5 to 4 points on the ten-point scale) below the ELO-comparable European teams.
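The rescaling step described above amounts to a one-line conversion. The sketch below is a minimal illustration under stated assumptions: the function name is hypothetical, and the switch year is left as a parameter because it is taken from the text rather than verified against the index itself.

```python
def harmonize_cpi(year, score, switch_year=2009):
    """Return a corruption score on the 0-10 scale used in the earlier years.

    Scores from switch_year onward are assumed to be reported on a 0-100
    scale and are multiplied by one-tenth; earlier scores pass through.
    """
    if year >= switch_year:
        return score / 10.0
    return float(score)


# A 0-10 score from an early year and a 0-100 score from a later year
# end up on the same footing.
print(harmonize_cpi(2005, 7.3))  # 7.3
print(harmonize_cpi(2014, 73))   # 7.3

# The 25-40 point gap mentioned above corresponds to 2.5-4 points.
print(harmonize_cpi(2014, 25), harmonize_cpi(2014, 40))  # 2.5 4.0
```

Multiplying the earlier years by ten instead would give an equivalent series on the 0-100 scale; what matters is that one scale is used throughout before the variable enters a regression.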

V. Summary Statistics

The summary statistics for the variables used in this study are provided below. As mentioned previously, the popularity and the popularity-income inequality variables will be considered for the final paper, and thus do not appear in this table.

Table 5.1: Summary Statistics

Variable                    Mean      SD         Min     Max        Observations
ELO Rating                  1460      285.2716   603     2150       2658
Real GDP Per Capita         14979.1   21825.64   111.4   179308.1   2100
Urbanization                57.423    23.66352   9.864   100        2144
Education Completion Rates  88.41     18.1       20.46   124.11     1354
Corruption Level            43.37     21.58007   8       97         2065
Income Inequality           34.33     12.41592   13.96   69.99      612

VI. Regression Results

A series of single-variable regressions were run to determine the individual relationships between ELO rating and the independent variables, in addition to a series of multiple regressions. Through exploration of functional forms, it was discovered that taking the log of some of the independent variables provided useful information regarding their relationship with ELO score. Thus, the linear-log model will be compared with the standard linear model for some variables. These single-variable regressions and functional forms will be discussed in the appendix. The linear-log model will be briefly mentioned in the regression results section, to highlight an interesting multiple regression that utilizes it. To account for the small number of observations for the Top Income Share variable, a second regression was run excluding that variable.
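The comparison of functional forms mentioned above can be illustrated with a small least-squares sketch. The numbers below are made up for illustration and are NOT the thesis data: the response is generated exactly from a logarithmic relationship so that the difference between the linear and linear-log fits is easy to see. The helper name simple_ols is hypothetical.

```python
import math


def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return (a, b, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot


# Illustrative values only: score is constructed as 900 + 60 * ln(gdp).
gdp = [500.0, 2000.0, 8000.0, 32000.0, 64000.0]
score = [900.0 + 60.0 * math.log(g) for g in gdp]

# Linear model: score regressed on gdp.
_, _, r2_linear = simple_ols(gdp, score)

# Linear-log model: score regressed on ln(gdp).
a_log, b_log, r2_log = simple_ols([math.log(g) for g in gdp], score)

print(r2_log > r2_linear)  # True
```

When the underlying relationship is closer to logarithmic, as it is by construction here, the linear-log fit explains more of the variation than the linear fit, which is the pattern the functional-form comparison looks for.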

Years 2002-2016    Score (Dependent)                            Score (Dependent)
Predictors         Estimates    CI                     p        Estimates   CI                     p
(Intercept)        1460.27      1354.14 – 1566.41      <0.001   1139.04     1127.60 – 1276.77      <0.001
GDP                -0.0002151   -0.001771 – 0.001341   0.786    -0.000248   -0.003509 – 0.001347   <0.001
Education          0.04         0.02 – 0.06            <0.001   0.06        0.05 – 0.08            <0.001
CPI                0.57         -0.92 – 2.06           0.454    0.26        -0.67 – 1.19           0.581
Top Income Share   -1.37        -2.39 – -0.36          0.008
Urbanization       2.26         0.75 – 3.58            0.003    5.98        5.36 – 6.60            <0.001
Observations       555                                          1847
R2 / adjusted R2   0.120 / 0.112                                0.274 / 0.272

1. Real GDP Per Capita (GDP)

One of the main goals of this study was to investigate the intuitive notion that a richer country should perform better on the international stage because of the greater funding it can provide for a team. However, many rich European countries consistently rank just as highly as poorer South American countries. Thus, our expectations regarding the effect of GDP on ELO score are not clear, but we could expect that an increase in GDP will raise ELO score up to a point, until certain other economic factors overtake the effect of GDP. Per the initial linear regression, GDP negatively affects ELO score: a one-dollar increase in real GDP per capita is associated with a decrease of 0.0002151 points in ELO score, on average. This is a noticeably small impact and was in fact not statistically significant, given that the p-value for real GDP per capita was 0.786, which is quite high. Further, this does not align with our intuition that real GDP per capita and ELO score are positively correlated. After dropping the Top Income Share from the model, real GDP per capita was found to have a p-value <0.001, so that real GDP per capita has a statistically significant effect on ELO score at all the standard significance levels. However, real GDP per capita still had a negative coefficient in this model, and since the 95% confidence intervals for GDP in these multiple regressions contained 0, we do not find it worthwhile to interpret the direction of its effect on ELO score. It is still likely that richer countries can provide more facilities, hire better coaches, and utilize other related technological resources to strengthen their team’s performance. Moreover, GDP may not be a strong predictor of success because it acts more as a necessary condition for success to be feasible: if a country lacked the ability to provide these resources, its national team would not have what it needs to compete at a high level.

2. Education Completion Rates (Education)

From the initial regression output, an increase in education spending per capita had a positive effect on ELO score, with a coefficient of 0.04, which is consistent with our intuition. Thus, a 1 dollar increase in education spending per capita was associated with a 0.04 increase in a country’s ELO score, on average. The p-value was found to be <0.001, so education spending per capita has a statistically significant effect on ELO score at all standard significance levels. In the reduced model, the education spending per capita variable had a similarly positive effect on ELO score, with a coefficient of 0.06 and a p-value <0.001. Thus, we can conclude that education spending per capita and ELO score are positively correlated. A potential interpretation of this result is that as the government spends more on each student, there is a better chance that more students will have early exposure to soccer through school teams and programs, giving the country a stronger pool of players to draw from. This is a reasonable conclusion because we can expect a greater investment in students to allow for better access to the sport. Additionally, we can interpret this positive correlation to mean that as a country invests more in the growth of its citizens, it can have a larger pool of motivated and passionate players who wish to represent their country on the international soccer stage (a potential interpretation for the popularity variable).

3. Urbanization Rate (Urbanization)

Like real GDP per capita, a greater level of urbanization seems to imply that more people have better access to healthcare, jobs, and entertainment (like sports teams and facilities). In the initial linear model, the coefficient for urbanization is positive at 2.26, which is consistent with our intuition. This implies that a 1 percentage point increase in the urbanization rate is associated with an increase of 2.26 points in a country’s ELO score, on average. Further, since the p-value is 0.003, we can conclude that the urbanization rate has a statistically significant positive effect on ELO score at all the standard significance levels. When the top income share variable was dropped, the urbanization rate had a greater positive effect on ELO score, with a coefficient of 5.98. In this model the p-value was <0.001, so it was even more statistically significant than in the previous model. There is clearly a positive correlation between the urbanization rate and ELO score. This is a reasonable conclusion: as more people move to cities, there is a greater chance that a country’s soccer team performs better, potentially due to better access to resources. If more people live in cities, then more people have access to soccer fields and local teams (and, in general, to people to play with in an area with more places to play) and can spend more time playing soccer. Further, since these people have better access to jobs and healthcare, they are likely less concerned with finding employment and worrying about their health and, again, have a greater chance to invest time in playing soccer. A team will therefore have a larger pool of talented players to draw from.

4. Corruption (CPI)

In the case of corruption, we expected that a less corrupt country would be able to better allocate resources and organize funding to both create and maintain a national soccer team. However, we have seen that top-performing European and South American teams rank at the top and bottom of this index, respectively, which makes for an interesting investigation. In the initial regression results, CPI has a positive coefficient of 0.57, which is consistent with our intuition. This implies that a 1-point increase in the corruption index, which corresponds to a country becoming “cleaner”, is associated with an increase of 0.57 points in a country’s ELO score, on average. However, the p-value for CPI is 0.454, which implies that CPI does not have a statistically significant effect on ELO score. In the reduced regression model the result was similar: the CPI coefficient was 0.26, with a similarly high p-value of 0.581. The estimated coefficients are therefore positive, but the effect on ELO score is not statistically significant. Since the 95% confidence intervals for these estimated regression coefficients contain 0, we choose not to interpret the direction of CPI’s effect on ELO score.

5. Income Inequality (Top Income Share)

We were a bit uncertain about the effect of income inequality on ELO score, given that countries with “very clean” and “very corrupt” corruption ratings both consistently had high ELO scores. We expected some sort of positive correlation between ELO score and top income share, much like the initial observations for real GDP per capita and the corruption index ratings. An issue with this variable is the lack of available information, which leaves a small number of observations; for this reason we chose to include an alternate regression model. In the initial regression model, the coefficient for top income share is -1.37, a negative value, which is not entirely consistent with our intuition. The p-value for Top Income Share was 0.008, which is statistically significant at all the standard significance levels. Thus, we can conclude that there is a negative correlation between ELO score and top income share: as the top 1% in a country increase their share of income, the country tends to perform worse on the international soccer stage. Though this did not line up with our intuition, there may be some sense to this finding. As income becomes more and more concentrated in the top 1%, fewer people have the time to seriously pursue soccer, because they are spending more time working and trying to make up for a large income gap. However, further analysis is needed to better understand the effects of income concentration and its interaction with the popularity of soccer, as we are interested in the possibility that a large income gap might make soccer a popular avenue to financial stability. Still, this result may be due to a bias in information availability and should be interpreted cautiously.

VII. Conclusion

Using the reduced model, we saw that GDP, CPI, Urbanization Rate, and Education Spending Per Capita could explain 27.2% of the variation in ELO score, compared to the 11.2% that the initial regression model could explain. This indicates that it was useful to drop the top income share, and that a better variable is needed to represent income inequality, as we still expect an important relationship between ELO score and income inequality. Extensive further analysis is needed to account for the remaining 73% of the variation in ELO scores that was left unexplained. Given that many abstract forces and factors affect success in general, it is particularly difficult to attribute a national soccer team’s success to a handful of economic growth-related variables. But we still expected that the general economic make-up of a country would explain its ability to lay the groundwork for a team’s success, since a functioning and well-performing economy allows stadiums to be built, coaches to be hired, and players to be recruited. It is possible that popularity, which will be explored in the appendix, may account for a country having access to a larger pool of more passionate and motivated players. Other variables of interest may include those related to climate, mental health, diet, etc. Additionally, it may be worthwhile to consider the effect of population more directly, rather than through per capita measures, to see whether having more people gives teams access to a larger talent pool. Much is left to explore in future research, given the complexity of success. Since these variables measure economic growth over time, it may be more beneficial to focus on the percentage change in each of these metrics to capture change over time.
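The 27.2% and 11.2% figures quoted in the conclusion are the adjusted R2 values from the regression table. As a quick consistency check, they can be reproduced from the reported R2, the number of observations, and the number of predictors using the standard adjusted R2 formula. The function name is hypothetical, and the predictor counts (5 in the initial model, 4 in the reduced model) are read off the table.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Initial model: GDP, Education, CPI, Top Income Share, Urbanization.
full = adjusted_r_squared(0.120, n_obs=555, n_predictors=5)

# Reduced model: the same predictors without Top Income Share.
reduced = adjusted_r_squared(0.274, n_obs=1847, n_predictors=4)

print(round(full, 3), round(reduced, 3))  # 0.112 0.272
```

Because the inputs are the rounded R2 values from the table, agreement to three decimals is the most that can be expected. Note also that the two models are estimated on different numbers of observations (555 and 1847).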

VIII. Appendix

Real GDP Per Capita

If we examine the plot below, we can see that there isn’t a very strong linear relationship between real GDP per capita and ELO rating. If anything, it is difficult to surmise any relationship between these variables given the large clustering of data points.

However, upon examining functional forms, the relationship between log(GDP) and ELO rating seemed more promising, appearing to be somewhat positively linear, as evidenced by the plot below.

Though this relationship is still not strongly linear, it seemed worth considering. Thus, two regressions were run.

Years 2002-2016    Score (Dependent)                           Score (Dependent)
Predictors         Estimates   CI                     p        Estimates   CI                 p
(Intercept)        1435.41     1420.86 – 1449.97      <0.001   949.14      887.56 – 1010.72   <0.001
GDP                0.002324    0.001771 – 0.002876    <0.001
log(GDP)                                                       61.14       54.04 – 68.25      <0.001
Observations       2101                                        2101
R2 / adjusted R2   0.031 / 0.031                               0.119 / 0.119

In the linear regression using GDP, as evidenced by the regression table, the coefficient for GDP was 0.002324, the presumed positive effect, though a small one. Thus, a 1 dollar increase in GDP per capita was associated with an increase of 0.002324 in a country’s ELO score, on average. Further, the p-value for GDP in this model turned out to be <0.001, which is statistically significant at all the standard significance levels, implying that its effect on ELO score is significant. Since GDP explains only about 3% of the variation in ELO score, this casts doubt on the idea that GDP is a strong indicator of variation in ELO score, which we believe most people assume. In terms of statistical significance, the linear-log regression, using log(GDP), was consistent with the linear regression using GDP alone. The p-value was <0.001, which is statistically significant at all the typical significance levels, indicating that log(GDP) also has a significant effect on ELO score. The coefficient in this model was positive, 61.14. Because GDP enters in logs, this coefficient is not directly comparable in size to the one from the linear regression: it implies that a 1% increase in GDP per capita was associated with an increase of about 0.61 points (61.14/100) in a country’s ELO score, on average. Since log(GDP) explained nearly 12% of the variation on its own, it may be more appropriate to consider the linear-log model for this investigation.
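The reading of a linear-log coefficient can be checked with a short calculation. In a model of the form score = a + b * ln(GDP), raising GDP by p percent changes the predicted score by b * ln(1 + p/100), which is roughly b/100 for a 1% increase. The sketch below assumes the natural logarithm was used in the regression, which the text does not state; the function name is hypothetical.

```python
import math


def linear_log_effect(beta, pct_increase):
    """Change in y when x rises by pct_increase percent, for y = a + beta*ln(x)."""
    return beta * math.log(1.0 + pct_increase / 100.0)


beta_gdp = 61.14  # coefficient on log(GDP) from the appendix table

one_pct = linear_log_effect(beta_gdp, 1.0)     # effect of a 1% increase
doubling = linear_log_effect(beta_gdp, 100.0)  # effect of doubling GDP per capita

print(round(one_pct, 2))   # 0.61
print(round(doubling, 2))  # 42.38
```

Under this assumption, a 1% increase in GDP per capita corresponds to well under one ELO point, while a doubling of GDP per capita corresponds to roughly 42 points.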

Education Spending Per Capita

In studying the education variable, it was difficult to visually determine the relationship between education spending per capita and ELO score, given the same clustering issue seen in the plot for real GDP per capita and ELO score. Again, taking log(Education) showed a clearer relationship between the variables, one similar to that of log(GDP) and ELO score.

From the regression output, an increase in education spending per capita had a positive effect on ELO score, with a coefficient of 0.09, which is consistent with our intuition. Thus, a 1 dollar increase in education spending per capita was associated with a 0.09 increase in a country’s ELO score, on average. The p-value was found to be <0.001, so education spending per capita has a statistically significant effect on ELO score. However, like real GDP per capita, education spending per capita explained only a small amount of the variation in ELO score, 9.6%.

Years 2002-2016    Score (Dependent)
Predictors         Estimates   CI                    p
(Intercept)        1426.34     1413.44 – 1439.25     <0.001
Education          0.09        0.08 – 0.10           <0.001
Observations       2103
R2 / adjusted R2   0.096 / 0.096

Urbanization

In this case, the relationships of Urbanization and log(Urbanization) with ELO score were similar.

In fact, there seems to be a clearer positive correlation between these variables and ELO score. In the regression output for the linear model, the coefficient for Urbanization is positive at 5.29, which is consistent with our intuition. This implies that a 1 percentage point increase in the Urbanization rate is associated with an increase of 5.29 points in a country’s ELO score, on average. Further, since the p-value is <0.001, we can conclude that the Urbanization rate has a statistically significant positive effect on ELO score. It is worth noting that this is a much larger positive effect than the previous effects for real GDP and education spending per capita, which might explain why the Urbanization rate could explain 19.1% of the variation in ELO score.

Years 2002-2016    Score (Dependent)                           Score (Dependent)
Predictors         Estimates   CI                     p        Estimates   CI                 p
(Intercept)        1163.65     1135.05 – 1192.26      <0.001   428.63      344.08 – 513.19    <0.001
Urbanization       5.29        4.83 – 5.75            <0.001
log(Urbanization)                                              263.55      242.27 – 284.83    <0.001
Observations       2145                                        2145
R2 / adjusted R2   0.191 / 0.191                               0.216 / 0.215

Though this still indicates that the Urbanization rate is not a strong predictor of variation in ELO score, its explanatory power is larger than that of the previously examined variables. Similarly, the coefficient for the Urbanization rate in the linear-log model was found to be positive at 263.55, which appears quite large. Since the Urbanization rate is already a percentage, this effect may appear exaggerated: in a linear-log model the coefficient implies roughly a 2.64 point increase in ELO score for a 1% relative (not percentage point) increase in the Urbanization rate. Further, the p-value was <0.001, indicating that this effect on ELO score was statistically significant.
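As a rough illustration of how the linear-log coefficient can be read, the sketch below assumes that log denotes the natural logarithm (the usual convention in regression software, though it is not stated here) and uses a hypothetical starting Urbanization rate of 50%. It contrasts a 1% relative increase with a 1 percentage point increase.

```python
import math

BETA_LOG_URBAN = 263.55  # coefficient on log(Urbanization), linear-log model
BETA_URBAN = 5.29        # coefficient on Urbanization, linear model


def linear_log_change(beta: float, old_x: float, new_x: float) -> float:
    """Change in predicted ELO when x moves from old_x to new_x
    in a model of the form ELO = a + beta * ln(x)."""
    return beta * math.log(new_x / old_x)


# Hypothetical starting rate of 50% (an assumption for illustration only).
start_rate = 50.0

# A 1% relative increase: 50.0% -> 50.5%.
relative_change = linear_log_change(BETA_LOG_URBAN, start_rate, start_rate * 1.01)

# A 1 percentage point increase: 50% -> 51%.
point_change = linear_log_change(BETA_LOG_URBAN, start_rate, start_rate + 1.0)

print(f"1% relative increase:        {relative_change:.2f} ELO points")
print(f"1 percentage point increase: {point_change:.2f} ELO points")
print(f"Linear-model coefficient:    {BETA_URBAN:.2f} ELO points")
```

Under these assumptions, the 1 percentage point change implied by the linear-log model (about 5.2 points at a 50% starting rate) is close to the linear coefficient of 5.29, so the two functional forms tell a similar story around that level.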

Corruption

The relationship of CPI and log(CPI) with ELO score is visually unclear, but the log(CPI) plot appears to show a slight positive correlation with ELO score.

Years 2002-2016     Score (Dependent)                              Score (Dependent)
Predictors          Estimates   CI                        p        Estimates   CI                        p
(Intercept)         1313.35     1287.14 – 1339.55         <0.001   786.38      698.22 – 874.53           <0.001
CPI                 4.19        3.65 – 4.73               <0.001
log(CPI)                                                           194.04      170.11 – 217.98           <0.001
Observations        2066                                           2066
R² / adjusted R²    0.100 / 0.100                                  0.109 / 0.109

Considering the initial regression results, we see that CPI has a positive coefficient of 4.19, which is consistent with our intuition. This implies that a 1-point increase in the corruption index, which corresponds to a country becoming “cleaner”, is associated with a 4.19 point increase in a country’s ELO score, on average. Furthermore, the p-value for CPI is <0.001, which implies that CPI has a statistically significant effect on ELO score. CPI explains only 10% of the variation, so it is not a strong predictor of ELO score. Indeed, we can see from the plots for the linear and linear-log models that there are many highly corrupt countries that still have high ELO scores. So, this confirms that CPI affects ELO score, but only to an extent. The linear-log model yielded similar results, with a p-value <0.001 indicating a statistically significant effect on ELO score. It explained around the same amount of variation in ELO score, so either functional form is satisfactory. In the linear-log model, a 1% increase in CPI score was associated with an increase of 1.94 points in a country’s ELO score, on average, which is quite substantial.

Income Inequality

It is difficult to assess the relationship between ELO score and Top Income Share, since there are a large number of 0’s in the data and there is a cluster of observations with middle-to-high ELO scores and middle-to-high Top Income Share percentages.

From the regression output, we can see that the coefficient for Top Income Share is negative, at -1.7821. This is not entirely consistent with our intuition. It implies that a 1 percentage point increase in the top 1%’s share of income is associated with a 1.7821 point decrease in a country’s ELO score, on average. The p-value for Top Income Share was <0.001, which is statistically significant at all the standard significance levels. However, per the R² value of 0.024, Top Income Share is a very weak predictor of ELO score.
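The reported estimate and confidence interval for Top Income Share (-1.78, with an interval from -2.68 to -0.88) can be related to the p-value with a short calculation. The sketch below assumes the interval is a 95% interval of the form estimate ± 1.96 × standard error, which is a common default but is not stated in the text, and it uses a normal approximation for the test statistic.

```python
import math


def summary_from_ci(estimate: float, lower: float, upper: float, z: float = 1.96):
    """Recover an approximate standard error, test statistic and two-sided
    p-value from a symmetric, normal-approximation confidence interval."""
    std_error = (upper - lower) / (2 * z)
    statistic = estimate / std_error
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(statistic) / math.sqrt(2))
    return std_error, statistic, p_value


# Values reported for Top Income Share in the single-variable regression.
std_error, statistic, p_value = summary_from_ci(-1.78, -2.68, -0.88)

print(f"Approximate standard error: {std_error:.3f}")
print(f"Approximate test statistic: {statistic:.2f}")
print(f"Approximate p-value:        {p_value:.5f}")
```

Under these assumptions the implied p-value falls below 0.001, in line with the reported result, even though the model explains very little of the variation in ELO score.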

Years 2002-2016     Score (Dependent)
Predictors          Estimates   CI                        p
(Intercept)         1714.30     1677.93 – 1750.66         <0.001
Top Income Share    -1.78       -2.68 – -0.88             <0.001
Observations        612
R² / adjusted R²    0.024 / 0.022

Multiple Log Table

Years 2002-2016     Score (Dependent)                              Score (Dependent)
Predictors          Estimates   CI                        p        Estimates   CI                        p
(Intercept)         367.59      269.24 – 465.94           <0.001   446.52      326.10 – 566.94           <0.001
GDP                 -0.001943   -0.002996 – -0.00089      <0.001
Education           0.06        0.05 – 0.08               <0.001   0.04        0.03 – 0.05               <0.001
CPI                 0.43        -0.48 – 1.33              0.354
log(Urbanization)   279.52      252.72 – 306.32           <0.001   251.29      216.33 – 286.25           <0.001
log(GDP)                                                           7.46        -8.00 – 22.92             0.344
log(CPI)                                                           -7.73       -46.88 – 31.42            0.699
Observations        1847                                           1847
R² / adjusted R²    0.294 / 0.292                                  0.289 / 0.287

Miscellaneous Regressions

Years 2002-2016     Score (Dependent)                              Score (Dependent)
Predictors          Estimates   CI                        p        Estimates   CI                        p
(Intercept)         1153.03     1072.36 – 1233.70         <0.001   1184.38     1116.20 – 1252.56         <0.001
GDP                 0.002485    0.001181 – 0.003788       <0.001   -0.00       -0.001138 – 0.002259      0.190
Education           2.81        1.88 – 3.73               <0.001   -0.69       -1.60 – 0.22              0.136
CPI                 1.56        0.26 – 2.85               0.019
Urbanization                                                       6.36        5.60 – 7.11               <0.001
Observations        1222                                           1340
R² / adjusted R²    0.133 / 0.131                                  0.228 / 0.226

These regressions were included to display some additional reduced models. It is worth noting that the leftmost regression is the only notable multiple regression (one including at least 3 independent variables) in which GDP was found to have the predicted positive coefficient and a statistically significant effect on ELO score.

Years 2002-2016     Score (Dependent)                              Score (Dependent)
Predictors          Estimates   CI                        p        Estimates   CI                        p
(Intercept)         694.06      285.35 – 1102.76          0.001    1202.19     1127.60 – 1276.77         <0.001
GDP                 0.0005159   -0.001135 – 0.002167      0.539    0.0002445   -0.0009924 – 0.001481     0.698
Education           5.62        1.47 – 9.76               0.008    -1.16       -2.16 – -0.16             0.023
CPI                 3.33        1.27 – 5.39               0.002    0.19        -1.02 – 1.40              0.754
Top Income Share    -0.34       -1.63 – 0.96              0.612
Urbanization        2.64        0.77 – 4.50               0.006    6.76        5.86 – 7.65               <0.001
Observations        321                                            1222
R² / adjusted R²    0.214 / 0.202                                  0.265 / 0.263

These regressions include a newly defined version of the Education variable which is now

measured by primary completion rate, “the number of new entrants (enrollments minus

repeaters) in the last grade of primary education, regardless of age, divided by the population at

the entrance age for the last grade of primary education” (World Bank) for the years 2002-2016.

This variable is not as highly correlated with GDP as education spending per capita is, but the signs of the coefficients were not consistent across the individual reduced models.
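One common way to gauge how much a correlation between two regressors matters is the variance inflation factor (VIF). The sketch below is a minimal illustration for the two-predictor case, where the VIF depends only on the pairwise correlation r. The correlation values are hypothetical, since the actual correlations between GDP and the two education measures are not reported here.

```python
def variance_inflation_factor(r: float) -> float:
    """VIF for one of two predictors whose pairwise correlation is r.

    With exactly two predictors, the R-squared from regressing one on the
    other equals r**2, so VIF = 1 / (1 - r**2). With more predictors, r**2
    would be replaced by the R-squared from regressing the predictor on all
    of the others.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("correlation must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)


# Hypothetical correlations, chosen only for illustration.
for r in (0.3, 0.6, 0.9):
    print(f"r = {r:.1f} -> VIF = {variance_inflation_factor(r):.2f}")
```

The higher the correlation, the more the variance of the coefficient estimates is inflated, which can make signs and significance unstable across reduced models.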

Popularity

We chose to include this variable in the appendix as it was of significant interest to our study. As proposed earlier, it is likely that the more popular soccer is in a particular country, the higher that country’s ELO score will be.

From the scatter plot above, we can see that the relationship between ELO score and popularity is generally positive, but highly clustered between 0 and 20. Since this data set has meaningful 0’s, we are unable to investigate the log form of the variable, which might have given a clearer depiction of the relationship between the two variables. Nevertheless, we perform a single linear regression of ELO score on popularity, as seen below.

The single variable regression for popularity indicates that it has a statistically significant positive effect on ELO score, with a coefficient of 10.39 and a p-value <0.001. This coefficient is the largest of any of the non-log, single variable regressions, but popularity explains only around 13% of the variation in ELO score. Thus, this is consistent with our hypothesis that popularity has a positive effect on ELO score.

                    Score (Dependent)
Predictors          Estimates   CI                        p
(Intercept)         1321.93     1303.24 – 1340.63         <0.001
Popularity          10.39       9.25 – 11.53              <0.001
Observations        2169
R² / adjusted R²    0.129 / 0.129
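The two figures in each R² / adjusted R² row are linked by the standard adjustment for the number of predictors. As a minimal sketch, the function below applies that formula to reported values; it assumes the predictor count excludes the intercept, and because the inputs are the rounded figures printed in the tables, the results are only expected to match to about three decimal places.

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R-squared: 1 - (1 - R2) * (n - 1) / (n - k - 1),
    where n is the number of observations and k the number of
    predictors (not counting the intercept)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Popularity-only regression: R2 = 0.129, 2169 observations, 1 predictor.
single = adjusted_r2(0.129, 2169, 1)

# Five-predictor model from the miscellaneous regressions:
# R2 = 0.214, 321 observations.
full = adjusted_r2(0.214, 321, 5)

print(f"Single-variable model: {single:.3f}")
print(f"Five-predictor model:  {full:.3f}")
```

With a single predictor and over two thousand observations the adjustment is negligible, while the smaller five-predictor sample shows a visible drop (0.214 to about 0.202), matching the reported figures.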

We then ran a multiple regression with full and reduced models similar to those in the previous multiple regressions. In the full model, the only variables with statistically significant effects on ELO score were Urbanization, Education and Popularity, with each of these variables having the predicted positive effect. Further, Popularity had the largest coefficient, which is consistent with the non-log, single variable regressions. We once again chose to remove Top Income Share to increase the number of observations, which resulted in all variables except CPI having statistically significant effects on ELO score, consistent with the previous multiple regressions. More importantly, this regression was able to explain about 35% of the variation in ELO score, which is roughly 10 percentage points more than when Popularity was not included in the model.

                    Score                                          Score
Predictors          Estimates   CI                        p        Estimates   CI                        p
(Intercept)         1248.49     1136.89 – 1360.09         <0.001   1054.09     1014.99 – 1093.20         <0.001
GDP                 -0.000735   -0.00225 – 0.000785       0.343    -0.002214   -0.00328 – -0.00114       <0.001
Education           0.03        0.01 – 0.05               0.007    0.05        0.03 – 0.07               <0.001
CPI                 1.29        -0.21 – 2.79              0.092    0.33        -0.61 – 1.28              0.491
Top Income Share    0.29        -0.75 – 1.33              0.585
Urbanization        2.38        0.91 – 3.85               0.002    5.53        4.92 – 6.14               <0.001
Popularity          8.98        7.15 – 10.81              <0.001   8.24        7.12 – 9.35               <0.001
Observations        493                                            1680
R² / adjusted R²    0.269 / 0.260                                  0.351 / 0.349

The coefficients on Education, Urbanization, and Popularity were all positive, which lined up with our intuition. However, the coefficient for GDP remained negative, which is not in line with our intuition. This may be due to multicollinearity, so the regression was re-run using the newly defined version of the Education variable from the miscellaneous regression section.

                    Score (Dependent)                              Score (Dependent)
Predictors          Estimates   CI                        p        Estimates   CI                        p
(Intercept)         686.97      265.86 – 1108.07          0.002    1106.80     1030.32 – 1183.28         <0.001
GDP                 -0.000531   -0.00 – 0.00              0.530    0.00002841  -0.00 – 0.00              0.964
Education           3.22        -0.94 – 7.37              0.130    -0.88       -1.88 – 0.13              0.087
CPI                 4.50        2.41 – 6.59               <0.001   0.08        -1.16 – 1.31              0.905
Top Income Share    2.20        0.79 – 3.62               0.003
Urbanization        2.26        0.39 – 4.13               0.019    6.05        5.15 – 6.94               <0.001
Popularity          9.56        7.13 – 11.99              <0.001   8.36        6.95 – 9.76               <0.001
Observations        280                                            1103
R² / adjusted R²    0.344 / 0.330                                  0.337 / 0.333

It appears that redefining the education variable did little to fix the issue: although the GDP coefficient changed sign in the reduced model, GDP was not statistically significant in either regression, since the confidence intervals for GDP contained 0. However, these models were able to explain roughly a third of the variation in ELO score, which is, again, comparatively high. Thus, it seems reasonable to conclude that GDP may not have the expected positive effect because GDP serves only as a prerequisite for success: the country must be able to use the resources that its wealth grants effectively.
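The rule used above, that a coefficient is not statistically significant when its confidence interval contains 0, can be checked mechanically. The sketch below applies it to the intervals reported for the left-hand (full) model of the final table. The GDP row is left out because its interval is printed only as -0.00 – 0.00, which is too heavily rounded to use numerically.

```python
# Confidence intervals reported for the full model with the redefined
# Education variable (lower bound, upper bound).
reported_intervals = {
    "Education": (-0.94, 7.37),
    "CPI": (2.41, 6.59),
    "Top Income Share": (0.79, 3.62),
    "Urbanization": (0.39, 4.13),
    "Popularity": (7.13, 11.99),
}


def excludes_zero(lower: float, upper: float) -> bool:
    """True when the interval lies entirely above or entirely below zero."""
    return lower > 0 or upper < 0


significant = [
    name
    for name, (lower, upper) in reported_intervals.items()
    if excludes_zero(lower, upper)
]

for name, (lower, upper) in reported_intervals.items():
    verdict = "significant" if name in significant else "not significant"
    print(f"{name:<17} [{lower:>6.2f}, {upper:>6.2f}] -> {verdict}")
```

The result agrees with the reported p-values: Education (p = 0.130) is the only listed variable whose interval spans 0.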

VII. Data Sources

Real GDP Per Capita: The CIA World Fact Book

https://www.cia.gov/library/publications/the-world-factbook/rankorder/2001rank.html

https://data.worldbank.org/indicator/NY.GDP.PCAP.CD

Gini Coefficients: The CIA World Fact Book, Top Income Index

https://www.cia.gov/library/publications/the-world-factbook/rankorder/2172rank.html

https://wid.world/data/#countriestimeseries/gptinc_p0p100_992_j/US;FR;DE;CN;ZA;GB;WO/1930/2017/eu/k/p/yearly/g

Corruption Index: Transparency International

https://www.transparency.org/research/cpi

Education: The CIA World Fact Book, OECD

https://data.oecd.org/eduatt/population-with-tertiary-education.htm#indicator-chart

https://data.worldbank.org/indicator/SE.PRM.CMPT.ZS?view=chart

ELO Ratings: World Football ELO Ratings. An in-depth look at the formula used and the weighting of matches and goal differences can be found at the official website for ELO ratings

https://www.eloratings.net/about

https://www.fifa.com/fifa-world-ranking/procedure/men.html

(For comparison with the ELO ratings system)

Urbanization: World Bank

https://data.worldbank.org/indicator/SP.URB.TOTL.IN.ZS?end=2017&start=1960&year_low_desc=true

https://www.lorecentral.org/2018/01/advantages-and-disadvantages-of-urbanisation.html

Popularity: “The Socio-Economic Determinants of International Soccer Performance” (Hoffman et al.)

https://trends.google.com/trends

FIFA Scandal: BBC

https://www.bbc.com/news/world-europe-32897066

Corruption Information: Investopedia

https://www.investopedia.com/articles/investing/012215/how-corruption-affects-emerging-economies.asp

World Cup Title Information: World of Soccer

http://www.aworldofsoccer.com/top_tournaments/world_cup_winnersbycontinent.htm

References

1. 12 Advantages and Disadvantages of Urbanisation. LORECENTRAL, 12 Jan. 2018.

2. “COUNTRY COMPARISON: DISTRIBUTION OF FAMILY INCOME - GINI INDEX.” World

Fact Book, Central Intelligence Agency.

3. “COUNTRY COMPARISON: GDP (PURCHASING POWER PARITY).” World Fact Book,

Central Intelligence Agency.

4. “Education Resources - Education Spending - OECD Data.” The OECD Database, OECD.

5. “FIFA Trends Data.” Google Trends, Google.

6. “FIFA Corruption Crisis: Key Questions Answered.” BBC News, BBC, 21 Dec. 2015.

7. “Government Expenditure on Education, Total (% of GDP).” The World Bank Database, World

Bank.

8. Hoffman, Robert, et al. “The Socio-Economic Determinants of International Soccer Performance.” Journal of Applied Economics, V, no. 2, Nov. 2002.

9. “Individuals Using the Internet (% of Population).” The World Bank Database, World Bank.

10. Mirzayev, Elvin. The Economic and Social Effects of Corruption. Investopedia, 1 Aug. 2018.

11. “Population, Total.” The World Bank Database, World Bank.

12. Research - CPI - Overview, Transparency International.

13. Top Soccer Tournaments: Soccer World Cup Winners by Continent, A World of Soccer.

14. “Urban Population (% of Total).” The World Bank Database, World Bank.

15. World Football ELO Ratings 2002-2018.