In a previous blog post I estimated the density of Esperanto speakers in each country of the world. The density is the number of Esperanto speakers per 1 million inhabitants. In this post I examine how the density of Esperanto speakers can be explained by other variables. The variables are:
- Density of Esperanto speakers
- Linguistic diversity
- Similarity between Esperanto and the languages of the country
- Density of internet users
- GDP per capita
- The rankings of the seven categories of the Good Country Index:
  - Contribution to science
  - Contribution to culture
  - Contribution to peace
  - Contribution to climate
  - Contribution to order
  - Contribution to equality
  - Contribution to health
Comparing each of the explanatory variables separately to the density of Esperanto speakers shows that most of them are significantly correlated with it.
The plot shows that, on average, countries with low language diversity have a higher density of Esperanto speakers, that countries where the population speaks languages similar to Esperanto have more Esperanto speakers, and so on. All of these are just correlations – what are the causal effects? What if the real cause is just belonging to the Western World? Living in the Western World is correlated with low language diversity, good health care, widespread internet access, etc. That could explain our observed correlations.
One way of getting closer to the real causation is to include many variables in the same analysis. When the real cause is in the model, its strong relationship with the variable of interest can make the non-causal correlations disappear. The results of such an analysis are shown in the plot below.
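The effect of adding a confounder to the analysis can be illustrated with a small simulation. This is only a sketch on synthetic data, not the analysis from this post: the variable names `western`, `internet` and `speakers` and all the numbers are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical confounder, e.g. "belongs to the Western World"
# (made continuous here purely for simplicity).
western = rng.normal(size=n)
# A variable driven by the confounder, with no effect of its own on the outcome.
internet = western + rng.normal(size=n)
# The outcome depends only on the confounder.
speakers = 2.0 * western + rng.normal(size=n)


def ols(y, *columns):
    """Least-squares coefficients; the first coefficient is the intercept."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


naive_slope = ols(speakers, internet)[1]              # confounder left out
adjusted_slope = ols(speakers, internet, western)[1]  # confounder included

print(f"slope without confounder: {naive_slope:.2f}")
print(f"slope with confounder:    {adjusted_slope:.2f}")
```

In this toy setup the slope for `internet` is clearly positive when it is the only explanatory variable and shrinks to roughly zero once `western` is included, which is the behaviour the multi-variable analysis relies on.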
In the following I will go through each variable and the conclusions.
Linguistic diversity
The language diversity is an estimate of the probability that two randomly chosen people from a country have different mother tongues. It is also called the Linguistic Diversity Index (LDI). This analysis says that low language diversity does not cause significantly more Esperanto speakers. It is a common belief that high language diversity gives more Esperanto speakers, which is unlikely to be true because the t-statistic is negative, far from the positive value that belief would require.
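A minimal sketch of how such an index can be computed, assuming Greenberg's formulation (one minus the sum of squared language shares). The shares are the ones quoted later in this post for Bosnia-Herzegovina; treating the "other" group as a single language is a simplifying assumption of this example, not something the data source states.

```python
def linguistic_diversity(shares):
    """Greenberg-style diversity index: 1 - sum of squared language shares.

    0 means everyone has the same mother tongue; values close to 1 mean
    the population is split across many languages.
    """
    total = sum(shares)  # renormalise in case the shares do not sum exactly to 1
    return 1.0 - sum((s / total) ** 2 for s in shares)


# Bosnian, Serbo-Croatian, Croatian, other (assumed to be one language).
bosnia_shares = [0.529, 0.308, 0.146, 0.017]
ldi_bosnia = linguistic_diversity(bosnia_shares)
print(round(ldi_bosnia, 3))
```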
Similarity to Esperanto
I constructed this variable. For each country I collected the proportion of the population speaking each language. For example, in Bosnia-Herzegovina 52.9% speak Bosnian, 30.8% speak Serbo-Croatian, 14.6% speak Croatian and the remaining 1.7% speak other languages. For each language, I calculated the distance to Esperanto using the method described in my previous post (in short, it is the average phonetic distance between a set of words in the two languages). The distance between Esperanto and Bosnian is 0.754, between Esperanto and Serbo-Croatian 0.797, and between Esperanto and Croatian 0.772. Hence, the distance between Bosnia-Herzegovina and Esperanto is the weighted average of these distances, with the language proportions as weights.
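The weighted average for Bosnia-Herzegovina can be sketched as follows. How the remaining 1.7% of "other" languages enter the calculation is not stated here, so this example assumes they are simply dropped and the weights renormalised over the three languages with a known distance.

```python
# language: (share of population, distance to Esperanto), values from the text
bosnia = {
    "Bosnian": (0.529, 0.754),
    "Serbo-Croatian": (0.308, 0.797),
    "Croatian": (0.146, 0.772),
}


def weighted_distance(languages):
    """Share-weighted mean distance, renormalised over the listed languages."""
    total_share = sum(share for share, _ in languages.values())
    weighted_sum = sum(share * dist for share, dist in languages.values())
    return weighted_sum / total_share


bosnia_distance = weighted_distance(bosnia)
print(f"{bosnia_distance:.3f}")
```

Because it is an average, the result always lies between the smallest and the largest of the individual language distances.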
This variable is not significant for the number of Esperanto speakers. It has been argued that the similarity between the European languages and Esperanto is problematic because people who don’t speak a European language would find Esperanto harder. This analysis finds no such effect in the data.
Percentage of population on the internet
An internet user is defined as someone who has used the internet in the past 3 months. The data were downloaded from the World Bank. This variable is only just significant for the number of Esperanto speakers.
GDP per capita
GDP is the value of all goods and services produced in a country; GDP per capita divides it by the population. The data were downloaded from the World Bank. This variable narrowly misses significance for the number of Esperanto speakers.
The Good Country Index
The Good Country Index is a collection of 7 measures of how different countries contribute to the world in different areas. It is intended to challenge the importance of GDP when countries make decisions.
Science
- This is a combined measure of how many international students, journals, publications, patents and Nobel prizes a country has. The better a country performs in this regard, the more Esperanto speakers it has. Learning Esperanto is an academic discipline, so the relationship is not surprising.
Culture
- This is a combined measure of the export of creative goods and services (that is, art, books, digital content etc.), visa restrictions, UNESCO participation and press freedom. A high value on this measure is associated with significantly more Esperanto speakers in a country.
Peace, order, climate, equality and health
- These factors are the remaining 5 measures from the Good Country Index. I will not go through them because they are not significantly related to the number of Esperanto speakers in a country.
The variable continent was added to the analysis to adjust for history.
To conclude, there are more Esperanto speakers in countries which are rich, have internet and contribute a lot to science and culture. A high linguistic diversity in a country has no – or perhaps a slightly negative – effect on the number of Esperanto speakers. Similarity to Esperanto also does not seem to affect the number of Esperanto speakers.
I compared the predicted density of Esperanto speakers with the ‘actual’ density.
Below I list the extreme, colored points.
The expected number of Esperanto speakers in Great Britain is computed as a product. It starts from the base number for a country with the population of Great Britain, which is then multiplied by the multipliers from all the other factors.
|Factor (value for Great Britain)|Multiplier|
|Distance to Esperanto (0.73)|1.1248097|
|GDP per capita (43876)|1.5219321|
The significant factors (internet, GDP, continent, science and culture) all contribute large multipliers to the expected number of Esperanto speakers for Great Britain.
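How the multipliers combine can be sketched in a few lines. Only the two multipliers quoted above are used, and the base number in the example is a hypothetical placeholder chosen so that the code runs; it is not the value from the analysis.

```python
import math

# Multipliers quoted above for Great Britain (only two of the factors).
multipliers = {
    "Distance to Esperanto (0.73)": 1.1248097,
    "GDP per capita (43876)": 1.5219321,
}


def expected_speakers(base, factor_multipliers):
    """Base number times the product of all factor multipliers."""
    return base * math.prod(factor_multipliers.values())


# HYPOTHETICAL base number, not taken from the post.
base = 100.0
combined = math.prod(multipliers.values())
print(f"combined multiplier of the two factors: {combined:.3f}")
print(f"expected speakers with base {base:g}: {expected_speakers(base, multipliers):.1f}")
```

A multiplier above 1 raises the expected number relative to the base and a multiplier below 1 lowers it, so the order in which the factors are applied does not matter.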
The model is a linear regression
$$y = \beta_0 + \sum_{i=1}^{12} \beta_i x_i + \varepsilon$$
where the 12 variables, $x_1, \dots, x_{12}$, are listed in the beginning of the blog post and $y$ is the per capita number of Esperanto speakers. 37 countries with missing data were removed from the analysis. I did not impute missing data because it changed the results drastically. The amount of multicollinearity is negligible because the GVIF^(1/(2*Df)) is in the range 1.1-2.5. The $R^2$ value is 0.7166, so there is still a good amount of variance left in the data, as seen from the residuals. The code and most of the data can be found in my gist repository.
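For readers who want to see the mechanics, here is a minimal sketch of fitting a linear regression and computing the coefficient of determination and variance inflation factors with NumPy. It runs on synthetic data with made-up sizes and coefficients and is not the code from the gist. For a term with one degree of freedom, GVIF^(1/(2*Df)) is the square root of the ordinary VIF, which is what the last line prints.

```python
import numpy as np

rng = np.random.default_rng(1)
n_countries, n_vars = 150, 3  # hypothetical sizes, not the real data set

X = rng.normal(size=(n_countries, n_vars))   # synthetic explanatory variables
true_beta = np.array([1.0, 0.5, -0.3, 0.8])  # intercept + one slope per variable
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.1, size=n_countries)


def fit_ols(X, y):
    """Return least-squares coefficients (intercept first) and R^2."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    r2 = 1.0 - residuals @ residuals / np.sum((y - y.mean()) ** 2)
    return beta, r2


def vif(X):
    """VIF of each column: 1 / (1 - R^2) from regressing it on the other columns."""
    factors = []
    for j in range(X.shape[1]):
        _, r2_j = fit_ols(np.delete(X, j, axis=1), X[:, j])
        factors.append(1.0 / (1.0 - r2_j))
    return np.array(factors)


beta_hat, r_squared = fit_ols(X, y)
vifs = vif(X)
print("coefficients:", np.round(beta_hat, 2))
print("R^2:", round(float(r_squared), 3))
print("sqrt(VIF):", np.round(np.sqrt(vifs), 2))
```

Values of sqrt(VIF) close to 1 indicate that an explanatory variable is nearly uncorrelated with the others; the synthetic columns here are independent by construction, so they land near 1.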