Explaining the density of Esperanto speakers with language and politics


In a previous blog post I estimated the density of Esperanto speakers in each country of the world. The density is the number of Esperanto speakers per 1 million inhabitants. In this post I examine how the density of Esperanto speakers can be explained by other variables. The variables are:

  • Density of Esperanto speakers
  • Linguistic diversity
  • Similarity between Esperanto and the languages of the country
  • Density of internet users
  • GDP per capita
  • The rankings of the seven categories of the Good Country Index:
    • Contribution to science
    • Contribution to culture
    • Contribution to peace
    • Contribution to climate
    • Contribution to order
    • Contribution to equality
    • Contribution to health
  • Continent

Comparing each explanatory variable separately with the density of Esperanto speakers shows that most of them are significantly correlated with it.

Each variable is compared separately with the density of Esperanto speakers, and the T-statistics for significance are shown. The red lines mark the 95% significance limits. A positive T-statistic means that the correlation is positive.
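The post does not say how these single-variable T-statistics were computed. A common choice, assumed here, is the t-test for a Pearson correlation, t = r\sqrt{n-2}/\sqrt{1-r^{2}}, which gives the same value as the t-statistic of the slope in a simple linear regression of y on x. The plain-Python sketch below uses hypothetical function names, toy data that is not from the post, and 1.96 as a large-sample approximation of the 95% limits.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_t(x, y):
    """T-statistic for H0 'no correlation' (n - 2 degrees of freedom).

    Undefined when |r| == 1 (perfectly collinear data).
    """
    r = pearson_r(x, y)
    return r * math.sqrt(len(x) - 2) / math.sqrt(1 - r ** 2)

# Large-sample approximation of the two-sided 95% limits (the red lines).
# With few countries, a Student-t quantile with n - 2 df is somewhat wider.
LIMIT_95 = 1.96

def is_significant(t, limit=LIMIT_95):
    """True if the T-statistic lies outside the +/- limit band."""
    return abs(t) > limit

# Hypothetical toy data: x could be an explanatory variable, y log density.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(correlation_t(x, y))
```

For the toy data the statistic is about 2.12. With only five points the proper 95% limit (a t quantile with 3 degrees of freedom, about 3.18) is much larger than 1.96, so the large-sample limit is only reasonable when many countries are included.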

The plot shows that, on average, countries with low language diversity have a higher density of Esperanto speakers, that countries where the population speaks languages similar to Esperanto have more Esperanto speakers, and so on. All of these are just correlations – what are the causal effects? What if the real cause is simply belonging to the Western world? Living in the Western world is correlated with low language diversity, good health care, widespread internet access, etc. That could explain the observed correlations.

One way of getting closer to the real causes is to include many variables in the same analysis. If the real cause is among them, its strong relationship with the density of Esperanto speakers can make the non-causal correlations disappear. The results are shown in the plot below.

All variables are compared jointly with the density of Esperanto speakers in a multiple regression, and the T-statistics for significance are shown. The red lines mark the 95% significance limits. A positive T-statistic means that the association is positive when the other variables are held fixed.

In the following I will go through each variable and the conclusions.

Language diversity

The language diversity is an estimate of the probability that two randomly chosen people from a country have different mother tongues. It is also called the Linguistic Diversity Index (LDI). According to this analysis, low language diversity does not lead to significantly more Esperanto speakers. It is a common belief that high language diversity gives more Esperanto speakers, but that is unlikely to be true, because the T-statistic is negative and far from the positive significance limit.
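As an illustration, assuming the usual definition of the index (Greenberg's diversity index: one minus the sum of the squared mother-tongue shares), the LDI can be computed as below. The function name is hypothetical, and the example uses the Bosnia-Herzegovina shares quoted in the next section, with the 1.7% "other" group treated as a single language. This is not the LDI data used in the analysis.

```python
def linguistic_diversity_index(shares):
    """Greenberg-style diversity index: 1 - sum of squared mother-tongue shares.

    `shares` are the population proportions of each mother tongue. They are
    renormalised to sum to 1, so the result is the probability that two
    randomly chosen residents have different mother tongues.
    """
    total = sum(shares)
    return 1 - sum((s / total) ** 2 for s in shares)

# Illustration only: Bosnian, Serbo-Croatian, Croatian, "other" (lumped).
bosnia = [0.529, 0.308, 0.146, 0.017]
print(round(linguistic_diversity_index(bosnia), 3))  # 0.604
```

A country where everyone shares one mother tongue gets 0. Lumping the "other" languages together makes the value a slight underestimate, since splitting a group into smaller languages always lowers the sum of squares.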

Similarity to Esperanto

I constructed this variable myself. For each country I collected the proportion of speakers of each language. For example, in Bosnia-Herzegovina 52.9% speak Bosnian, 30.8% speak Serbo-Croatian, 14.6% speak Croatian and the remaining 1.7% speak other languages. For each language, I calculated the distance to Esperanto using the method described in my previous post (in summary, it is the average phonetic distance between a set of words in the two languages). The distance between Esperanto and Bosnian is 0.754, between Esperanto and Serbo-Croatian 0.797, and between Esperanto and Croatian 0.772. Hence, the distance between Bosnia-Herzegovina and Esperanto is the weighted average (the 1.7% speaking other languages is left out):

\frac{0.529 \cdot 0.754+0.308\cdot 0.797 + 0.146\cdot 0.772}{0.529+0.308+ 0.146}=0.770
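The weighted average can be written as a small function. This is a sketch, not the author's code; the dictionary layout and the name `country_distance` are assumptions. Languages without a known distance, here the 1.7% "other" group, are dropped and the weights renormalised, which is what the denominator 0.529 + 0.308 + 0.146 does.

```python
def country_distance(shares, distances):
    """Population-weighted mean distance from a country's languages to Esperanto.

    Languages missing from `distances` are skipped, and the weights are
    renormalised over the remaining languages.
    """
    known = [lang for lang in shares if lang in distances]
    weight = sum(shares[lang] for lang in known)
    return sum(shares[lang] * distances[lang] for lang in known) / weight

bosnia_shares = {"Bosnian": 0.529, "Serbo-Croatian": 0.308,
                 "Croatian": 0.146, "other": 0.017}
distance_to_esperanto = {"Bosnian": 0.754, "Serbo-Croatian": 0.797,
                         "Croatian": 0.772}
print(round(country_distance(bosnia_shares, distance_to_esperanto), 3))  # 0.77
```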

This variable is not significant for the number of Esperanto speakers. It has been argued that the similarity between the European languages and Esperanto is problematic, because people who don’t speak a European language would find Esperanto harder to learn. This analysis finds no evidence of such an effect in the data.

Percentage of population on the internet

An internet user is defined as someone who has used the internet in the past 3 months. The data is downloaded from the World Bank. This variable is barely significant for the number of Esperanto speakers.

GDP per capita

GDP is the value of all final goods and services produced in a country, and GDP per capita is this value divided by the population. The data is downloaded from the World Bank. This variable falls just short of significance for the number of Esperanto speakers.

The Good Country Index

The Good Country Index is a collection of 7 measures of how different countries contribute to the world in different areas. It is intended to challenge the importance of GDP in countries' decision-making.

  • Science

    • This is a combined measure of how many international students, journals, publications, patents and Nobel prizes a country has. The better a country performs in this regard, the more Esperanto speakers it has. Learning Esperanto is an academic discipline, so the relationship is not surprising.
  • Culture

    • This is a combined measure of the export of creative goods and services (that is, art, books, digital content, etc.), visa restrictions, UNESCO participation and press freedom. A high score on this measure significantly increases the number of Esperanto speakers in a country.
  • Peace, order, climate, equality and health

    • These are the remaining five measures of the Good Country Index. I will not go through them because they are not significantly related to the number of Esperanto speakers in a country.


The variable Continent was added to the analysis to adjust for the history shared by countries on the same continent.


To conclude, there are more Esperanto speakers in countries which are rich, have internet and contribute a lot to science and culture. A high linguistic diversity in a country has no – or perhaps a slightly reductive – effect on the number of Esperanto speakers. Similarity to Esperanto also does not seem to affect the number of Esperanto speakers.

Additional analysis

I compared the predicted density of Esperanto speakers with the ‘actual’ density.

The green points are countries that have many more Esperanto speakers than the model expects, and the red points are countries with many fewer.

Below I list the extreme, colored points, with the expected and actual numbers of Esperanto speakers.

Country          Expected    Actual
Senegal              18.5      99.1
Burundi              20.3     107.3
Benin                14.0     136.0
Nicaragua            23.0     116.5
Togo                  9.6     189.2
Mongolia              2.8     100.7
Israel               62.7     423.7
Hungary             440.2    2006.6
Lithuania            93.2     725.6
Bangladesh          130.5      15.5
Mozambique           25.8       3.8
Lesotho               2.4       0.3
India              1950.8     229.3
Guinea               13.6       2.6
Burkina Faso         22.9       4.8
Oman                  7.9       1.9
Haiti                60.7      11.1
Guyana                4.1       0.9
Jamaica              27.6       6.2
Great Britain      6465.1    1596.2
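Since the model is fitted to the logarithm of the per-capita density, one natural way to rank these countries is the log ratio log(actual/expected), in which the population size cancels. The post does not state the cut-off used for coloring the points, so the sketch below only ranks four rows of the table; the function names are hypothetical.

```python
import math

# (expected, actual) number of speakers, copied from four rows of the table
rows = {
    "Togo": (9.6, 189.2),
    "Hungary": (440.2, 2006.6),
    "India": (1950.8, 229.3),
    "Great Britain": (6465.1, 1596.2),
}

def log_residual(expected, actual):
    """Residual on the model's log scale; population size cancels out."""
    return math.log(actual / expected)

def rank_by_residual(rows):
    """Country names from most under-predicted to most over-predicted."""
    return sorted(rows, key=lambda c: log_residual(*rows[c]), reverse=True)

for country in rank_by_residual(rows):
    expected, actual = rows[country]
    print(f"{country:<14} actual/expected = {actual / expected:6.2f}")
```

Togo has about 20 times as many speakers as expected, while India has roughly an eighth of the expected number.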

The formula for the expected number of Esperanto speakers in Great Britain is

\exp(a_0)\exp(a_1x_1)\cdots\exp(a_{12}x_{12})\cdot\textup{popsize}

The base number for a country with the population of Great Britain is \exp(a_0) \textup{popsize}=371. This number is then multiplied by the multiplier of each of the other factors, \exp(a_i x_i).

Variable (value for Great Britain)   Multiplier
Language diversity (0.17)            1.2150130
Distance to Esperanto (0.73)         1.1248097
Internet users (0.92)                1.8507034
GDP per capita (43876)               1.5219321
Continent (Europe)                   2.1609081
Science (1)                          1.7908462
Culture (11)                         1.6369619
Peace (64)                           1.0080216
Order (11)                           0.7633022
Climate (22)                         1.1100139
Equality (5)                         1.1113126
Health (2)                           0.7516319
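The expected number for Great Britain can be checked from the table: it is the base number 371 times the product of the twelve multipliers. The short sketch below (illustrative names) computes it both as a product and as a sum on the log scale, which is how the model itself is written.

```python
import math

# Multipliers exp(a_i * x_i) for Great Britain, copied from the table above
multipliers = {
    "Language diversity": 1.2150130,
    "Distance to Esperanto": 1.1248097,
    "Internet users": 1.8507034,
    "GDP per capita": 1.5219321,
    "Continent (Europe)": 2.1609081,
    "Science": 1.7908462,
    "Culture": 1.6369619,
    "Peace": 1.0080216,
    "Order": 0.7633022,
    "Climate": 1.1100139,
    "Equality": 1.1113126,
    "Health": 0.7516319,
}

BASE = 371  # exp(a_0) * popsize for Great Britain, as quoted (rounded)

def expected_speakers(base, multipliers):
    """exp(a_0)*popsize times the product of the per-variable multipliers."""
    return base * math.prod(multipliers.values())

def expected_speakers_log(base, multipliers):
    """Same quantity, summed on the log scale and exponentiated."""
    return math.exp(math.log(base) + sum(math.log(m) for m in multipliers.values()))

print(round(expected_speakers(BASE, multipliers)))
```

The product comes out at about 6454, within 0.2% of the 6465.1 in the table of expected and actual numbers; the small gap is consistent with the base 371 having been rounded. Multipliers below 1 (Order and Health) pull the estimate down.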

The significant factors (internet, GDP, continent, science and culture) all contribute large multipliers to the expected number of Esperanto speakers in Great Britain.

Technical notes

The model is a linear regression

\log(y)=a_0+a_1x_1+a_2x_2+\cdots+a_{12}x_{12}

where the 12 variables, x_1, \dots, x_{12}, are the explanatory variables listed at the beginning of the blog post and y is the per capita number of Esperanto speakers. 37 countries with missing data were removed from the analysis. I did not impute missing data because doing so changed the results drastically. Multicollinearity is negligible: the GVIF^{1/(2\cdot df)} values are in the range 1.1-2.5. The R^{2} value is 0.7166, so a good amount of variance is left unexplained, as seen from the residuals. The code and most of the data can be found in my gist repository.
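The author's own code is in the gist and is not reproduced here. As an independent sketch, assuming numpy and an already numeric design matrix, the log-linear model, its T-statistics and R^{2} can be obtained with ordinary least squares. The function name `fit_log_linear` is hypothetical, missing rows are dropped (a complete-case analysis, as in the post), and the GVIF check is not included.

```python
import numpy as np

def fit_log_linear(X, y):
    """OLS fit of log(y) = a_0 + a_1 x_1 + ... + a_k x_k.

    X: (n, k) explanatory variables; categorical ones such as continent
       must already be dummy-encoded (one column per non-reference level).
    y: (n,) positive per-capita numbers of speakers.
    Rows with any missing value are dropped (complete-case analysis).
    Returns (coefficients, t-statistics, R^2); index 0 is the intercept a_0.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
    X, z = X[keep], np.log(y[keep])

    A = np.column_stack([np.ones(len(z)), X])   # intercept column first
    coef, *_ = np.linalg.lstsq(A, z, rcond=None)
    resid = z - A @ coef
    dof = len(z) - A.shape[1]
    sigma2 = resid @ resid / dof                # residual variance
    cov = sigma2 * np.linalg.inv(A.T @ A)       # covariance of the estimates
    t_stats = coef / np.sqrt(np.diag(cov))
    r2 = 1 - (resid @ resid) / np.sum((z - z.mean()) ** 2)
    return coef, t_stats, r2
```

In this sketch Continent would enter as several dummy columns rather than a single x_i, and countries with zero reported speakers would need separate handling, since log(0) is undefined. Each multiplier in the Great Britain table corresponds to exp(a_i x_i) for one of the fitted coefficients.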

