India is a linguistically diverse country with a large flow of internal migrants. This article shows that the greater the “linguistic distance” between a woman’s native language and the dominant language of the district in which she resides, the worse her health outcomes. Linguistic distance also reduces children’s access to healthcare services.
Language plays a foundational role in shaping social interactions and enabling access to economic opportunities through education, employment, and access to public services. However, languages differ widely in terms of structure and composition, even within the same geographic areas, and these differences can create barriers to communication and integration. These barriers are significant for migrants, in particular, as their native language often varies from the dominant language of the place in which they reside.
The potential returns to an investment in learning a new language are high (Chiswick 2008, Ginsburgh and Weber 2020). However, learning a new language is also costly, and the cost of acquiring a new language for an individual depends, in part, on how linguistically distinct the new language is from an individual’s native tongue, that is, how big the language barrier is. For migrants, the more distinct their native tongue is from the dominant language of the region in which they reside, the greater the costs they face in achieving socioeconomic integration.
India has significant linguistic diversity, with 22 official languages and thousands of dialects. Proficiency in the locally dominant language enables people to access education, healthcare, and the labour market (Laitin and Ramachandran 2016). At the same time, the country has a large number of internal migrants (temporary and seasonal): Census data from 2011 put the number of rural-urban migrants at 51 million. Given this extent of linguistic diversity, a migrant’s mother tongue is likely to differ from the dominant language of the region, imposing costs on migrants in terms of human capital outcomes.
In our study (Jayakumar and Sharma 2025), we examine the consequences of linguistic barriers for access to healthcare and, consequently, for health outcomes. In particular, we estimate the effect of an increase in the cost of acquiring the locally dominant language on observed health outcomes and health-seeking behaviour. To quantify the cost of acquiring a language, we use a measure of linguistic distance (LD) developed by Fearon (2003).
Measuring linguistic distance
The cost for a native speaker of language A of learning another language B is directly related to the LD, or distinctness, between the two languages. For example, the LD between Tamil and Kannada is low: they share a similar structure, which reduces the cost for a speaker of one to learn the other. By contrast, the LD between Tamil and Nepali is high, since these are two very distinct languages, and the cost for a native speaker of one to learn the other is correspondingly high.
Measures of LD rely on ‘language trees’, which classify and group languages based on ancestry, origin, and structure, among other parameters. One such language tree is the Ethnologue (Lewis et al. 2014), which shows how languages evolved from common ancestors and split over time into different branches. The Fearon (2003) method that we use measures LD by counting how many nodes two languages share on their classification trees, relative to the average depth of those trees; the more shared ancestry, the smaller the distance. It then transforms this similarity into a distance between 0 (same language) and 1 (completely unrelated).
Figure 1. Language tree
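The tree-based distance described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: each language is represented by its path of classification nodes from the root of the tree, similarity is the number of shared leading nodes divided by the average path length, and an exponent of 0.5 is applied before converting similarity to a distance. The exponent, the function names, and the example paths are all assumptions made for illustration and have not been checked against the Ethnologue or the study.

```python
from itertools import takewhile


def shared_nodes(path_a, path_b):
    """Count the classification nodes two languages share, starting from the root."""
    matching = takewhile(lambda pair: pair[0] == pair[1], zip(path_a, path_b))
    return sum(1 for _ in matching)


def linguistic_distance(path_a, path_b, exponent=0.5):
    """Return a distance in [0, 1]: 0 for identical paths, 1 for no shared ancestry.

    The exponent is an assumed tuning parameter, not a value taken from the study.
    """
    average_depth = (len(path_a) + len(path_b)) / 2
    similarity = (shared_nodes(path_a, path_b) / average_depth) ** exponent
    return 1 - similarity


# Illustrative classification paths (hypothetical simplifications of a language tree).
TAMIL = ["Dravidian", "Southern", "Tamil-Kannada", "Tamil-Kodagu", "Tamil-Malayalam", "Tamil"]
KANNADA = ["Dravidian", "Southern", "Tamil-Kannada", "Kannada"]
NEPALI = ["Indo-European", "Indo-Iranian", "Indo-Aryan", "Northern zone", "Eastern Pahari", "Nepali"]

print(round(linguistic_distance(TAMIL, KANNADA), 3))  # small: three shared nodes
print(linguistic_distance(TAMIL, NEPALI))             # 1.0: no shared nodes
print(linguistic_distance(TAMIL, TAMIL))              # 0.0: same language
```

With these illustrative paths, Tamil and Kannada share three nodes and come out close to each other (about 0.23), while Tamil and Nepali share none and sit at the maximum distance of 1.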
Data and empirical strategy
To test the hypothesis that increasing LD is correlated with worsened health outcomes, we use pooled survey data from the two most recent rounds of the National Family Health Survey, conducted in 2015-16 (round 4) and 2019-21 (round 5). The pooled dataset includes 1,415,675 unique women between the ages of 15 and 49 years. We focus on health outcomes that are likely to be responsive to receiving information on prevention and treatment from health services (for example, anaemia and high blood sugar). We also look at the relationship between LD and health investments in children, specifically under-five immunisation, using samples of approximately 130,000 to 370,000 children.
The main source of variation in our data is the LD between the respondent’s mother tongue and the dominant language of the district. We identify the dominant language of each district from the 2011 Census as the language spoken by the most individuals in the district. Twenty-one official languages are recorded for both the respondent’s mother tongue and the dominant language of the district. We therefore calculate LD, using the Fearon (2003) method, for all pair-wise combinations of these languages to account for every possible mismatch between the respondent’s mother tongue and the dominant language.
We then estimate the effect of LD between the dominant language and mother tongue on the incidence of poorer health outcomes among women and their children. Our specification accounts for a range of household and individual characteristics and district-level differences, so that we compare similar households that vary only in their LD from the dominant language of the region. This is particularly important for identification of a causal relationship since our analysis effectively compares migrants to non-migrants.
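The construction of the key variable can be illustrated with a short sketch: pick each district's dominant language as the one with the most speakers, then look up the distance between that language and a respondent's mother tongue. The district names, speaker counts, distance values, and function names below are hypothetical placeholders, not figures from the Census or the study.

```python
from collections import defaultdict

# Hypothetical census counts: (district, language, number of speakers).
CENSUS_ROWS = [
    ("District A", "Tamil", 900_000),
    ("District A", "Kannada", 150_000),
    ("District B", "Kannada", 700_000),
    ("District B", "Tamil", 200_000),
    ("District B", "Nepali", 20_000),
]

# Hypothetical pairwise distances; in practice one value per language pair.
PAIRWISE_LD = {
    frozenset(["Tamil", "Kannada"]): 0.225,
    frozenset(["Tamil", "Nepali"]): 1.0,
    frozenset(["Kannada", "Nepali"]): 1.0,
}


def dominant_languages(census_rows):
    """Map each district to the language with the most speakers."""
    speakers = defaultdict(dict)
    for district, language, count in census_rows:
        speakers[district][language] = speakers[district].get(language, 0) + count
    return {district: max(counts, key=counts.get) for district, counts in speakers.items()}


def respondent_ld(mother_tongue, district, dominant, pairwise_ld):
    """Distance between a respondent's mother tongue and her district's dominant language."""
    local_language = dominant[district]
    if mother_tongue == local_language:
        return 0.0  # no mismatch, so no linguistic distance
    return pairwise_ld[frozenset([mother_tongue, local_language])]


dominant = dominant_languages(CENSUS_ROWS)
print(dominant)
print(respondent_ld("Tamil", "District B", dominant, PAIRWISE_LD))
```

Storing distances under unordered pairs (`frozenset`) reflects the fact that the measure is symmetric: the distance from Tamil to Kannada equals the distance from Kannada to Tamil.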
Findings
We find evidence that increasing the LD between a woman’s native language and the dominant language of the district she resides in results in poorer health outcomes for her. A one-unit increase in LD leads to a 0.9-1.3 percentage point (p.p.) increase in the probability of being anaemic or having high blood sugar levels. We also find evidence of reduced access to healthcare services for the children of these migrants: a one-unit increase in LD is associated with a 3-6.9 p.p. lower probability of receiving routine vaccinations and supplements such as DPT (Diphtheria, Pertussis, and Tetanus), Polio, Measles, Pentavalent, Rotavirus, Hepatitis B, Vitamin A1, and Vitamin A2.
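Because LD runs from 0 to 1, a one-unit increase corresponds to moving from the same language to a completely unrelated one. For intermediate distances, a rough reading of the estimates is to scale them by the LD in question. The sketch below does this under the assumption that the effect is linear in LD, which is an illustrative simplification rather than a result stated above; the function name and the example distance of 0.5 are hypothetical.

```python
def implied_effect(coefficient_pp, ld):
    """Implied change in probability (percentage points) at a given LD,
    assuming the effect scales linearly with LD."""
    if not 0.0 <= ld <= 1.0:
        raise ValueError("LD must lie between 0 and 1")
    return coefficient_pp * ld


# Reported ranges per one-unit increase in LD, in percentage points.
HEALTH_RANGE_PP = (0.9, 1.3)   # higher probability of anaemia or high blood sugar
VACCINE_RANGE_PP = (3.0, 6.9)  # lower probability of routine vaccination

example_ld = 0.5  # hypothetical mid-range distance
for label, (low, high) in [("health", HEALTH_RANGE_PP), ("vaccination", VACCINE_RANGE_PP)]:
    scaled_low = round(implied_effect(low, example_ld), 2)
    scaled_high = round(implied_effect(high, example_ld), 2)
    print(f"{label}: {scaled_low} to {scaled_high} p.p.")
```

Under the linearity assumption, a respondent halfway along the LD scale would face roughly half of each reported effect.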
In terms of mechanisms, we consider outcomes that capture engagement with the healthcare system. We look at whether a respondent has met with healthcare workers in the past three months, whether she has access to health insurance, whether she was informed about how to deal with the side effects of a medical procedure (sterilisation), and whether she thinks the care she received after this medical procedure was adequate. Our results suggest that women facing linguistic barriers engage less with the healthcare system and have less satisfying outcomes when they do.
A second channel is exposure to information about public health through the media. We estimate the impact of growing LD on whether a woman has heard of family planning methods through radio, television, and newspapers. Since the language of communication for these forms of media is primarily in the dominant language, our findings provide some suggestive evidence that women who do not speak this language well are less likely to access this information.
A third channel is differences in the individual autonomy of women, since families might impose greater restrictions on the mobility of women who are less familiar with the local language than on women who speak the dominant language of a district. We consider multiple measures of autonomy. As LD increases by one unit, the probability that a woman is allowed to seek medical help for herself declines by 0.8 p.p. when going alone and by 1.8 p.p. even when accompanied by someone. More generally, women’s mobility declines as LD increases: with a one-unit increase in LD, women are 6.4 p.p. less likely to be allowed to go to a medical facility alone, 4.4 p.p. less likely to be allowed to go outside the village alone, and 6.9 p.p. less likely to be allowed to go to the market alone. Taken together, our results provide evidence that LD presents a barrier to accessing healthcare on multiple fronts.
Additionally, we explore heterogeneous effects by household characteristics such as wealth and length of residence in a district. We find that increasing household wealth and the number of years a household resides in a district moderate the negative effects of LD on the health outcomes of children. This suggests that the impact of LD falls more heavily on the poor than the rich. Public healthcare services, therefore, need to play a more active role in addressing linguistic gaps since the poor are more likely to use subsidised public services.
Finally, we consider heterogeneous effects by the ethnolinguistic diversity of a district and whether the average income in the state is above or below the median for the country. We find that the adverse effects of LD are mitigated for individuals living in relatively more linguistically diverse districts, compared to less linguistically diverse districts. One reason for this could be that districts with higher linguistic diversity are better equipped to deal with language barriers in the healthcare system, leading to better health outcomes among children. Similarly, we find that the adverse effect of linguistic distance on children’s vaccine uptake is significantly larger in low-income states. In sum, states with higher incomes and higher capacity to manage public services are better able to deliver some types of healthcare to people who face linguistic barriers.
Conclusion
Our research contributes to the literature on the adverse effects of LD on human capital accumulation, specifically through access to the healthcare system. Language acquisition is a critical channel for social, economic, and cultural mobility. The design of inclusive policies that allow linguistically diverse groups of people to engage more fully with the State can play a pivotal role in reducing disparities in health and other socioeconomic indicators of development. Our results suggest that policymakers ought to make greater efforts to include linguistically diverse groups in their healthcare outreach programmes to improve these groups’ health outcomes. Alternatively, lowering the cost of learning dominant languages could also mitigate the adverse effects of LD. This is especially important in linguistically diverse countries like India, with large flows of internal migrants.
Further Reading
- Chiswick, BR (2008), ‘The Economics of Language: An Introduction and Overview’, IZA Discussion Paper 3568.
- Ginsburgh, V and S Weber (2020), ‘The economics of language’, Journal of Economic Literature, 58(2): 348-404.
- Laitin, DD and R Ramachandran (2016), ‘Language policy and human development’, American Political Science Review, 110(3): 457-480.
- Jayakumar, A and A Sharma (2025), ‘Is language a bridge or a barrier? Impact of linguistic distance on the health of women and children’, Ashoka University Economics Discussion Paper No. 150.
- Fearon, JD (2003), ‘Ethnic and cultural diversity by country’, Journal of Economic Growth, 8: 195-222.
- Lewis, MP, G Simons and CD Fennig (2014), Ethnologue: Languages of the World, SIL International, Dallas, Texas.




10 November, 2025 





