India’s data workers: The human labour making machines learn

  • Blog Post Date 11 September, 2025

Neha Arya

Indian Institute of Technology Delhi

huz208211@hss.iitd.ac.in

Technological progress can displace workers from existing work, as well as create new work. Combined with demographic changes and macroeconomic fluctuations, it has also spurred the growth of non-standard employment. Within these trends, Neha Arya shines the spotlight on “data workers” in India’s gig and platform economy. Noting the vulnerability of human labour behind artificial intelligence (AI) systems, she deliberates on the way forward for India as a major player in global AI supply chains.

Technological advancements reshape how individuals, firms, and governments interact, both within and across these groups. These changes often alter the fundamental nature of work, along with several dynamics of the labour market. Lin (2011) used historical US Census data for the period 1965-2000 (classified using the Dictionary of Occupational Titles) to identify novel job titles, for example, “web developer”, “chat room host”, and “radiopharmacist”. Using Lin’s approach, Autor et al. (2021) found that over 60% of employment in 2018 was under job titles that did not exist in 1940. Thus, while technology, such as automation, can displace workers from existing jobs or tasks, it also creates new work, reinstating demand for workers with specific expertise (Acemoglu and Restrepo 2018). Identifying these new occupational titles is especially crucial for developing countries like India, which have a large, informal and vulnerable workforce and are experiencing rapid technological advancements.

Technological innovations, demographic changes, and macroeconomic fluctuations have enabled the proliferation of several forms of non-standard employment (NSE) worldwide. These include part-time/on-call work, temporary agency work/multi-party employment arrangements, and disguised employment/dependent self-employment1 (International Labour Organization (ILO), 2016). The Covid-19 pandemic further accelerated this trend. Developing countries, marked by a high degree of casual employment, have also experienced this change in the nature of employment. A 2019 global survey found that 80% of respondents preferred flexible work opportunities, and 65% of businesses reported cost-efficiency gains from such flexibilisation (International Workplace Group (IWG), 2019). India’s flexi-staffing industry grew by 15.3% during 2023-24, driven mainly by the FMCG (fast-moving consumer goods), e-commerce, manufacturing, healthcare, retail, logistics, banking, and energy sectors (Indian Staffing Federation (ISF), 2024). This surge raises concerns about increasing informality and vulnerable employment, especially amid the rapid expansion of the gig and platform economy.

The human labour behind artificial intelligence

While broad issues around India’s gig and platform economy have gained prominence, the emerging category of “data workers” (new work that is vital for artificial intelligence (AI) systems) remains largely overlooked in the discourse. Since the term AI was coined by a group of computer scientists (McCarthy, Minsky, Rochester, and Shannon) in 1956, it has evoked a mix of hope, fear, and uncertainty about the future of work. After decades of effort, the ongoing AI revolution has set off a global race for AI leadership. Generative AI (GenAI) models are swiftly gaining popularity. OpenAI’s ChatGPT, for example, became the fastest-growing web application in history, reaching 100 million monthly active users within 2.5 months and later 500 million users in a short span of time (Hadi and Najm 2023, Paris 2025). The term “generative” underscores the fact that these AI systems can create or generate new material autonomously without human input (Feuerriegel et al. 2023). However, a huge amount of human labour goes into the development of these AI systems. Many of them (including ChatGPT, Google’s Gemini, and DALL-E, among others) are based on a complex “human-in-the-loop” (HITL) model (Rani and Dhir 2024). HITL relies on the judgement of human data workers to annotate, label, and categorise the raw data (like text files, images or videos) used to train machine learning models (IBM, 2025). Data curators, labellers, content moderators, validators, and human feedback providers work to ensure that AI does not perform poorly or dangerously (for instance, in autonomous cars). The accuracy of these data is crucial for the efficiency and predictive performance of AI models. Thus, data workers are the backbone of AI systems, ensuring their functionality, accuracy, and safety – ironically while themselves working in precarious, fragmented, and often invisible conditions.

Why is this an important contemporary and future concern for India? To meet cost-efficiency goals, businesses increasingly rely on gig workers in the AI supply chain – often outsourcing tasks to crowd workers via digital labour platforms (DLPs) or to smaller firms employing data workers. For instance, at the announcement of Amazon’s Mechanical Turk or MTurk (a virtual labour marketplace/crowdwork platform) in 2006, Amazon’s CEO Jeff Bezos referred to it as “artificial artificial intelligence”. The phrase signified that the “Human Intelligence Tasks” (HITs) available on the platform were microtasks (often simple and repetitive) to be performed by a reserve army of cheap labour. A prominent outcome of the work done by data workers on MTurk (called ‘turkers’) for a project by Jia Deng et al. was the release, in 2009, of the ImageNet dataset, the largest labelled image dataset. It was fuelled by the work of millions of workers across the globe, who manually labelled a million images for very low wages. A 2016 survey of almost 3,000 turkers from the US by the Pew Research Center revealed that over 50% of all workers reported hourly earnings below $5 (Pew, 2016). Unsurprisingly, a large majority of data workers are in the Global South, where wages are significantly lower. For instance, in Kenya, data workers mostly receive hourly wages of only US$2, while in Argentina hourly wages go as low as US$1.7. In addition, companies often bind workers with non-disclosure agreements (NDAs), further invisibilising their contributions to AI systems (Dachwitz 2024). Besides low wages, concerns have also been raised about adverse mental health outcomes for data workers engaged in content moderation. Content moderators are regularly exposed to traumatising content, which has long-term psychological implications – sometimes even leading to drug dependency (Gebrekidan 2025).

India’s role in global AI supply chains

According to the European Commission, India registered one of the fastest rates of digitalisation (11%) during the period 2011-2019 – similar to China. This makes the National Industrial Classification (NIC) (2008) used by labour surveys too dated to capture most digitally-driven new work. Where, then, are gig, platform, and data workers captured? India’s annual Periodic Labour Force Survey (PLFS) uses the National Classification of Occupations (NCO) (2015), in which “data entry clerks” are captured by “Family 4132” (Figure 1). Essentially, these categories cover traditional clerical data input roles and do not explicitly include modern AI-related data work. Overall, therefore, these workers remain statistically invisible, as is the case for digital platform-based gig workers.

Platform gig and data work in AI supply chains are key present-day illustrations of the “reinstatement effect” (Acemoglu and Restrepo 2019) of technology. Job advertisements (see Figures 2, 3, and 4 below) for roles involving data work, like data validation and data annotation, list key competencies including data analysis, excellent written and verbal communication skills, and attention to detail, among others. Figure 5 illustrates the rising demand for data workers in India (now emerging as a key hub for data annotation), powered by a diverse workforce producing high-quality datasets for global use. In 2024, an estimated 50,000 Indian freelance annotators were working on international digital platforms, alongside 20,000 full-time annotators within India, according to an Economic Times report (citing data from TeamLease). The same report also states that the global market for data annotation is valued at an estimated US$8.22 billion, and is expected to grow swiftly at nearly 26.2% annually through 2028. From US$250 million in 2020-21, India is expected to service over US$7 billion of the global annotation market by 2030 (National Association of Software and Service Companies (NASSCOM), 2021). Even India’s ‘National Strategy for Artificial Intelligence’ identifies data annotation work as having the potential of “absorbing a large portion of the workforce that may find itself redundant due to increasing automation” (NITI Aayog, 2018). Among other issues, however, one concern is that the HITL model may de-skill workers who perform repetitive tasks to train or improve AI systems. Additionally, while location-based gig work has gained regulatory2 attention through collectivisation efforts (Tiwari 2025, Jain 2025, Elizabeth 2024) – often supported by informal labour unions and widespread public discussions – AI data workers remain largely absent from mainstream discourse.
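To put the projection above in perspective, the annual growth rate it implies can be worked out with the standard compound-growth formula. The sketch below is only illustrative: the function name `implied_annual_growth` is hypothetical, and because the text does not say how the 2020-21 base year should be counted, both a 9-year and a 10-year span are assumed.

```python
def implied_annual_growth(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate that takes start_value to end_value over `years` years."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the text (NASSCOM, 2021):
# US$250 million in 2020-21, over US$7 billion by 2030.
START_USD_MILLION = 250.0
END_USD_MILLION = 7000.0

# Assumption: the gap between the two data points is 9 or 10 years,
# depending on how the 2020-21 base year is counted.
growth_by_span = {
    years: implied_annual_growth(START_USD_MILLION, END_USD_MILLION, years)
    for years in (9, 10)
}

for years, rate in growth_by_span.items():
    print(f"{years}-year span: about {rate:.1%} per year")
```

Under either assumption, the implied growth works out to roughly 40 to 45% a year, well above the 26.2% annual growth quoted for the global annotation market. In other words, the projection assumes that India's share of that market rises substantially.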

Figure 1. Occupations under the ‘Family 4132 - Data Entry Clerks’ group from NCO-2015

| Group Code | Occupation Title | Description |
| --- | --- | --- |
| 4132.0401 | Data Entry Machine Operator | Enters alphabetic, numeric, or symbolic data into computer, and verifies it. |
| 4132.0402 | Domestic Data Entry Operator | Electronically enters data (daily/hourly work reports) on client or office sites. |
| 4132.0600 | Coding Machine Operator | Handles coding machines to print codes on different materials. |
| 4132.0800 | Duplicating Machine Operator/Photocopier | Operates and monitors photocopying machines. |
| 4132.0900 | Embossing Machine Operator | Operates power driven embossing machines. |
| 4132.1000 | Addressing Machine Operator | Operates electrically-driven printing machines. |
| 4132.1300 | Book Keeping Machine Operator | Records business transactions using computer software, and performs general clerical duties. |
| 4132.1400 | Bill Processing Clerk | Prepares bills, statements, calculates payrolls and other amounts, using computer software. |
| 4132.9900 | Data Entry Clerks, Other | Operates book-keeping and computing machines not elsewhere classified. |
Figure 2. Data labelling – permanent

Figure 3. Data annotator – freelance

Figure 4. Classification data annotation – freelance

Figure 5. Demand for data workers

Policy directions

India ranks 14th in AI research globally, with a share of 1.4% during 2018-2023, compared with the US’s share of 30.4% and China’s share of 22.8%. However, it has already come into focus as a global market for AI technologies – recently emerging as the second largest, and among the fastest growing, markets globally for ChatGPT. As the future implications of the ongoing AI revolution remain unclear for all, India stands at a pivotal moment to shape its role in the global AI supply chain. To fully leverage AI’s economic and (decent) employment potential, a coordinated policy approach is needed. While a national AI strategy lays down a blueprint, it is also crucial to update the NCO to encompass AI data-related jobs (including crowdsourced microtask work), establish AI-focused skill development hubs, regulate gig work in the AI supply chain, and promote AI-related research and development in an equitable and inclusive manner. There is a need to identify such gig work via digital labour registries, promote the upskilling of workers, and ensure the accountability of platforms throughout the chain. The uncertain AI era needs proactive measures to avoid continued polarisation3 of skills and jobs (Kuriakose and Iyer 2020) in India’s labour market. Moreover, the declining labour share of income (Karabarbounis and Neiman 2013), owing to technological advancements and the growing popularity of work fragmentation, needs immediate regulatory, civil society, and legislative responses. Building a resilient, ethical AI workforce requires both innovation and inclusion. As a ‘hub’ for AI supply chain labour, India has an opportunity as well as a responsibility to improve labour market conditions for these data workers, who must not be disconnected from the wider benefits they generate.

Notes:

  1. “Disguised employment” refers to arrangements where workers provide their labour while having contractual arrangements corresponding to self-employment. “Dependent self-employment” applies to persons who operate a business without employees but do not have complete control over their work.
  2. Rajasthan Platform Based Gig Workers (Registration and Welfare) Act, 2023; Karnataka Platform-Based Gig Workers (Social Security and Welfare) Act, 2025; Bihar Platform Based Gig Workers (Registration, Social Security and Welfare) Act, 2025.
  3. Job polarisation refers to the shrinking share of middle-skill jobs (typically involving routine tasks), while both high-skill and low-skill jobs grow within the economy.
