In October 2012, India embarked upon its ‘Open Government Data’ journey by opening up access to government-owned shareable data in machine-readable formats for use by the general public. In this article, Natasha Agarwal, an independent research economist, discusses issues in the design and implementation of the initiative, particularly through the lens of its governing policy, the National Data Sharing and Accessibility Policy, and makes recommendations to enhance its effectiveness in achieving its stated objectives.
From weather data and State-produced texts to traffic studies and scientific information, government hosts invaluable publicly generated data. These data have the potential to unleash entrepreneurship, innovation, and scientific discovery, all of which can generate jobs, improve citizens’ lives, promote effective and efficient governance, and consequently spur economic growth. However, these direct and indirect benefits of opening up access to government data do not materialise automatically.
One of the mechanisms through which these benefits are transmitted is research: making more, and better-quality, government data available can improve and multiply research studies that have a direct and indirect impact on evidence-based policymaking (Agarwal 2015). Governments around the world have started to recognise the importance of evidence in policy decisions, and have begun to actively open up their data through transparency portals at the national and regional levels1.
With the launch of data.gov.in, India also embarked on its Open Government Data (OGD) journey in October 2012. As part of the OGD movement, India has agreed to provide public access to government-owned shareable data (along with its usage information) in machine-readable formats at no additional cost. The OGD movement in India is governed by the National Data Sharing and Accessibility Policy (NDSAP).
Despite the increasing supply of government datasets, the platform is far from opening government doors to uncomfortable scrutiny or creating economic opportunities. Many datasets are simply not available. Moreover, the datasets that have been uploaded are often outdated, duplicated, incomplete, lacking in semantic interoperability2, and inadequately referenced. Good-quality metadata3 (or any metadata at all) is frequently absent. More often than not, researchers have no effective channel for troubleshooting.
In this article, I evaluate India’s OGD journey through the lens of its governing policy, the NDSAP. In particular, I highlight the shortcomings in the design and implementation of the NDSAP and provide policy prescriptions to overcome them. The objective is to help India break its growth logjam by facilitating greater use of public data in providing the evidence necessary for effective policymaking.
What limits the success of Open Government Data?
In my opinion, the observable slack on the part of ministries, state government departments, subordinate offices, and autonomous bodies of the Government of India (collectively referred to as agencies hereafter) in complying with the NDSAP arises for the following reasons:
- Suppliers (agents/agencies) do not have a clear understanding of what OGD denotes, and hence are unable to comprehend the economic value that OGD can generate for both upstream and downstream users4 of OGD. Besides, they face resource and capacity constraints in implementing OGD.
- Consumers (users) are either unaware of India’s OGD platform5 or prefer to use web portals of agencies because of familiarity and/or ease of access.
- There is limited or no interaction between suppliers and consumers, which is reflected in a growing wedge between the actual quantity and quality of datasets uploaded on the platform and what users require.
To maximise the economic benefits of the OGD movement, I make two strategic propositions:
1. The first strategic step should focus on the smooth implementation of the NDSAP. In this regard, India should clarify the objective of the policy, and several steps are needed to do so. First, if the objective is to open the doors of the government to uncomfortable scrutiny, then the practice of prioritising datasets for upload on data.gov.in into ‘high-value’ and ‘low-value’ categories6 needs to end. Given these criteria, agencies may tend to avoid uploading high-value datasets on data.gov.in by making the case for low-value ones instead. Besides, what the agencies term high-value may not necessarily be high-value for the data user. Ending prioritisation would avoid confusion and redirect agencies’ efforts towards uploading all their datasets on data.gov.in, including those that are not already available on the agencies’ web portals and those that are currently available in formats like PDF/HTML/paper. For datasets that are already on the agencies’ web portals but not on data.gov.in, links can be provided on data.gov.in under the relevant sections, to avoid duplication of effort.
If the practice of prioritisation cannot be discontinued, the alternative is to work with the agencies and help them prioritise datasets using information gathered from citizen engagement channels such as web searches, which can reveal the number of unsuccessful searches. For instance, the government could make it mandatory for agencies to upload on the OGD platform any dataset for which they have received at least 100 unique unsuccessful searches. In addition, following the example of the UK’s OGD platform, the government should use (and publish on the OGD platform) detailed Google Analytics reports on dataset views and downloads, encouraging popular agencies to keep improving the quality of and access to their uploaded datasets, and prompting less popular agencies to follow in their footsteps.
These measures would not only help correct the adverse selection problem7 but also generate data for devising ongoing strategies and reinforcing scientific rigour, thereby maximising the economic benefits of OGD.
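The search-based rule proposed above is mechanical enough to sketch. The following is a minimal illustration, not an existing data.gov.in feature: it assumes a hypothetical search log recording a user identifier, the query text, and whether any dataset was returned, and it interprets “100 unique unsuccessful searches” as 100 distinct users whose search for the same (case-normalised) query returned nothing. The names `SearchEvent` and `datasets_to_mandate` are illustrative only.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple


class SearchEvent(NamedTuple):
    """Hypothetical search-log record (not a real data.gov.in schema)."""
    user_id: str
    query: str
    found: bool  # True if the search returned at least one dataset


# Threshold from the proposal in the text; "unique" is assumed to mean distinct users.
MANDATORY_UPLOAD_THRESHOLD = 100


def datasets_to_mandate(events: Iterable[SearchEvent],
                        threshold: int = MANDATORY_UPLOAD_THRESHOLD) -> List[str]:
    """Return normalised queries that at least `threshold` distinct users
    searched for without success, most-demanded first."""
    unsuccessful_users = defaultdict(set)  # query -> set of user ids
    for event in events:
        if not event.found:
            # Normalise case and surrounding whitespace so trivially different
            # spellings of the same query are counted together.
            query = event.query.strip().lower()
            unsuccessful_users[query].add(event.user_id)
    flagged = [q for q, users in unsuccessful_users.items() if len(users) >= threshold]
    return sorted(flagged, key=lambda q: len(unsuccessful_users[q]), reverse=True)


# Illustrative, made-up log entries.
log = [SearchEvent(f"u{i}", "District rainfall", False) for i in range(120)]
log.append(SearchEvent("u1", "district rainfall ", False))  # repeat user, counted once
log += [SearchEvent(f"u{i}", "crop prices", False) for i in range(40)]
log += [SearchEvent(f"u{i}", "census tables", True) for i in range(500)]
print(datasets_to_mandate(log))  # ['district rainfall']
```

Counting distinct users rather than raw searches is a design choice that keeps a single persistent user from triggering a mandatory upload on their own; an agency could equally count raw searches if that is what the rule intends.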
2. The second strategic step should focus on achieving consistency in the implementation of the NDSAP. One way to achieve consistency would be to integrate the e-Governance standards laid down in the National e-Governance Plan (NeGP) with the NDSAP, and to make compliance with these standards mandatory for agencies uploading datasets on data.gov.in. This would ensure that agencies adopt practices that standardise data and metadata, so that the precise meaning of information is understood across agencies, within a single agency, and over time. For instance, several datasets uploaded on data.gov.in cover performance indicators such as GDP (gross domestic product) per capita and employment levels for the country, a state, district, sub-district, village or town, over time. However, these datasets operate in silos. Implementing the Metadata and Data Standards (MDDS) would therefore facilitate data interoperability8.
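Why common codes matter for interoperability can be shown with a toy example. The sketch below assumes two hypothetical agency extracts at the district level, one on GDP per capita and one on employment, in which the district names are spelled differently but both rows carry the same land-region code. The codes (`LRC-0001`, `LRC-0002`), district names and values are made-up placeholders, not actual MDDS codes or official figures, and `inner_join` is an illustrative helper.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]

# Hypothetical extract from one agency (placeholder codes and values).
gdp_per_capita: List[Record] = [
    {"lrc": "LRC-0001", "district": "North District", "gdp_per_capita": 1.0},
    {"lrc": "LRC-0002", "district": "South District", "gdp_per_capita": 2.0},
]

# Hypothetical extract from another agency: same districts, different spellings.
employment: List[Record] = [
    {"lrc": "LRC-0001", "district_name": "District North", "employed": 10},
    {"lrc": "LRC-0002", "district_name": "Dist. South", "employed": 20},
]


def inner_join(left: List[Record], right: List[Record],
               left_key: str, right_key: str) -> List[Record]:
    """Merge records where left[left_key] == right[right_key].
    Assumes right_key values are unique within `right`."""
    index = {row[right_key]: row for row in right}
    return [{**row, **index[row[left_key]]} for row in left if row[left_key] in index]


# Joining on free-text names silently loses every row...
by_name = inner_join(gdp_per_capita, employment, "district", "district_name")
# ...while joining on the shared code links the two datasets.
by_code = inner_join(gdp_per_capita, employment, "lrc", "lrc")
print(len(by_name), len(by_code))  # 0 2
```

The point of the example is that the match depends on both agencies using the same code list; the join logic itself is trivial once a common identifier exists.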
The strategy could be strengthened by building up the capacity of the agencies through directly involving researchers in the entire process, from data collection to data dissemination. This could be achieved by developing internship programmes for researchers, inviting researchers as consultants, or maintaining a roster of researchers who can be brought in as and when the need arises. In addition, the government should put in place a dedicated team of researchers to troubleshoot queries posted on the platform, and maintain an accessible archive of resolved queries.
The government should also focus on providing both the references and the metadata in the same uploaded document on the OGD platform. For instance, Agarwal and Lodefalk (2015) show one way of converting the OGD on the ‘number of e-visas issued to eligible countries in a given period’ into a usable .xlsx format. The same document provides comprehensive metadata that explains the peculiarities of the dataset, along with live links to the original data source.
None of the above can be achieved if India does not build a data infrastructure that balances physical and intellectual capacity. Investing in computers with multiple, up-to-date statistical packages, seamless internet connectivity, supercomputers for processing larger datasets, and an engineering team to troubleshoot computer problems are some of the measures that can help create and sustain a functioning infrastructure.
Besides, it is essential to reinforce the motivation of the agents/agencies by cutting down on bureaucracy and encouraging them to innovate. This would lead to greater publication of original datasets and help fulfil the underlying objectives of the OGD movement more effectively.
In a nutshell, India’s commitment to OGD is to be lauded. However, as laid out above, the effort needs to be re-evaluated if it is to have a substantial impact. The government has to pay more attention to the modalities of the NDSAP and its implementation guidelines. In a recent study, 98% of survey respondents said that OGD can have an impact on public policy (Buteau et al. 2015). Hence, to end India’s growth logjam and improve its position in the world economy, the government has to unleash the potential locked in its OGD movement.
- See http://www.data.gov/open-gov/ for a full list of national and regional open data websites.
- “Semantic interoperability between two datasets is achieved when there is a common understanding of the terms used. If one dataset mentions a certain term with a certain meaning, does the other dataset use this term in the same meaning or for a certain meaning, does this dataset use the same term?” (Colpaert et al. 2014)
- Metadata files facilitate data discovery in terms of the methodologies used for data collection, description of variables in the dataset, and any other peculiarities pertaining to the data.
- Upstream users of OGD are users (mostly policymakers, bureaucrats, and the agents themselves) who assimilate information processed by downstream users of OGD, who mainly comprise data processors such as application developers, researchers and analysts, amongst others.
- In a survey conducted by Buteau et al. (2015), only 57% of the survey respondents declared knowing about the OGD movement in India, of whom 25% and 38% considered their understanding of OGD low and average respectively; only 32% assessed their knowledge as satisfactory; and a small 6% judged it extensive. The online survey received responses from 18 professors, 10 Ph.D. or post-doctoral fellows and 36 research fellows and research participants, for a total of 64 completed surveys.
- According to the Implementation Guidelines, ‘Although each department shall have its own criterion of high-value and low-value datasets, generally high-value data is governed by the following principles: (i) Completeness (ii) Primary (iii) Timeliness (iv) Ease of Physical and Electronic Access (v) Machine readability (vi) Non-discrimination (vii) Use of Commonly Owned Standards (viii) Licensing (ix) Permanence (x) Usage Costs’.
- The agencies have better information on the data they collect than the general public. They may use this extra information to provide only the data they want to, and hence bypass the NDSAP with ease.
- Metadata and Data Standards (MDDS) are currently available for Person Identification and Land Region Collection (LRC). LRC is a code available for every country, state, district, sub-district, rural land region (revenue village), and urban land region (town).
- Agarwal, N (2015), ‘Open Government Data: An Answer to India’s Growth Logjam’.
- Agarwal, N and M Lodefalk (2015), ‘Dataset on India’s e-tourist visa (formerly tourist visa on arrival) programme’.
- Buteau, S, A Larquemin and JP Mukhopadhyay (2015), ‘Open data and applied socio-economic research in India: An overview’, Institute of Financial Management and Research.
- Colpaert, P, M Van Compernolle, L De Vocht, A Dimou, M Vander Sande, P Mechant, R Verborgh and E Mannens (2014), ‘Quantifying the interoperability of open government datasets’.