Improving survey quality using paradata: Lessons from the India Working Survey

To improve the credibility of survey data, several monitoring tools are used by researchers – such as ‘paradata’, which have gained prominence with the growth of computer-aided interviewing. In this post, Goel et al. discuss how paradata were used in the ‘India Working Survey’ conducted in the states of Karnataka and Rajasthan in 2020, to streamline enumerator practices and enhance data quality.

The efficacy of survey-based policy recommendations primarily hinges on the quality of data collected. Does the survey represent the population it claims to characterise? Are respondents voicing their true opinions? Did enumerator bias creep into the data? Were survey questions administered as intended, and respondents given adequate explanation and time to respond? These are questions that most survey users have but are typically brushed aside in the race to get the analyses out. While there are no foolproof measures to guarantee the authenticity of survey data, there are several tools that researchers have used to monitor and improve their credibility. Traditionally, these have included on-site monitoring by field supervisors, and back checks in the form of in-person re-interviews or call-backs over the phone. More recently, with the growth of computer-aided interviewing (CAI), ‘paradata’ have emerged as an important tool for the quality control of surveys.

Paradata refer to data about the process of data collection. They are the data that accompany any data collection effort and, while often not the explicit focus of that effort, they can be used effectively to monitor and improve the quality of the data being collected. Paradata include data on the time of interview, start and end timestamps for each section, total interview duration, revisit information, enumerator characteristics, and enumerator observations¹. In the case of computer-aided interviews, they could also include keystrokes, ambient light and sound, GPS (Global Positioning System) coordinates tracking enumerator location and movement, and audio/video recordings of interviews.

In this post, we discuss how paradata were used in the India Working Survey (IWS), a large-scale field-based household survey conducted in Karnataka and Rajasthan in early 2020, to streamline enumerator practices in the field. The ultimate goal was to improve IWS data quality by reducing interviewer-induced measurement error, a particularly important component within the Total Survey Error framework (Olson et al. 2020).

Using paradata concurrent with survey implementation

There exists a rich body of work on the post-facto (that is, after the survey is over) use of paradata to assess non-response (Krueger and West 2014, Kreuter and Olson 2013) and measurement error (Da Silva and Skinner 2020). In contrast, work on paradata use concurrent with survey implementation is still emerging (Edwards et al. 2017, 2020). Barring some recent work (Bhuiyan and Lackie 2016, Choumert-Nkolo et al. 2019, Finn and Ranchhod 2015), most illustrations of paradata use are from developed countries. Even though the underlying statistical theory is portable across contexts, operationalisation issues in less developed countries are very different, and little is known about how paradata-based measures translate in these situations.

In this context, our work using paradata in IWS (Goel et al. 2021) makes three important contributions. First, we illustrate the use of paradata in a developing country context, and highlight the trade-offs involved when using paradata in a resource-constrained environment typical of less-developed countries. Second, we provide a prototype for the use of paradata in monitoring and flagging ‘deviant’ enumerators while the survey is ongoing, rather than after the survey has been completed. This, we believe, is a cost-efficient way of collecting good quality survey data. Our third contribution is methodological, wherein we emphasise the use of ‘dynamic benchmarking’ within a group of enumerators who face similar external environments. Dynamic benchmarking within homogeneous groups has received only a cursory mention in the literature (Guyer et al. 2021).

The case of ‘India Working Survey’

The IWS was implemented with the aim of understanding how social identities, specifically, caste, gender, and religion, influence livelihood outcomes. Data collection for the first wave of the IWS was outsourced to a private agency. The agency’s enumerators administered the survey through face-to-face interviews using CAI. Every household was visited by one female and one male enumerator, sometimes accompanied by their field supervisor. A total of 6,900 respondents from 3,623 households were contacted between 3 February and 17 March 2020.

Our method to improve enumerator behaviour is embedded within the statistical process control perspective, which advocates adopting procedures used in the quality control of industrial products. Below we describe paradata monitoring using what are called ‘flags’ to identify enumerators exhibiting deviant field practices. Once identified, the flagged enumerator’s supervisor would talk to them and provide constructive feedback. A flag, in our method, involves comparing a selected parameter within a group of enumerators who faced similar field conditions, and identifying those enumerators (if any) whose performance deviated substantially from the group average.

We call each such group of enumerators a ‘comparison group’. The tenet to follow when defining a comparison group is that differences, in terms of individual characteristics and work environments, between enumerators in the same group should be as small as possible, while, at the same time, there should be considerable distinction between enumerators from different groups to warrant binning them separately. In other words, under stable field conditions, one should aim to minimise within-group variability and maximise between-group variability. In the IWS, a comparison group was defined as a specific state (Karnataka/Rajasthan), sub-region (urban/rural), and gender (of enumerator) combination, resulting in eight such groups. Restricting comparisons to only enumerators facing similar external conditions is crucial for the credibility of our method as it ensures that different data generating processes are not mixed. Once comparisons are restricted in this manner, it is possible to interpret the group average as the process average in steady state², and deviations from this average as deviant behaviour requiring intervention.

It is important to note that a flag is only suggestive of a faulty practice and should not be construed as conclusive evidence of wrongdoing. This is because, while it correctly identifies deviant behaviour, it does not go into the reasons for it. It is possible, though unlikely³, that the said behaviour was the right response given the circumstances on the field. It is imperative that researchers emphasise this aspect to the field supervisors so that their conversations with flagged enumerators are not accusatory in nature.
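
To make the idea of a flag concrete, the comparison within a group can be sketched in a few lines of Python. This is only an illustration and not the IWS implementation: the function name, the record layout, and the cut-off of two group standard deviations are assumptions made for the example, since the post does not state the threshold that was actually used.

```python
from collections import defaultdict
from statistics import mean, pstdev


def flag_enumerators(records, k=2.0):
    """Return ids of enumerators who deviate from their comparison-group average.

    records: iterable of (enumerator_id, comparison_group, value), where value
    is one enumerator-level indicator, e.g. mean interview duration in minutes.
    k: assumed threshold, in group standard deviations (hypothetical choice).
    """
    by_group = defaultdict(list)
    for enum_id, group, value in records:
        by_group[group].append((enum_id, value))

    flagged = []
    for members in by_group.values():
        values = [v for _, v in members]
        if len(values) < 3:
            continue  # too few enumerators to form a meaningful benchmark
        centre = mean(values)
        spread = pstdev(values)
        if spread == 0:
            continue  # everyone identical, nothing to flag
        for enum_id, value in members:
            if abs(value - centre) > k * spread:
                flagged.append(enum_id)
    return flagged


# Made-up data: two of the eight state x sub-region x gender groups.
ka_rural_f = ("Karnataka", "rural", "female")
rj_urban_m = ("Rajasthan", "urban", "male")
sample = (
    [(f"E0{i + 1}", ka_rural_f, v) for i, v in enumerate([60, 62, 58, 61, 59, 25])]
    + [(f"E1{i + 1}", rj_urban_m, v) for i, v in enumerate([40, 42, 41, 39, 43, 38])]
)
print(flag_enumerators(sample))
```

In this made-up sample only the enumerator averaging 25 minutes is flagged. The enumerator averaging 38 minutes in the second group is not, because the benchmark is that group's own average rather than a pooled one. As stressed above, such a flag would only prompt a conversation with the supervisor.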

Besides the centrality of comparison groups, another highlight of our method is what we call dynamic benchmarking. Instead of examining enumerator performance in a cumulative fashion, we studied it in separate blocks of a week at a time. By doing so, performance was flagged as deviant against a moving benchmark that accounted for all secular changes over time. For instance, as enumerators gain proficiency, there is a secular decline in the average interview duration and separate weekly windows would correctly account for this.
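
A minimal sketch of dynamic benchmarking, under stated assumptions, is given below. Interviews are assigned to week-long blocks counted from the first day of fieldwork, and each enumerator's weekly average is compared only with the average of their own comparison group in that same week. The function names, the data layout, and the rule of flagging anyone below 60% of the group-week benchmark are hypothetical choices for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

SURVEY_START = date(2020, 2, 3)  # first day of IWS fieldwork, as stated in the post


def week_block(interview_date, start=SURVEY_START):
    """Zero-based index of the week-long block containing the interview."""
    return (interview_date - start).days // 7


def weekly_flags(interviews, ratio=0.6):
    """Flag enumerators whose weekly mean duration is below ratio x benchmark.

    interviews: iterable of (enumerator_id, comparison_group, date, minutes).
    ratio: assumed threshold (hypothetical), relative to the group-week mean.
    """
    # (group, week) -> enumerator -> list of interview durations
    cells = defaultdict(lambda: defaultdict(list))
    for enum_id, group, day, minutes in interviews:
        cells[(group, week_block(day))][enum_id].append(minutes)

    flags = []
    for (group, week), per_enum in sorted(cells.items()):
        enum_means = {e: mean(d) for e, d in per_enum.items()}
        benchmark = mean(list(enum_means.values()))  # moving, week-specific
        for enum_id, m in sorted(enum_means.items()):
            if m < ratio * benchmark:
                flags.append((week, group, enum_id))
    return flags


# Made-up durations: the whole group speeds up in the second week.
group = "KA-rural-F"
interviews = [
    ("E1", group, date(2020, 2, 4), 60), ("E2", group, date(2020, 2, 5), 62),
    ("E3", group, date(2020, 2, 6), 58), ("E4", group, date(2020, 2, 7), 30),
    ("E1", group, date(2020, 2, 11), 44), ("E2", group, date(2020, 2, 12), 46),
    ("E3", group, date(2020, 2, 13), 45), ("E4", group, date(2020, 2, 14), 40),
]
print(weekly_flags(interviews))
```

Because the benchmark is recomputed for each week, the general decline in durations in the second week does not by itself trigger any flag; only the enumerator who was far below the group in the first week is picked up.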

We also examine other aspects related to flag design such as setting threshold limits to define what is considered as deviant behaviour, and the particular field practices that were monitored using flags in the IWS. These include too little time spent on particular sections, or on the interview as a whole, inordinately low values in responses to certain questions (such as household size), and odd interview start timings. We also find that intervention based on these flags actually changed enumerator behaviour on the field⁴.
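
The indicators that feed such flags can be derived directly from interview timestamps and responses. The sketch below computes, for each enumerator, the mean interview duration, the mean reported household size, and the share of interviews started outside a plausible working window. The field names and the 7 am to 8 pm window are assumptions for illustration; the post does not specify how the IWS defined an odd start time.

```python
from collections import defaultdict
from datetime import datetime, time
from statistics import mean


def enumerator_indicators(interviews, earliest=time(7, 0), latest=time(20, 0)):
    """Summarise paradata per enumerator.

    interviews: iterable of (enumerator_id, start, end, household_size),
    with start and end as datetime objects.
    earliest/latest: assumed bounds of a normal start time (hypothetical).
    """
    per_enum = defaultdict(list)
    for enum_id, start, end, household_size in interviews:
        minutes = (end - start).total_seconds() / 60
        odd_start = not (earliest <= start.time() <= latest)
        per_enum[enum_id].append((minutes, household_size, odd_start))

    summary = {}
    for enum_id, rows in per_enum.items():
        summary[enum_id] = {
            "mean_minutes": mean(r[0] for r in rows),
            "mean_household_size": mean(r[1] for r in rows),
            "share_odd_start": mean(1.0 if r[2] else 0.0 for r in rows),
        }
    return summary


# Made-up example: one daytime interview and one late-night interview.
example = [
    ("E1", datetime(2020, 2, 4, 10, 0), datetime(2020, 2, 4, 11, 0), 5),
    ("E1", datetime(2020, 2, 4, 22, 30), datetime(2020, 2, 4, 22, 50), 1),
]
print(enumerator_indicators(example))
```

Each of these enumerator-level indicators can then be compared within a comparison group and within a weekly block, rather than judged against a fixed absolute cut-off.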

Exploring the potential of paradata

Paradata have tremendous potential to improve survey quality, which remains underutilised, especially in low- and middle-income countries. One way to encourage their use is for donor agencies that fund surveys to: (i) mandate paradata use; (ii) provide a budget specifically earmarked for it; and (iii) require that some paradata be made public along with the main survey data. While much remains to be explored about what kinds of paradata can be used to monitor progress and how these can be operationalised, we attempt to set out a framework that researchers in developing countries can use to incorporate paradata in their own survey practices. Profit-oriented data collection agencies may then begin to view paradata not as a threat to their commercial interests, but as an integral tool to improve their business.

The authors are grateful to the National Council of Applied Economic Research (NCAER), New Delhi, for funding this research. They are also grateful to the Initiative for What Works to Advance Women and Girls in the Economy, Azim Premji University, and the Indian Institute of Management (Bangalore) for funding the India Working Survey on which this paper is based. Excellent research assistance from Naveen Gajjalagari and Mridhula Mohan is gratefully acknowledged.

Notes:

¹ These include pointed questions to the enumerator – Was the respondent cooperative? Did they seem hurried? Was it done in private? – as well as more general observations for them to note down.
² Steady state is a kind of equilibrium/stationary state in which either nothing is changing or things are changing at constant long-term trend rates.
³ This is unlikely because an enumerator is compared with others from the same comparison group. Given the way comparison groups have been defined, we expect enumerators within a group to face similar field conditions, and therefore behave in a manner similar to the group average.
⁴ For more details, see Goel et al. (2021).

Bhuiyan, M Faress and Paula Lackie (2016), “Mitigating Survey Fraud and Human Error: Lessons Learned From A Low Budget Village Census in Bangladesh”, IASSIST Quarterly, 40(3): 20-26.
Choumert-Nkolo, Johanna, Henry Cust and Callum Taylor (2019), “Using paradata to collect better survey data: Evidence from a household survey in Tanzania”, Review of Development Economics, 23(2): 598-618.
Da Silva, D N and CJ Skinner (2020), “Testing for measurement error in survey data analysis using paradata”, Biometrika, 108(1): 239-246.
Edwards, B, A Maitland and S Connor (2017), ‘Measurement error in survey operations management’, in PP Biemer, E de Leeuw, S Eckman, B Edwards, F Kreuter, LE Lyberg, NC Tucker and BT West (eds.), Total Survey Error in Practice.
Edwards, B, H Sun and R Hubbard (2020), ‘Behavior change techniques for reducing interviewer contributions to total survey error’, in K Olson, JD Smyth, J Dykema, AL Holbrook, F Kreuter and BT West (eds.), Interviewer Effects from a Total Survey Error Perspective.
Finn, Arden and Vimal Ranchhod (2015), “Genuine Fakes: The Prevalence and Implications of Data Fabrication in a Large South African Survey”, The World Bank Economic Review, 31(1): 129-157.
Goel, D, R Abraham and R Lahoti (2021), ‘Improving Survey Quality Using Para Data: Lessons from the India Working Survey’, NCAER National Data Innovation Centre: Institutional Research Grant Report Number 03.
Guyer, HM, BT West and W Chang (2021), ‘The Interviewer Performance Profile (IPP): A Paradata-Driven Tool for Monitoring and Managing Interviewer Performance’, Survey Methods: Insights from the Field, 7 June.
Krueger, Brian S and Brady T West (2014), “Assessing the Potential of Paradata and Other Auxiliary Data for Nonresponse Adjustments”, Public Opinion Quarterly, 78(4): 795-831.
Kreuter, F and K Olson (2013), ‘Paradata for nonresponse error investigation’, in F Kreuter (ed.), Improving Surveys with Paradata: Analytic Uses of Process Information.
Olson, K, JD Smyth, J Dykema, AL Holbrook, F Kreuter, and BT West (2020), Interviewer Effects from a Total Survey Error Perspective, Chapman and Hall/CRC.

By: PRAMOD BHATT 17 April, 2022

The article is great. There is huge potential for paradata to improve data quality or to show what happened during the event. Paradata provide event-wise logs for each interview. If the sample runs into thousands, working with them could be tedious. Merging interviews is generally not possible as the number of events varies. It could be a Herculean task to align all interviews in a straight line for each analysis. This generally limits our capability to fully utilise the benefits of paradata. But they certainly provide insights and help us answer what happened during the interview.