Paradata Statistics Report

LSMS Paradata Tools: Basic Multilevel Model

Overview

Survey practitioners often want to know how much of the variation in interview length is attributable to interviewers. The same question can be asked about enumeration areas. If the interviewer effect is large for a specific module, this could indicate that additional interviewer training is needed for that module. An intercept-only multilevel model is used here to provide a preliminary ranking of interviewer effects.

Intercept-Only Multilevel Model

We fit a cross-classified multilevel model with two grouping levels: the enumeration area and the interviewer. A household from an enumeration area is cross-assigned to an interviewer, which means that a single area can be covered by more than one interviewer. The figure below provides an example of a cross assignment.

We first show an intercept-only multilevel model for a household module.

\[ \ln(y_{hjk}) = \beta + u_j + u_k + e_{hjk} \]

where \(y_{hjk}\) is the duration of the module of interest for household \(h\) living in area \(k\) and interviewed by interviewer \(j\). The model includes two random effects: \(u_j\), the interviewer random effect, and \(u_k\), the enumeration area random effect. Lastly, \(e_{hjk}\) is the residual error term. We assume that all random effects and the residual term are normally distributed with constant variance: \(u_j \sim N(0, \sigma_j^2)\), \(u_k \sim N(0, \sigma_k^2)\), and \(e_{hjk} \sim N(0, \sigma_e^2)\). We model the log of the interview length because the log transformation better satisfied the normality assumption on the model residuals.
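To make the crossed structure concrete, the following is a minimal simulation sketch of the model equation above. It is not the tool's own code: the function name `simulate_durations`, the default parameter values, and the random assignment of households to interviewers and areas are all assumptions chosen for illustration.

```python
import math
import random


def simulate_durations(n_interviewers=5, n_areas=4, n_households=200,
                       beta=3.0, sd_j=0.3, sd_k=0.2, sd_e=0.5, seed=0):
    """Draw durations from ln(y_hjk) = beta + u_j + u_k + e_hjk.

    All defaults are hypothetical values, not estimates from any survey.
    """
    rng = random.Random(seed)
    # One random intercept per interviewer and one per enumeration area.
    u_j = [rng.gauss(0.0, sd_j) for _ in range(n_interviewers)]
    u_k = [rng.gauss(0.0, sd_k) for _ in range(n_areas)]
    rows = []
    for h in range(n_households):
        # Crossed assignment: interviewer and area are drawn independently,
        # so one area can be covered by more than one interviewer.
        j = rng.randrange(n_interviewers)
        k = rng.randrange(n_areas)
        log_y = beta + u_j[j] + u_k[k] + rng.gauss(0.0, sd_e)
        rows.append({"household": h, "interviewer": j, "area": k,
                     "duration": math.exp(log_y)})
    return rows


rows = simulate_durations()
print(len(rows), "simulated households")
```

Because the model is specified on the log scale, every simulated duration is strictly positive, which is one practical reason a log transformation suits interview lengths.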

Multilevel models allow us to calculate the intraclass correlation (ICC). The ICC describes how much of the variation in interview length is attributable to the grouping structure for the module: sharing the same enumeration area and sharing the same interviewer. One can also think of the ICC in terms of the variance around the intercept, which in our case is decomposed into components due to the enumeration area, the household, and the interviewer. A large variance component reflects large variation in interview length at that level after accounting for the other components. A large interviewer component indicates that the interviewer-specific intercepts, that is, the interviewers' average log interview lengths, are widely spread relative to the other variance components. We therefore use this measure to quantify interviewer effects.

\[ ICC_{interviewer} = \frac{\sigma_j^2}{\sigma_k^2 + \sigma_e^2 + \sigma_j^2} \]
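As a quick numerical illustration of the formula above, the sketch below computes the interviewer ICC from three variance components. The function name `interviewer_icc` and the input values are hypothetical; in practice the variances would come from the fitted multilevel model.

```python
def interviewer_icc(var_interviewer, var_area, var_residual):
    """Share of total variance attributable to interviewers."""
    total = var_interviewer + var_area + var_residual
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_interviewer / total


# Hypothetical variance components on the log-duration scale.
icc = interviewer_icc(var_interviewer=0.09, var_area=0.04, var_residual=0.25)
print(round(icc, 3))  # prints 0.237
```

With these made-up inputs, roughly 24 percent of the variation in log module duration would be attributed to interviewers.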

For individual-level modules, we add a household random effect, \(u_h\), where \(u_h \sim N(0, \sigma_h^2)\). The interviewer ICC thus becomes:

\[ ICC_{interviewer} = \frac{\sigma_j^2}{\sigma_h^2 + \sigma_k^2 + \sigma_e^2 + \sigma_j^2} \]
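With the household component added, it can be convenient to compute the share of every variance component at once; the interviewer share is then the interviewer ICC defined above. The sketch below assumes the components are supplied as a dictionary; the helper name `variance_shares`, the dictionary keys, and the values are illustrative only.

```python
def variance_shares(components):
    """Return each variance component as a fraction of the total variance."""
    total = sum(components.values())
    if total <= 0:
        raise ValueError("total variance must be positive")
    return {name: value / total for name, value in components.items()}


# Hypothetical variance components for an individual-level module.
shares = variance_shares({
    "interviewer": 0.09,  # sigma_j^2
    "area": 0.04,         # sigma_k^2
    "household": 0.12,    # sigma_h^2
    "residual": 0.25,     # sigma_e^2
})
print(round(shares["interviewer"], 2))  # prints 0.18
```

Adding \(\sigma_h^2\) to the denominator lowers the interviewer ICC for the same \(\sigma_j^2\), so ICC values from household and individual modules rest on different denominators.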

For more information on the model construction, see the SJIAOS article by Hasanbasri et al. (2014).

Household Modules Results

The table below shows the total variance (the denominator of the interviewer ICC), the residual variance \(\sigma_e^2\), and the interviewer ICC. Modules can be ranked on these numbers.

Individual Modules Results

The table below shows the total variance (the denominator of the interviewer ICC), the residual variance \(\sigma_e^2\), and the interviewer ICC. Modules can be ranked on these numbers. Individual modules are fitted with models that include household random effects.
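Ranking modules by interviewer ICC can be sketched as a simple sort. The module names and numbers below are placeholders invented for illustration and are not results from this report; the helper name `rank_by_icc` is likewise an assumption.

```python
def rank_by_icc(results):
    """Sort module results from largest to smallest interviewer ICC."""
    return sorted(results, key=lambda r: r["icc_interviewer"], reverse=True)


# Placeholder values only; not taken from any survey.
results = [
    {"module": "module_a", "total_var": 0.40, "icc_interviewer": 0.10},
    {"module": "module_b", "total_var": 0.55, "icc_interviewer": 0.31},
    {"module": "module_c", "total_var": 0.35, "icc_interviewer": 0.22},
]

ranked = rank_by_icc(results)
for rank, row in enumerate(ranked, start=1):
    print(rank, row["module"], row["icc_interviewer"])
```

Modules at the top of such a ranking would be the first candidates for a closer look at interviewer training.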