DHS data overview and preparation

Beginner

Overview

Household survey data from the Demographic and Health Surveys (DHS), Malaria Indicator Surveys (MIS), and Multiple Indicator Cluster Surveys (MICS) provide data on malaria epidemiology, intervention access and use, socioeconomic factors, and other malaria-relevant indicators. These surveys can be helpful in SNT to complement other data sources such as routine surveillance and routine intervention implementation data.

This page provides an overview of household survey data, indicator calculations, and instructions on how to access the DHS data files.

Objectives

Understand the strengths and limitations of household survey data for malaria analysis
Introduce the main DHS datasets used in SNT, including HR, PR, and KR files
Explain the role of survey weights and why they are necessary for producing representative estimates
Follow a step-by-step guide to access DHS data via three options

In the SNT workflow, household survey data can be used to estimate epidemiological indicators such as recent fevers, prevalence of malaria infection, and all-cause child mortality; health system indicators such as care-seeking behavior and antenatal care utilization; intervention indicators such as bednet, chemoprevention, and immunization coverage; socioeconomic indicators such as household wealth and housing quality; and numerous other indicators. Survey data can support different analysis steps such as situational assessment, stratification, and modeling.

Below are four pages that guide users through extracting and analyzing DHS data for these key malaria indicators. Other pages may be added at a later date, but for now users who would like to extract indicators not listed here may adapt the code on these pages, or the code below on this page, for their indicator(s) of interest.

Understanding and Preparing DHS Data for SNT Analysis

Background on Household Survey Data

Strengths of survey data include:

Consistent measurement for standard indicators across the entire country and over multiple years
Reliable population denominator
Focused on the populations typically most vulnerable to malaria: children under 5 and women of childbearing age
Multiple measurements and questions to the same household or individual permit many statistical analyses

Limitations of survey data include:

Limited spatial resolution: usually powered at the adm1 level, whereas SNT often requires a smaller operational unit for decision-making (adm2 or smaller)
Limited temporal resolution: surveys occur only every 3-5 years in most countries, and some countries have had only one survey or none at all
Temporal gaps due to cross-sectional design: data is collected over a few-month time period, which can differ between surveys such that some are performed during the high-transmission season and some in the low-transmission season, complicating interpretation of results
Demographic gaps: Certain indicators of interest in SNT are only measured in children under 5, such as prevalence of parasite infection and treatment-seeking behavior
Biases: data is subject to many of the biases related to observational, cross-sectional studies, including respondents’ recall bias or desirability bias. For instance, respondents are required to recall whether they took their child to a health facility in the past two weeks (recall bias) or may report bednet use because this is perceived to be the “correct” answer (desirability bias).

What is the difference between DHS and MIS?

Demographic and Health Surveys (DHS) collect data on a wide range of topics, including fertility, family planning, HIV/AIDS, nutrition, and malaria. However, DHS fieldwork is not specifically timed to align with malaria transmission seasons. In contrast, Malaria Indicator Surveys (MIS) are designed to collect data on malaria-related indicators such as malaria prevalence, treatment-seeking rates, and coverage of malaria prevention measures such as vector control and chemoprevention. MIS are more likely, although not guaranteed, to be conducted during the high-transmission malaria season.

Accessing Survey Data

To extract indicators from DHS or MIS datasets, analysts must first obtain the relevant household survey files and understand where the variables of interest are stored. These files can be accessed in a few ways: through the DHS API using the rdhs R package, by downloading from the DHS website, or through direct collaboration with national data collectors (via the SNT team) who may hold unpublished or extended versions of survey datasets.

For analysts working in R, the rdhs package provides the most streamlined option. Pre-built indicators can be accessed directly via the API without authentication. Those with authenticated access to DHS datasets may also use rdhs to download datasets via the API, which requires prior registration and project approval. New user registration has been suspended since 2025 due to DHS platform instability linked to USAID funding cuts. As a result, those who had not registered by then, or whose access to specific datasets (e.g., GPS shapefiles) had not been approved by then, cannot download these files via either the API or the website. Users with pre-existing credentials can still access the system at the time of writing, though future access is not guaranteed.

Warning

Note: The rdhs package is primarily a data access tool and can be used in two ways:

rdhs to access pre-built indicators (described more below): directly grab the numbers you need from a limited list of common indicators of interest. In this case, you do not download any datasets.
rdhs to download full datasets: rdhs simplifies downloading DHS datasets but does not perform any data preparation or indicator construction. Analysts who use the API to download datasets must still handle all data cleaning, variable harmonization, dataset linking (e.g., household and individual files), and indicator calculations themselves. This means that even with API access, the majority of the analytical work occurs after the download and requires careful handling of local files.

If you have access to already-downloaded datasets, Option 2b below shows how to read them into your workspace.

Where direct access to datasets is not possible, some information can still be retrieved from DHS or MIS public reports, or from the STATcompiler platform. These typically include national- or regional-level summary tables. Tables taken from reports must be digitized by the analyst for use in SNT, whereas tables downloaded from STATcompiler do not need digitization. Such tables are insufficient for detailed analysis but may still be very informative if data on key indicators is otherwise completely lacking. The officially reported regional estimates are also a useful resource for double-checking the region-level indicator calculations produced by your own scripts.

The most reliable option for data access is to coordinate with the SNT team or national statistics office. Local institutions that led survey implementation and data analysis could hold internal survey datasets that were not made public on the DHS platform. In some cases, these internal versions include additional disaggregation, updated weights, or supplementary modules tailored to local needs. However, datasets are sometimes withheld because of serious concerns about the quality of survey implementation, so it is best to discuss with the SNT team whether such datasets are suitable for use.

Required Files and Variables

Many analyses using DHS or MIS data for SNT rely on the following datasets: the Household Recode (HR), the Person Recode (PR), the Children’s Recode (KR), the Individual Recode (IR), the Birth Recode (BR), in addition to the geographical clusters (GE) if accessible. These files are standardized across countries and survey rounds, and most of the core variables used for malaria and more general health or socio-demographic indicators retain the same naming conventions (e.g., hv005, hv103, hml12, h22, b5). However, small differences in survey implementation or recoding can occur, especially in how weights are constructed or how indicator variables are derived, so it is always good practice to cross-reference with the survey’s final report or consult with the SNT team when needed.

The HR file contains household-level data and is typically the starting point for understanding household composition, net ownership, and housing characteristics. It includes variables such as hv013 (number of de facto household members), hml1 (number of mosquito nets), and net-level variables like hml10_*, which describe the type and treatment status of each net. This file also includes key design variables such as hv005 (household sample weight), hv021 (cluster ID), and hv022 (stratum ID). These are essential for all subsequent analysis and are used for weighting and survey design corrections.
The PR file contains individual-level data for every person listed in the household roster, that is, usual residents and visitors who stayed in the household the night before the survey. It is used to assess outcomes such as ITN usage (hml12) and malaria infection status from testing (hml32 for microscopy or hml35 for rapid diagnostic testing). It also includes a flag for de facto residence (hv103), which identifies the people who slept in the household the previous night and is used to decide who is eligible for inclusion in indicator denominators. The same design variables (hv005, hv021, hv022) appear here for use in survey-weighted analysis, alongside the household identifiers needed to link back to household characteristics.
The KR file focuses on children born in the five years preceding the survey, linking each child to their mother and including information on recent illness episodes, care-seeking behavior, and birth history. This is the primary dataset used to analyze treatment-seeking for fever (e.g., h22 for whether the child had fever, the h32 series for whether and where treatment was sought) and for estimating child mortality indicators (e.g., b3 for date of birth, b5 for survival status, b6 and b7 for age at death). These variables are critical for estimating neonatal, infant, and under-five mortality using direct estimation approaches. The KR file also contains the equivalent design variables (v005, v021, v022), with v005 used as the appropriate sample weight for children.
The IR file contains individual-level data for women in the household, including all the information in the women’s questionnaire. Birth history and antenatal care data, including receipt of intermittent preventive treatment in pregnancy (IPTp), are found in the IR file.
The BR file contains records on both mother and children for the full birth history of all women interviewed, including information on pregnancy, postnatal care, and immunization and health for children born in the last 5 years. All-cause under-5 mortality calculations can use birth history data reported in this file as input.
The GE file contains coordinates for the household clusters, required when analysis needs to be performed at the sub-adm1 level.

Quick tip: Estimations at finer resolution

While the DHS pages in this code library focus on using survey data at the adm1 level (consistent with how most DHS surveys are powered), advanced users with access to cluster coordinates (GE file) may apply geostatistical modeling or small-area estimation techniques to produce finer-resolution outputs when needed.

Together, these datasets provide the core foundation for most SNT-relevant indicators. While some indicators can be extracted from a single file, other analyses require joining files together using household or cluster IDs. Proper application of sample weights and understanding the survey design are essential throughout.
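As a rough illustration of the joining step, the sketch below attaches household-level net counts from a toy HR table to person records in a toy PR table. It assumes the standard DHS identifiers hv001 (cluster number) and hv002 (household number) are present in both files. All values are made up, and base R merge() is used only to keep the example free of dependencies.

```r
# Toy HR table: one row per household (values are illustrative only)
hr_toy <- data.frame(
  hv001 = c(1, 1, 2),  # cluster number
  hv002 = c(1, 2, 1),  # household number within cluster
  hml1  = c(2, 0, 1)   # number of mosquito nets in the household
)

# Toy PR table: one row per listed household member
pr_toy <- data.frame(
  hv001 = c(1, 1, 1, 2, 2),
  hv002 = c(1, 1, 2, 1, 1),
  hvidx = c(1, 2, 1, 1, 2),  # line number of the person in the household
  hv103 = c(1, 1, 1, 0, 1)   # 1 = slept in the household last night
)

# Left join: keep every person, add household characteristics
pr_hh <- merge(pr_toy, hr_toy, by = c("hv001", "hv002"), all.x = TRUE)

# A person-level join should never change the number of person records
stopifnot(nrow(pr_hh) == nrow(pr_toy))
pr_hh
```

Checking the row count after the join is a cheap way to catch duplicated household keys before any indicator is computed.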

All variables used here are described in detail in the DHS recode manuals (View latest DHS-7). These should be referenced regularly, especially if discrepancies are observed or if working in a context with modified questionnaires. In addition, indicator calculations are defined in the Guide to DHS Statistics. These definitions help ensure the correct denominators and filters are used, especially when multiple related definitions exist for an indicator (e.g., vaccination coverage, bed net use).

Working with Survey Weights

When surveys are conducted, not everyone has the same chance of being selected. Some areas may be oversampled or undersampled. Survey weights adjust for this, so results better reflect the real population.

You can think of a weight as saying: “Each person in the survey stands in for many others.”

Example:

100 people from District A, each representing 5 people
50 people from District B, each representing 20 people

Reported ITN use:

40 people from District A
25 people from District B

Raw estimate (unweighted):

65 / 150 = 43% — this treats everyone equally

Weighted estimate:

District A: 40 × 5 = 200 people
District B: 25 × 20 = 500 people
Total weighted usage = 700 / 1,500 = 46.7%

So the weighted result is 46.7%, not 43%, a closer representation of reality. This example is for illustration purposes only, as real DHS weights are more complex.
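The arithmetic above can be reproduced in a few lines of base R. This is only a sketch of the toy example, with hypothetical object names, and not a substitute for a proper survey design object.

```r
# Toy data from the worked example: two districts with different weights
toy <- data.frame(
  district  = c("A", "B"),
  n_sampled = c(100, 50),  # people interviewed
  n_itn     = c(40, 25),   # people reporting ITN use
  weight    = c(5, 20)     # people represented by each respondent
)

# Unweighted: every respondent counts once
unweighted <- sum(toy$n_itn) / sum(toy$n_sampled)

# Weighted: every respondent counts as many times as their weight
weighted <- sum(toy$n_itn * toy$weight) / sum(toy$n_sampled * toy$weight)

round(100 * c(unweighted = unweighted, weighted = weighted), 1)
```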

Using survey weights is key when calculating indicators or re-analyzing survey data. Without them, results can be biased. We’ll use them throughout this section to ensure estimates more accurately reflect true ITN ownership and usage in the population. For more detail on the theory of survey weighting and sampling consult the DHS tutorials.
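With real DHS files, weights are usually applied through a survey design object. The sketch below uses the survey package on a toy PR-style table and rests on several assumptions that should be checked against the recode manual for your survey: hv005 is stored with six implied decimals and is divided by 1,000,000, hv021 and hv022 identify the primary sampling unit and stratum, and hml12 codes 1 and 2 are treated as sleeping under an ITN. The survey package is not part of the set-up code on this page, and all values are made up.

```r
library(survey)

# Toy PR-style table (illustrative values only)
pr_toy <- data.frame(
  hv021 = c(1, 1, 2, 2, 3, 3, 4, 4),  # primary sampling unit
  hv022 = c(1, 1, 1, 1, 2, 2, 2, 2),  # sampling stratum
  hv005 = c(5e5, 5e5, 5e5, 5e5, 2e6, 2e6, 2e6, 2e6),  # raw sample weight
  hv103 = c(1, 1, 1, 0, 1, 1, 1, 1),  # 1 = slept in household last night
  hml12 = c(1, 0, 2, 1, 0, 1, 0, 3)   # net type slept under (assumed coding)
)

# Rescale the weight and build a 0/1 indicator for ITN use
pr_toy$wt <- pr_toy$hv005 / 1e6
pr_toy$itn_use <- as.numeric(pr_toy$hml12 %in% c(1, 2))

# Declare the design first, then subset, so variance estimation
# still sees the full sampling structure
design <- svydesign(
  ids = ~hv021, strata = ~hv022, weights = ~wt,
  data = pr_toy, nest = TRUE
)
de_facto <- subset(design, hv103 == 1)

# Weighted proportion of de facto members who slept under an ITN
itn_est <- svymean(~itn_use, de_facto)
itn_est
```

Subsetting the design object, rather than filtering the data frame beforehand, is the usual recommendation because it keeps the standard errors consistent with the original sample design.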

Step-by-Step

In this guide, we present three practical paths for working with DHS data in SNT workflows. The options differ in ease of use, control over indicator construction, and dependence on external systems.

Key differences across options

Option 1 (DHS indicators): Fastest option. Pull pre-built indicators via the DHS API. You are limited to what’s already calculated and predefined by DHS, there is no access to full microdata, and there is limited flexibility to change aggregation levels or compute custom indicators.
Option 2a (raw data via API): Use the rdhs package to download raw datasets and build indicators from scratch, enabling maximal use of survey data. Requires API authentication and approved access to specific survey files, which may be difficult to obtain for new users.
Option 2b (raw datasets already in hand): If you already have downloaded raw files (.dta, .sav, etc.), you can do all the analyses possible with Option 2a, while skipping the download step.

Your goal	Pre-built indicators?	Access without authentication?	Full Data?	Custom Aggregation?
Pull standard indicators from DHS API (Option 1)	✅ Yes	✅ Yes	❌ No	❌ No
Build from raw data downloaded via rdhs (Option 2a)	❌ No	❌ No	✅ Yes	✅ Yes
Build from local copy of raw data (Option 2b)	❌ No	✅ Yes	✅ Yes	✅ Yes

To skip the step-by-step explanation, jump to the full code at the end of this page.

Option 1: Access DHS Data Indicators Directly

In some cases, analysts do not need to download and process full DHS datasets manually. Instead, it is possible to query standardized indicators directly from the DHS API without any credentials. This includes many core malaria indicators—such as ITN ownership, access, and usage—disaggregated by administrative region or demographic background variables.

This approach is particularly useful when:

you don’t have access to DHS website credentials;
you only need standard indicators;
you don’t need access to DHS cluster coordinates or full survey data;
you want to rapidly prototype or cross-check indicators.

However, it’s important to note that these pre-calculated indicators are not available for every survey. Custom indicators (such as stratified aggregation across multiple dimensions like age, sex, urbanicity, pregnancy status, and education) and subnational disaggregation below adm1 typically require access to the raw DHS survey data and manual calculation (see Option 2).

Caution: Limited reliability of Option 1

Option 1 offers quick access to pre-aggregated indicators, but its availability isn’t guaranteed due to ongoing DHS funding disruptions.

If the DHS website is unavailable, or if you need access to more granular data for custom analysis, see Option 2b for working with full datasets. This requires calculating indicators from raw data, with step-by-step guidance provided on each DHS page of this code library.

Step 1.0: Inspect STATcompiler (DHS API Dashboard)

The DHS API provides access to standard indicators as presented in the DHS STATcompiler.
Before diving into the specific steps, it’s useful to familiarize yourself with this tool.

STATcompiler allows you to:
- Quickly look up DHS indicator values
- Download pre-built indicators by country
- View ready-made charts

Step 1.1: Install and load required packages

We begin by loading the required packages and setting up custom functions. If you have not installed the required packages, installations will run automatically and take less than three minutes to finish.

The check_dhs_indicators() function mirrors rdhs::dhs_indicators() and is used to view available DHS indicators.

The download_dhs_indicators() function mirrors rdhs::dhs_data() to download specific indicator data.

The key difference is that both custom functions access data directly from https://api.dhsprogram.com rather than using the rdhs client, which requires credentials.

Tip: Functions available in sntutils package

If you prefer minimal setup, these functions are also available in the sntutils package. You can use those versions by installing and loading sntutils and calling sntutils::download_dhs_indicators() and sntutils::check_dhs_indicators(). In that case, there is no need to copy and run the function definitions in the code block below.

Show full set-up code

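# install pacman if needed, then install or load required packages
if (!requireNamespace("pacman", quietly = TRUE)) {
  install.packages("pacman")
}
pacman::p_load(
  purrr,    # Functional iteration (e.g., map)
  httr,     # HTTP requests (e.g., GET)
  jsonlite, # Parse JSON from APIs
  cli,      # Styled console messages
  dplyr,    # For data management
  stringr,  # String matching (e.g., str_detect)
  sf,       # Reading and handling shapefiles
  here      # Project-relative file paths
)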

#' Check DHS Indicator List from API
#'
#' @param countryIds DHS country code(s), e.g., "EG"
#' @param indicatorIds Specific indicator ID(s)
#' @param surveyIds Survey ID(s)
#' @param surveyYear Exact year
#' @param surveyYearStart Start of year range
#' @param surveyYearEnd End of year range
#' @param surveyType DHS survey type (e.g., "DHS", "MIS")
#' @param surveyCharacteristicIds Filter by survey characteristic ID
#' @param tagIds Filter by tag ID
#' @param returnFields Fields to return (default: IndicatorId, Label,
#'   Definition, MeasurementType)
#' @param perPage Max results per page (default NULL uses the API default)
#' @param page Specific page to return (default NULL uses the API default)
#' @param f Format (default = "json")
#'
#' @return A data.frame of indicators
#' @export
check_dhs_indicators <- function(
  countryIds = NULL,
  indicatorIds = NULL,
  surveyIds = NULL,
  surveyYear = NULL,
  surveyYearStart = NULL,
  surveyYearEnd = NULL,
  surveyType = NULL,
  surveyCharacteristicIds = NULL,
  tagIds = NULL,
  returnFields = c("IndicatorId", "Label", "Definition", "MeasurementType"),
  perPage = NULL,
  page = NULL,
  f = "json"
) {
  # Base URL
  base_url <- "https://api.dhsprogram.com/rest/dhs/indicators?"

  # Build query string
  params <- list(
    countryIds = countryIds,
    indicatorIds = indicatorIds,
    surveyIds = surveyIds,
    surveyYear = surveyYear,
    surveyYearStart = surveyYearStart,
    surveyYearEnd = surveyYearEnd,
    surveyType = surveyType,
    surveyCharacteristicIds = surveyCharacteristicIds,
    tagIds = tagIds,
    returnFields = paste(returnFields, collapse = ","),
    perPage = perPage,
    page = page,
    f = f
  )

  # Drop NULLs, collapse multi-value arguments with commas, and encode
  query <- paste(
    purrr::compact(params) |>
      purrr::imap_chr(
        ~ paste0(
          .y, "=",
          utils::URLencode(paste(as.character(.x), collapse = ","), reserved = TRUE)
        )
      ),
    collapse = "&"
  )

  # Full URL
  full_url <- paste0(base_url, query)

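  # Fetch with progress bar and stop early on HTTP errors
  response <- httr::GET(full_url, httr::progress())
  if (httr::http_error(response)) {
    stop("API request failed: ", httr::status_code(response))
  }
  jsonlite::fromJSON(httr::content(
    response,
    as = "text",
    encoding = "UTF-8"
  ))$Data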
}

#' Query DHS API Directly via URL Parameters
#'
#' Builds and queries DHS API for indicator data using URL-based access
#' instead of rdhs package.
#'
#' @param countryIds Comma-separated DHS country code(s), e.g., "SL"
#' @param indicatorIds Comma-separated DHS indicator ID(s), e.g., "CM_ECMR_C_U5M"
#' @param surveyIds Optional comma-separated survey ID(s), e.g., "SL2016DHS"
#' @param surveyYear Optional exact survey year, e.g., "2016"
#' @param surveyYearStart Optional survey year range start
#' @param surveyYearEnd Optional survey year range end
#' @param breakdown One of: "national", "subnational", "background", "all"
#' @param f Format to return (default is "json")
#'
#' @return A data.frame containing the `Data` portion of the API response.
#' @export
download_dhs_indicators <- function(
  countryIds,
  indicatorIds,
  surveyIds = NULL,
  surveyYear = NULL,
  surveyYearStart = NULL,
  surveyYearEnd = NULL,
  breakdown = "subnational",
  f = "json"
) {
  # Base URL
  base_url <- "https://api.dhsprogram.com/rest/dhs/data?"

  # Assemble query string
  query <- paste0(
    "breakdown=",
    breakdown,
    "&indicatorIds=",
    indicatorIds,
    "&countryIds=",
    countryIds,
    if (!is.null(surveyIds)) paste0("&surveyIds=", surveyIds),
    if (!is.null(surveyYear)) paste0("&surveyYear=", surveyYear),
    if (!is.null(surveyYearStart)) paste0("&surveyYearStart=", surveyYearStart),
    if (!is.null(surveyYearEnd)) paste0("&surveyYearEnd=", surveyYearEnd),
    "&lang=en&f=",
    f
  )

  full_url <- paste0(base_url, query)

  cli::cli_alert_info("Downloading DHS data...")

  response <- httr::GET(full_url, httr::progress())

  if (httr::http_error(response)) {
    stop("API request failed: ", httr::status_code(response))
  }

  content_raw <- httr::content(response, as = "text", encoding = "UTF-8")
  data <- jsonlite::fromJSON(content_raw)$Data

  cli::cli_alert_success("Download complete: {nrow(data)} records retrieved.")
  return(data)
}

To adapt the code:

No adaptation needed.
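To see what the helpers request without contacting the server, the query URL can be assembled on its own. The sketch below is a simplified, hypothetical helper (build_dhs_data_url() is not part of rdhs or sntutils) that mirrors how download_dhs_indicators() builds its request, using the same base URL and parameter names as the set-up code.

```r
# Hypothetical helper: build the data-endpoint URL without sending a request
build_dhs_data_url <- function(countryIds,
                               indicatorIds,
                               surveyYear = NULL,
                               breakdown = "subnational") {
  params <- list(
    breakdown = breakdown,
    indicatorIds = indicatorIds,
    countryIds = countryIds,
    surveyYear = surveyYear,
    lang = "en",
    f = "json"
  )
  # Drop arguments that were not supplied
  params <- params[!vapply(params, is.null, logical(1))]
  # Collapse multi-value arguments with commas, then URL-encode each value
  values <- vapply(
    params,
    function(x) utils::URLencode(paste(x, collapse = ","), reserved = TRUE),
    character(1)
  )
  paste0(
    "https://api.dhsprogram.com/rest/dhs/data?",
    paste0(names(params), "=", values, collapse = "&")
  )
}

url <- build_dhs_data_url("SL", "ML_NETP_H_ITN", surveyYear = 2019)
url
```

Pasting the resulting URL into a browser is a quick way to check whether an empty result comes from the query itself or from the R session.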

Step 1.2: Search for available indicators

To start, we use check_dhs_indicators() to explore which indicators are available and find the IndicatorId for the variable of interest. The function returns a table of indicator IDs, labels, and any other fields requested through returnFields (here, the measurement type), which can then be filtered by keyword.

# get available indicators
indicators <- check_dhs_indicators(
  countryIds = "SL",
  surveyYear = 2019,
  surveyType = "DHS",
  returnFields = c("IndicatorId", "Label", "MeasurementType")
)

# filter to find ITN-related indicators
indicators |>
  dplyr::filter(stringr::str_detect(Label, "ITN"))

# A tibble: 19 × 3
   MeasurementType IndicatorId   Label                                          
   <chr>           <chr>         <chr>                                          
 1 Percent         ML_NETP_H_ITN Households with at least one insecticide-treat…
 2 Mean            ML_NETP_H_MNI Mean number of insecticide-treated mosquito ne…
 3 Percent         ML_NETP_H_IT2 Households with at least one insecticide-treat…
 4 Percent         ML_ITNA_P_ACC Persons with access to an insecticide-treated …
 5 Percent         ML_NETU_P_ITN Population who slept under an insecticide-trea…
 6 Percent         ML_NETU_P_IT1 Population who slept under an insecticide-trea…
 7 Number          ML_NETU_P_NM1 Number of persons living in a household with a…
 8 Number          ML_NETU_P_UN1 Number of persons living in a household with a…
 9 Percent         ML_ITNU_N_ITN Existing insecticide-treated mosquito nets (IT…
10 Number          ML_ITNU_N_NUM Number of insecticide-treated mosquito nets (I…
11 Number          ML_ITNU_N_UNW Number of insecticide-treated mosquito nets (I…
12 Percent         ML_NETC_C_ITN Children under 5 who slept under an insecticid…
13 Percent         ML_NETC_C_IT1 Children under 5 who slept under an insecticid…
14 Number          ML_NETC_C_NM1 Number of children under 5 in households with …
15 Number          ML_NETC_C_UN1 Number of children under 5 in households with …
16 Percent         ML_NETW_W_ITN Pregnant women who slept under an insecticide-…
17 Percent         ML_NETW_W_IT1 Pregnant women who slept under insecticide-tre…
18 Number          ML_NETW_W_NM1 Number of pregnant women in households with at…
19 Number          ML_NETW_W_UN1 Number of pregnant women in households with at…

To adapt the code:

Line 3: Replace “SL” with your country’s country code. Find your DHS_CountryCode here
Line 4: Use the year of the survey you are querying
Line 5: Update survey type if needed, for example if you are querying an MIS
Line 11: Replace “ITN” with a relevant keyword (e.g., “mortality”, “fever”, “treatment”) to search for indicators aligned with your analysis focus. The search runs against the Label field and is case-sensitive, so use terms as they appear in the indicator labels.

From this search, we identify that ML_NETP_H_ITN corresponds to: Households with at least one insecticide-treated mosquito net (ITN)
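Because the keyword search is case-sensitive, a term such as “fever” can miss labels that capitalize it. The sketch below shows a case-insensitive alternative with base R grepl() on a toy indicator table. Only the first row comes from the output above; the two placeholder rows are invented purely to illustrate the matching.

```r
# Toy indicator table: the first row is real, the placeholders are made up
indicators_toy <- data.frame(
  IndicatorId = c("ML_NETP_H_ITN", "PLACEHOLDER_A", "PLACEHOLDER_B"),
  Label = c(
    "Households with at least one insecticide-treated mosquito net (ITN)",
    "Placeholder label about fever treatment",
    "Placeholder label about Fever prevalence"
  ),
  stringsAsFactors = FALSE
)

keyword <- "fever"

# Case-sensitive search misses the capitalized "Fever"
hits_sensitive <- indicators_toy[grepl(keyword, indicators_toy$Label), ]

# Case-insensitive search finds both placeholder rows
hits_insensitive <- indicators_toy[
  grepl(keyword, indicators_toy$Label, ignore.case = TRUE),
]

nrow(hits_sensitive)
nrow(hits_insensitive)
```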

Step 1.3: Download indicators at subnational level

We can now retrieve this indicator for Sierra Leone (country code “SL”) in 2019, broken down by subnational level (e.g., district).

To quickly visualize these indicators spatially, you can download the country-specific shapefile for the 2019 DHS directly from the DHS Spatial Data Repository. This shapefile corresponds to the survey year and administrative level of the extracted indicators and makes it easy to map the results. However, remember that for final maps you will need to use the official shapefile approved by the SNT team by merging your DHS data extraction to your SNT shapefile.

# get the DHS shapefile
sle_dhs_shp <- sf::read_sf(
  here::here(
    "01_foundational/1a_administrative_boundaries",
    "1ai_adm2",
    "sdr_subnational_boundaries_adm2.shp"
  )
) |>
  dplyr::select(
    adm1 = OTHREGNA,
    adm2 = DHSREGEN,
    adm2_code = REG_ID
  )

# get pre-aggregated subnational data for ML_NETP_H_ITN
# (households with at least one ITN, i.e. ownership rather than access)
itn_access_data <- download_dhs_indicators(
  countryIds = "SL",
  surveyYear = 2019,
  indicatorIds = "ML_NETP_H_ITN",
  breakdown = "subnational"
)

# join indicator with shapefile
indicator_adm2 <- itn_access_data |>
  dplyr::inner_join(
    sle_dhs_shp,
    by = c("RegionId" = "adm2_code")
  ) |>
  sf::st_as_sf() |>
  dplyr::select(
    survey_id = SurveyId,
    adm1,
    adm2,
    adm2_code = RegionId,
    prop = Value,
    indicator = Indicator
  )

# check indicator
sf::st_drop_geometry(indicator_adm2)

   survey_id          adm1               adm2       adm2_code prop
1  SL2019DHS       Eastern           Kailahun SLDHS2019510011 81.6
2  SL2019DHS       Eastern             Kenema SLDHS2019510012 79.7
3  SL2019DHS       Eastern               Kono SLDHS2019510013 75.3
4  SL2019DHS      Northern          Koinadugu SLDHS2019510027 74.4
5  SL2019DHS      Northern             Falaba SLDHS2019510026 74.9
6  SL2019DHS      Northern            Bombali SLDHS2019510029 72.5
7  SL2019DHS      Northern          Tonkolili SLDHS2019510025 73.5
8  SL2019DHS North Western             Kambia SLDHS2019510022 72.5
9  SL2019DHS North Western             Karene SLDHS2019510028 71.4
10 SL2019DHS North Western          Port Loko SLDHS2019510030 57.9
11 SL2019DHS      Southern                 Bo SLDHS2019510031 68.3
12 SL2019DHS      Southern             Bonthe SLDHS2019510032 94.8
13 SL2019DHS      Southern            Moyamba SLDHS2019510033 69.3
14 SL2019DHS      Southern            Pujehun SLDHS2019510034 79.8
15 SL2019DHS       Western Western Area Rural SLDHS2019510041 48.8
16 SL2019DHS       Western Western Area Urban SLDHS2019510042 49.6
                                                             indicator
1  Households with at least one insecticide-treated mosquito net (ITN)
2  Households with at least one insecticide-treated mosquito net (ITN)
3  Households with at least one insecticide-treated mosquito net (ITN)
4  Households with at least one insecticide-treated mosquito net (ITN)
5  Households with at least one insecticide-treated mosquito net (ITN)
6  Households with at least one insecticide-treated mosquito net (ITN)
7  Households with at least one insecticide-treated mosquito net (ITN)
8  Households with at least one insecticide-treated mosquito net (ITN)
9  Households with at least one insecticide-treated mosquito net (ITN)
10 Households with at least one insecticide-treated mosquito net (ITN)
11 Households with at least one insecticide-treated mosquito net (ITN)
12 Households with at least one insecticide-treated mosquito net (ITN)
13 Households with at least one insecticide-treated mosquito net (ITN)
14 Households with at least one insecticide-treated mosquito net (ITN)
15 Households with at least one insecticide-treated mosquito net (ITN)
16 Households with at least one insecticide-treated mosquito net (ITN)

To adapt the code:

Line 17 (countryIds): Replace “SL” with your country’s DHS code (e.g., “NG” for Nigeria, “BF” for Burkina Faso).
Line 18 (surveyYear): Optional. Set to a specific year if you want to restrict to one round (e.g., 2019).
Line 19 (indicatorIds): Use the specific IndicatorId relevant to your analysis (e.g., “ML_CORT_P_ALL” for care-seeking or “ML_IPTP_D_PCT” for IPTp).
Line 20 (breakdown): Keep “subnational” to get region-level data; use “national” for country-wide aggregates.
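An inner join silently drops rows that have no match on either side, so it can help to compare the join keys before merging. The sketch below does this with setdiff() on toy tables. The first three region codes are taken from the output above, while the code ending in 99 is a made-up value standing in for an unmatched row.

```r
# Toy indicator extract (RegionId ending in 99 is invented for illustration)
indicator_toy <- data.frame(
  RegionId = c("SLDHS2019510011", "SLDHS2019510012", "SLDHS2019510099"),
  Value = c(81.6, 79.7, NA),
  stringsAsFactors = FALSE
)

# Toy attribute table standing in for the shapefile
shp_toy <- data.frame(
  adm2_code = c("SLDHS2019510011", "SLDHS2019510012", "SLDHS2019510013"),
  adm2 = c("Kailahun", "Kenema", "Kono"),
  stringsAsFactors = FALSE
)

# Indicator rows that would be dropped by the join
only_in_indicator <- setdiff(indicator_toy$RegionId, shp_toy$adm2_code)

# Shapefile units that would end up with no indicator value
only_in_shapefile <- setdiff(shp_toy$adm2_code, indicator_toy$RegionId)

only_in_indicator
only_in_shapefile
```

With real data, some unmatched indicator rows are expected: a subnational query can return adm1-level rows alongside district rows, and those have no counterpart in an adm2 shapefile.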

Step 1.4: (If needed) Using custom shapefiles when DHS/MIS boundaries are unavailable or not preferred

Once you have downloaded your indicators of interest, the next step is to match them to administrative units on a shapefile. If a shapefile directly associated with your survey of interest is available from the DHS Spatial Data Repository, this process is simple and is illustrated in this code library here.

In other cases, a shapefile corresponding to the specific DHS or MIS country-year may not exist in the DHS Spatial Data Repository. Alternatively, users may prefer to use their own shapefile, for instance, one aligned with national program boundaries or used elsewhere in the SNT workflow.

In either situation, users can still proceed by manually matching on administrative unit names from the indicator dataset. Below is an example of how administrative-level information is represented in the CharacteristicLabel column of the indicator output. Names with leading dots (e.g., ..Kailahun) typically refer to admin2 units, while names without dots (e.g., Eastern) indicate admin1 units.

# check the admin label column
itn_access_data |>
  dplyr::distinct(CharacteristicLabel)

         CharacteristicLabel
1                    Eastern
2                 ..Kailahun
3                   ..Kenema
4                     ..Kono
5     Northern (before 2017)
6                   Northern
7  ..Koinadugu (before 2017)
8              ....Koinadugu
9                 ....Falaba
10                 ..Bombali
11               ..Tonkolili
12             North Western
13                  ..Kambia
14                  ..Karene
15               ..Port Loko
16                  Southern
17                      ..Bo
18                  ..Bonthe
19                 ..Moyamba
20                 ..Pujehun
21                   Western
22           ..Western Rural
23           ..Western Urban

It’s important to note that in some instances, indicator results may reflect older administrative boundaries. For example, a label like Northern (before 2017) refers to the boundary configuration prior to Sierra Leone’s 2017 re-zoning. These legacy labels may appear alongside newer ones such as Northern or North Western in the indicator results, depending on how the data was processed.

If using a recent shapefile (e.g., post-2017), match to updated labels like Northern. If using an older shapefile and the survey year is before the cutoff (e.g., 2017), use legacy labels like Northern (before 2017). Always align indicator labels with the boundary structure valid at the time of data collection.
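The label conventions described above can also be parsed programmatically instead of being recoded by hand. The sketch below is one possible approach under two assumptions that should be verified for each country: any label with leading dots is a unit nested below adm1, and legacy units carry a “(before” suffix. The labels are a subset of those shown in the output above.

```r
labels <- c(
  "Eastern", "..Kailahun", "Northern (before 2017)", "Northern",
  "..Koinadugu (before 2017)", "....Koinadugu", "..Western Rural"
)

# Strip leading dots once, then count how many were removed
stripped <- sub("^\\.+", "", labels)

parsed <- data.frame(
  label = labels,
  n_dots = nchar(labels) - nchar(stripped),
  is_legacy = grepl("(before", labels, fixed = TRUE),
  name_clean = toupper(stripped),
  stringsAsFactors = FALSE
)

# Keep current (non-legacy) units nested below adm1
current_adm2 <- parsed[parsed$n_dots > 0 & !parsed$is_legacy, ]
current_adm2$name_clean
```

Names produced this way still need to be compared against the shapefile, since spellings such as “Western Rural” may differ from those used in national boundary files.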

Show full set-up code

# get shapefile
sle_diff_shp <- sf::read_sf(
  here::here(
    "01_foundational/1a_administrative_boundaries",
    "1ai_adm2",
    "2021.shp"
  )
) |>
  dplyr::select(
    adm1 = FIRST_REGI,
    adm2 = FIRST_DNAM
  )

sle_dhs_shp <- sf::read_sf(
  here::here("english/data_r/DHS/sdr_subnational_boundaries_adm2.geojson")
) |>
  dplyr::select(
    adm1 = OTHREGNA,
    adm2 = DHSREGEN,
    adm2_code = REG_ID
  )

# clean admin labels in indicator data
itn_access_data2 <- itn_access_data |>
  dplyr::mutate(
    label_updated = dplyr::case_when(
      CharacteristicLabel == "..Kailahun" ~ "KAILAHUN",
      CharacteristicLabel == "..Kenema" ~ "KENEMA",
      CharacteristicLabel == "..Kono" ~ "KONO",
      CharacteristicLabel == "....Koinadugu" ~ "KOINADUGU",
      CharacteristicLabel == "..Tonkolili" ~ "TONKOLILI",
      CharacteristicLabel == "..Kambia" ~ "KAMBIA",
      CharacteristicLabel == "..Karene" ~ "KARENE",
      CharacteristicLabel == "..Bombali" ~ "BOMBALI",
      CharacteristicLabel == "....Falaba" ~ "FALABA",
      CharacteristicLabel == "..Port Loko" ~ "PORT LOKO",
      CharacteristicLabel == "..Bo" ~ "BO",
      CharacteristicLabel == "..Bonthe" ~ "BONTHE",
      CharacteristicLabel == "..Moyamba" ~ "MOYAMBA",
      CharacteristicLabel == "..Pujehun" ~ "PUJEHUN",
      CharacteristicLabel == "..Western Rural" ~ "WESTERN RURAL",
      CharacteristicLabel == "..Western Urban" ~ "WESTERN URBAN",
      CharacteristicLabel == "Eastern" ~ "EASTERN",
      CharacteristicLabel == "Northern" ~ "NORTHERN",
      CharacteristicLabel == "North Western" ~ "NORTH WESTERN",
      CharacteristicLabel == "Western" ~ "WESTERN",
      CharacteristicLabel == "Southern" ~ "SOUTHERN",
      TRUE ~ NA_character_
    )
  )

# join cleaned indicator data to shapefile using admin2
itn_access_data2_joined <- itn_access_data2 |>
  dplyr::inner_join(sle_diff_shp, by = c("label_updated" = "adm2"))

# clean up columns
final_indicator_df <-
  itn_access_data2_joined |>
  sf::st_as_sf() |>
  dplyr::select(
    survey_id = SurveyId,
    adm1,
    adm2 = label_updated,
    adm2_code = RegionId,
    prop = Value,
    indicator = Indicator
  )

# calculate number of unique shapefile districts matched in final data
n_shp_total <- dplyr::n_distinct(sle_diff_shp$adm2)
n_shp_matched <- dplyr::n_distinct(final_indicator_df$adm2)

cli::cli_alert_success(
  "{n_shp_matched} out of {n_shp_total} shapefile districts matched in the final joined dataset."
)

cat("\n")

# view distinct mappings used in join
itn_access_data2_joined |>
  dplyr::distinct(CharacteristicLabel, label_updated)

Output

   CharacteristicLabel label_updated
1           ..Kailahun      KAILAHUN
2             ..Kenema        KENEMA
3               ..Kono          KONO
4        ....Koinadugu     KOINADUGU
5           ....Falaba        FALABA
6            ..Bombali       BOMBALI
7          ..Tonkolili     TONKOLILI
8             ..Kambia        KAMBIA
9             ..Karene        KARENE
10         ..Port Loko     PORT LOKO
11                ..Bo            BO
12            ..Bonthe        BONTHE
13           ..Moyamba       MOYAMBA
14           ..Pujehun       PUJEHUN
15     ..Western Rural WESTERN RURAL
16     ..Western Urban WESTERN URBAN

To adapt the code:

Shapefile (Lines 2–12): Replace the admin2 shapefile with an admin1 version if needed, and update the join fields accordingly (e.g., use DHSREGNA for admin1).
Name cleaning (Lines 23–50): The CharacteristicLabel values are manually harmonized using case_when() to match the shapefile naming. Legacy units (e.g., Northern (before 2017)) are left unmapped and receive NA, while admin1 summaries (e.g., Eastern) are mapped to names that do not occur in the shapefile's adm2 column. Neither can produce a match when joining to a newer admin2 shapefile.
Join (Line 54): The inner_join() retains only indicator rows with a valid admin2 match in the shapefile. This automatically filters out the admin1 rows and the legacy units.

The indicator results are in long format, containing both admin1 and admin2 names. In this step, we cleaned both levels but joined only on admin2 names using our shapefile and an inner join. This ensures that only observations with valid matches in the shapefile are retained, and admin1 names are automatically inherited from the shapefile’s structure.

From the results, you can see that both pre-2017 admin units and admin1-level rows are no longer present in the joined indicator data. Legacy units were assigned NA in the case_when() statement, and admin1 names have no counterpart in the shapefile's adm2 column, so the inner join dropped both.
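Because an inner join drops unmatched rows silently, it can be worth listing what was dropped on each side. The sketch below uses dplyr::anti_join() on two small made-up data frames (`shp_toy` and `ind_toy` are hypothetical stand-ins for the shapefile attributes and the cleaned indicator data) to show the idea.

```r
# toy stand-ins (hypothetical) for the shapefile attributes and cleaned indicator data
shp_toy <- data.frame(
  adm2 = c("KAILAHUN", "KENEMA", "KONO"),
  stringsAsFactors = FALSE
)

ind_toy <- data.frame(
  CharacteristicLabel = c(
    "..Kailahun", "..Kenema", "Eastern", "Northern (before 2017)"
  ),
  label_updated = c("KAILAHUN", "KENEMA", "EASTERN", NA),
  stringsAsFactors = FALSE
)

# shapefile districts with no indicator row (often a spelling mismatch)
unmatched_shp <- dplyr::anti_join(
  shp_toy, ind_toy,
  by = c("adm2" = "label_updated")
)

# indicator rows the inner join would drop (admin1 and legacy rows are expected here)
dropped_rows <- dplyr::anti_join(
  ind_toy, shp_toy,
  by = c("label_updated" = "adm2")
)

unmatched_shp
dropped_rows
```

In a real run, any admin2 label appearing in `dropped_rows` would point to a name that still needs harmonizing.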

We can now save the final indicator output.

# define save directory
save_path <- here::here("1.6_health_systems/1.6a_dhs")

# Save final joined ITN indicators with spatial data
rio::export(
  indicator_adm2 |> sf::st_drop_geometry(),
  here::here(save_path, "processed", "itn_adm2_indicator.csv")
)

rio::export(
  indicator_adm2,
  here::here(save_path, "processed", "itn_adm2_indicator.rds")
)

# Save final joined ITN indicators with spatial data (manual cleaning of admin names)
rio::export(
  final_indicator_df |> sf::st_drop_geometry(),
  here::here(save_path, "processed", "itn_adm2_indicator_manual.csv")
)

rio::export(
  final_indicator_df,
  here::here(save_path, "processed", "itn_adm2_indicator_manual.rds")
)

Option 2a: Download DHS Data Via `rdhs`

In instances where we want more control over our data analysis and aggregation, it’s important to work directly with raw DHS or MIS datasets.

To begin working with DHS or MIS data in SNT workflows, we first need to access the appropriate datasets. This is typically done using the rdhs package, which provides an interface to the DHS Program API (also accessed under Option 1), and can download data from the DHS website. This allows users to programmatically search for, download, and manage DHS datasets directly from R. For this, we will retrieve three key datasets: HR (Household Recode), PR (Person Recode) and KR (Children’s Recode).

These datasets are highly standardized, and the rdhs package makes it easy to filter by country, year, and survey type to retrieve the correct files for analysis.

Caution: Access to Option 2a May Be Restricted

Option 2a (using the rdhs package to download raw DHS datasets) requires DHS account credentials and prior project approval. As of now, new registrations are suspended due to funding-related disruptions, and access depends on the long-term stability of the DHS website—which cannot be guaranteed given ongoing budget cuts.

If you’re unable to access data via rdhs, see Option 2b for alternative routes, such as asking the SNT team to facilitate dataset access via institutional archives. The SNT code library supports full workflows using any of these sources.

Step 2a.1: Set up

Before we begin working with DHS datasets using Option 2a, we need to ensure the appropriate packages are installed and loaded.

# install or load required packages
pacman::p_load(
  rdhs,     # DHS API access and dataset management
  dplyr,    # data manipulation
  here      # project-relative file paths
)

Step 2a.2: Configure rdhs

Before downloading anything, we set up the rdhs configuration to authenticate and manage downloads from the DHS Program via the API. This is a one-time setup per project or machine, and the function will securely store your login details in a configuration file on your computer (either locally within the project or globally, depending on the config_path and global settings you specify).

The configuration file includes your registered email, project name, and cache preferences, and allows future API calls to proceed without re-authentication on that machine. If you switch devices or collaborate with others, they will need to create their own configuration file using their DHS account credentials.

# Set configuration
rdhs::set_rdhs_config(
  email = "my_email_address@gmail.com",
  project = "SNT for SL",
  config_path = "rdhs.json",
  cache_path = here::here("1.6_health_systems/1.6a_dhs"),
  data_frame = "data.table::as.data.table",
  global = FALSE,
  password_prompt = TRUE,
  verbose_setup = TRUE,
  timeout = 120,
  verbose_download = TRUE
)

To adapt the code:

Line 3 (email): Change the email field to your own DHS account email address.
Line 4 (project): Replace SNT for SL with the name of your DHS project as registered in your DHS account settings. The name must match what appears on your DHS profile exactly.
Line 5 (config_path): Set config_path = "rdhs.json" to save the rdhs configuration file inside the project (recommended). For a global configuration, use "~/.rdhs.json" and set global = TRUE.
Line 6 (cache_path): Set cache_path = here::here("...") to define where DHS datasets and API calls will be cached. If left blank, rdhs will default to your system’s user cache directory (if permitted).
Line 7 (data_frame): Leave as-is unless you prefer another return type (e.g., tibble::as_tibble).
Lines 8–12: These control session behavior (e.g., verbosity, timeouts). No changes are needed unless you want to customize setup preferences.

Once updated, run the code to authenticate with DHS.

Once the code is executed, you will be prompted to enter your DHS account password. This password is securely stored as a locally encrypted credential (a keyring secret) on your machine. You will not need to re-enter it for future API calls, as long as you remain on the same device and use the same configuration file.

Step 2a.3: Download DHS data

We now query the DHS API for two surveys from Sierra Leone, the 2016 MIS and the 2019 DHS, and download the relevant datasets in .rds format. These include three key files (PR, HR, and KR) for each year.

Together, these datasets will be used in the four code library pages covering ITN ownership and usage, parasite prevalence (PfPR), treatment-seeking behavior, and under-five mortality. For Sierra Leone, the expected filenames include:

SLPR7HFL, SLHR7HFL, SLKR7HFL for 2016
SLPR7AFL, SLHR7AFL, SLKR7AFL for 2019

These are downloaded using rdhs::get_datasets() and stored in your project directory in .rds format for use in later analysis.

# Pull filenames for the 2016 and 2019 Sierra Leone surveys (DHS and MIS)
data_filename <- rdhs::dhs_datasets(
  countryIds = "SL",
  surveyYear = c("2016", "2019"),
  fileType = c("PR", "HR", "KR"),
  surveyType = c("DHS", "MIS"),
  fileFormat = "FLAT"
) |>
  dplyr::group_by(DatasetType, CountryName, FileType) |>
  dplyr::distinct(FileName) |>
  dplyr::pull(FileName)

# Download and save the datasets as .rds
rdhs::get_datasets(
  dataset_filenames = data_filename,
  download_option = "rds",
  output_dir_root = here::here("1.6_health_systems/1.6a_dhs/raw"),
  clear_cache = TRUE
)

To adapt the code:

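Line 3 (countryIds): Replace “SL” with the two-letter DHS country code for your country of interest (e.g., “NG” for Nigeria, “ML” for Mali). DHS codes do not always match ISO codes (e.g., Burundi is “BU”), so check rdhs::dhs_countries() if unsure.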
Line 4 (surveyYear): Replace 2016 and 2019 with the year(s) relevant to your analysis.
Line 17 (output_dir_root): Change the folder path in here::here("1.6_health_systems/1.6a_dhs/raw") to match your own local project structure for saving downloaded .rds files.

Once updated, run the code to retrieve and store the relevant DHS datasets locally. All downstream code will operate on these .rds files.
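The filenames requested above follow the standard eight-character DHS pattern: a two-letter country code, a two-letter file type, two characters for the DHS phase and release, and a two-letter file format. The sketch below splits a filename into those parts, which can help confirm that the right files were requested. The helper name `parse_dhs_filename()` is hypothetical and not part of rdhs.

```r
# hypothetical helper: split a DHS filename into its standard components
parse_dhs_filename <- function(x) {
  # drop any folder and file extension, then standardize case
  stem <- toupper(tools::file_path_sans_ext(basename(x)))
  data.frame(
    file = x,
    country = substr(stem, 1, 2),  # DHS country code, e.g. "SL"
    recode = substr(stem, 3, 4),   # file type, e.g. "PR", "HR", "KR"
    version = substr(stem, 5, 6),  # DHS phase and release characters
    format = substr(stem, 7, 8),   # file format code, e.g. "FL" for flat
    stringsAsFactors = FALSE
  )
}

# example using two of the filenames listed above
parsed <- parse_dhs_filename(c("SLPR7HFL", "SLHR7AFL.rds"))
parsed
```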

Once the code has run, the .rds files should appear in the specified output folder. If you followed the example above and saved to here::here("1.6_health_systems/1.6a_dhs/raw"), your folder will look something like this:

1.6_health_systems/
└── 1.6a_dhs/
    └── raw/
        ├── db/             # rdhs local cache folder
        ├── SLPR7HFL.rds    # 2016 Sierra Leone MIS - Person Recode
        ├── SLHR7HFL.rds    # 2016 Sierra Leone MIS - Household Recode
        ├── SLKR7HFL.rds    # 2016 Sierra Leone MIS - Kids Recode
        ├── SLPR7AFL.rds    # 2019 Sierra Leone DHS - Person Recode
        ├── SLHR7AFL.rds    # 2019 Sierra Leone DHS - Household Recode
        └── SLKR7AFL.rds    # 2019 Sierra Leone DHS - Kids Recode
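Once the files are in place, they can be loaded in one pass instead of one readRDS() call per file. The following is a minimal sketch, assuming all recode files sit directly in a single folder. `read_rds_folder()` is a hypothetical helper, and the demonstration uses a temporary folder with a toy file rather than real DHS data.

```r
# hypothetical helper: read every .rds file in a folder into a named list
read_rds_folder <- function(dir) {
  files <- list.files(
    dir,
    pattern = "\\.rds$",
    full.names = TRUE,
    ignore.case = TRUE
  )
  # name each element after its file, e.g. "SLHR7AFL"
  names(files) <- tools::file_path_sans_ext(basename(files))
  lapply(files, readRDS)
}

# demonstration on a temporary folder containing a toy file (not real DHS data)
demo_dir <- file.path(tempdir(), "dhs_raw_demo")
dir.create(demo_dir, showWarnings = FALSE)
saveRDS(data.frame(hv001 = 1:3), file.path(demo_dir, "SLHR7AFL.rds"))

dhs_list <- read_rds_folder(demo_dir)
names(dhs_list)
```

With the real folder, the same call would be pointed at here::here("1.6_health_systems/1.6a_dhs/raw"), and each dataset could then be accessed by name, for example dhs_list[["SLHR7AFL"]].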

Option 2b: Access Data Via SNT Team or Institutional Partners

If your organization or colleagues have already downloaded the datasets, you can request them directly from the SNT team, NMP partners, or shared institutional repositories.

This route is especially useful if you don’t have DHS API credentials or are affected by access restrictions (see Option 1 and 2a). It can be a more reliable long-term approach that isn’t dependent on external systems that may become unavailable.

However, this approach comes with trade-offs, such as less control over file formats, which may vary depending on how the data were originally shared or downloaded.

Step 2b.1: Set up

Before we begin working with DHS datasets using Option 2b, we need to ensure the appropriate packages are installed and loaded.

# install or load required packages
pacman::p_load(
  haven,    # For importing Stata, SPSS, and SAS files
  sf,       # For importing spatial data (e.g., shapefiles)
  mdbr      # For importing Microsoft Access database (.mdb) files
)

Step 2b.2: Importing DHS data

In this step, we focus on the different formats in which DHS survey data are available and how to import them correctly into R. DHS datasets are typically distributed in Stata (.dta), SPSS (.sav), SAS (.sas7bdat), and flat ASCII text formats. In SNT workflows, we usually work with rectangular recode files in Stata, SPSS, or SAS formats, as these follow a tidy structure (one row per observation and one column per variable), which makes them straightforward to handle in R.

The haven package provides functions to import these formats while preserving labels and handling missing values consistently. Depending on which format you have downloaded, use only the corresponding import function below to read the dataset into R.

# read Stata file (e.g., Household Recode)
hr_data <- haven::read_dta(
  here::here("1.6_health_systems/1.6a_dhs/raw/SLHR7AFL/SLHR7AFL.DTA")
)

# read SPSS file
hr_data <- haven::read_sav(
  here::here("1.6_health_systems/1.6a_dhs/raw/SLHR7AFL/SLHR7AFL.SAV")
)

# read SAS file
hr_data <- haven::read_sas(
  here::here("1.6_health_systems/1.6a_dhs/raw/SLHR7AFL/SLHR7AFL.SAS7BDAT")
)

To adapt the code:

Function to use: Select the import function that matches your file format.
File path: Adjust the path and filename as needed.
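When shared files arrive in mixed formats, the choice of import function can be automated from the file extension. The sketch below is one possible way to do this. `read_dhs_file()` is a hypothetical helper, and the demonstration writes a small made-up Stata file to a temporary location so that it runs without real DHS data.

```r
# hypothetical helper: choose the haven reader from the file extension
read_dhs_file <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(
    ext,
    dta = haven::read_dta(path),
    sav = haven::read_sav(path),
    sas7bdat = haven::read_sas(path),
    stop("Unsupported file extension: ", ext)
  )
}

# demonstration with a toy Stata file (values are made up)
toy_path <- file.path(tempdir(), "toy_hr.dta")
haven::write_dta(
  data.frame(hv001 = 1:3, hv005 = c(1000000, 2000000, 1500000)),
  toy_path
)

toy_hr <- read_dhs_file(toy_path)
toy_hr
```

Because the extension is lower-cased before matching, upper-case names such as SLHR7AFL.DTA are handled the same way.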

Step 2b.3: Importing DHS geographic (GPS) data

Geographic data from DHS are typically provided in .DBF and .MDB formats. These contain information such as cluster location (latitude, longitude, altitude), administrative units, urban/rural classification, GPS accuracy notes (e.g., whether coordinates were collected via GPS or approximated), and the geographic coordinate system (usually WGS84).

Most SNT workflows use .DBF files as they integrate directly with sf. If only .MDB is available, the mdbr package offers a lightweight solution that doesn’t require proprietary drivers, although it relies on the open-source MDB Tools utilities being installed on your system.

# read .DBF file (attribute table of the GPS shapefile)
gps_data <- sf::st_read(
  here::here("1.6_health_systems/1.6a_dhs/raw/SLGE7AFL/SLGE7AFL.DBF")
)

# read .MDB (Microsoft Access) file
gps_data <- mdbr::read_mdb(
  here::here("1.6_health_systems/1.6a_dhs/raw/SLGE7AFL/SLGE7AFL.MDB"),
  table = "GEOGRAPHIC"
)

To adapt the code:

Function to use: Select the import function that matches your file format.
File path: Adjust the path and filename as needed.
Table name in mdbr: Use mdbr::mdb_tables() to list the tables inside the MDB file if you’re unsure of the table name.
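If the imported table has coordinates but no geometry column, it can be converted to spatial points before mapping or joining to boundaries. The sketch below assumes the field names commonly found in DHS GPS datasets (DHSCLUST, LATNUM, LONGNUM) and the convention that clusters without a valid location are recorded as 0, 0. Both assumptions should be checked against your own file, and the values in `gps_toy` are made up.

```r
# toy cluster table using typical DHS GPS field names (values are made up)
gps_toy <- data.frame(
  DHSCLUST = 1:3,
  LATNUM = c(8.48, 7.96, 0),
  LONGNUM = c(-13.23, -11.74, 0)
)

# drop clusters without valid coordinates (assumed to be recorded as 0, 0)
gps_valid <- gps_toy[!(gps_toy$LATNUM == 0 & gps_toy$LONGNUM == 0), ]

# build point geometry in WGS84 (EPSG:4326)
gps_sf <- sf::st_as_sf(
  gps_valid,
  coords = c("LONGNUM", "LATNUM"),
  crs = 4326
)

gps_sf
```

Note that coords expects longitude first and latitude second.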

Summary

DHS and MIS datasets provide an essential foundation for understanding population-level malaria indicators in SNT workflows. Each recode file, such as the household (HR), individual (PR), and child (KR) files, captures a different layer of information, and the files are designed to be used together. When properly linked and weighted, they allow analysts to produce reliable, representative estimates for subnational decision-making.

This page introduced three main approaches for working with DHS data and walked through a step-by-step example using the rdhs package to download the 2016 and 2019 survey datasets for Sierra Leone. These datasets will be used throughout the indicator-specific pages that follow, covering outcomes like ITN use, PfPR, treatment-seeking, and mortality.

Full Code

Find the full code script for DHS data overview and preparation below.
