Prevalence of malaria infection

Beginner

Overview

Parasite prevalence, the proportion of a population that is infected with malaria parasites, is a measure of malaria morbidity. In SNT, parasite prevalence can be used to understand historical trends, for epidemiological stratification, and to identify eligible and priority areas for interventions.

Prevalence is measured in cross-sectional national household surveys such as the Malaria Indicator Surveys (MIS), and may also be measured in Demographic and Health Surveys (DHS) or Multiple Indicator Cluster Surveys (MICS). These surveys measure prevalence in children, usually under 5 years of age, but sometimes also those 5-9 years, by testing the sampled children for malaria infection via microscopy or rapid diagnostic test (RDT).

This page is focused on extracting prevalence of malaria infection in children as reported in MIS/DHS surveys.

Objectives

Access household survey data on prevalence of malaria infection at the adm1 level
Inspect, visualize, and validate prevalence indicators
Compile and save clean, subnational summaries of prevalence indicators for integration into SNT workflows

Data Needs

Core Indicators for Prevalence

Malaria prevalence can be measured using various diagnostic methods, of which up to two are used in household surveys: rapid diagnostic tests (RDT) and microscopy. These methods are applied to children sampled in the household survey, typically those under five years of age. Each diagnostic method yields a slightly different estimate of prevalence due to differences in what they detect, how long those signals remain detectable in the bloodstream, the sensitivity and specificity of the test itself, and the ability of the human reader to interpret the test result accurately.

Prevalence by RDT: The proportion of children who tested positive for malaria using a rapid diagnostic test. These tests detect malaria antigens (such as HRP2/3 or LDH), which can remain in the bloodstream for several weeks after treatment. To calculate RDT-based prevalence in DHS/MIS data, the numerator is the number of children with a positive RDT result (hml35 == 1), and the denominator is the number of children with a valid RDT result (hml35 %in% c(0,1)).
Prevalence by microscopy: The proportion of children who tested positive for malaria parasites through blood smear microscopy. This method is more specific to current infection, as it detects malaria-causing parasites in the blood, but requires a trained microscopist. To calculate microscopy-based prevalence in DHS/MIS data, the numerator is the number of children with a positive microscopy result (hml32 == 1), and the denominator is the number of children with a valid microscopy result (hml32 %in% c(0,1))

(View Statistics Guide).

Interpreting and Using Survey Data on Prevalence

The microscopy vs RDT based prevalence estimates provide slightly different estimates. The reason being that the antigens detected by RDTs can remain in the bloodstream for several weeks after an infection is cleared by effective antimalarials, or naturally cleared. Generally prevalence measured by RDT is higher than prevalence measured by microscopy.

If both prevalence indicators are available, each should be summarized at the subnational level and reviewed side by side. Differences between them should be interpreted in light of what each test measures, as well as any contextual knowledge on the quality of implementation, such as the availability of trained microscopists. Generally the SNT team will have insight into whether there were particular difficulties encountered during the survey, either with sampling, laboratory testing, and/or data management, that should influence the interpretation of survey results. The SNT team should be presented with both indicators and decide which version is most suitable for decision-making.

Limitations to Keep in Mind When Interpreting Survey Data

While DHS and MIS provide standardized, population-based estimates of malaria prevalence, there are several limitations analysts should consider when interpreting the data.

Sparse geographical survey coverage can introduce bias. If survey cluster coordinates are not accessible or incomplete, it becomes difficult to assess whether certain areas (such as zones affected by conflict or inaccessible terrain) were systematically excluded. Certain populations, such as those living in displaced persons camps, may also be undersampled. Insufficient geographical representation may skew both ownership and usage estimates.

District-level estimates can also be imprecise due to small sample sizes. In most countries, household surveys are not powered to produce mortality estimates at adm-2. Confidence intervals can be wide, and results in less-populated or harder-to-reach districts should be viewed as indicative, not definitive. Even when statistical techniques are applied, they can only partially account for the underlying uncertainty.

The timing of the survey matters. Malaria prevalence can fluctuate seasonally, with higher prevalence after rainfall season. Surveys conducted outside of the peak transmission season may underestimate the prevalence.

These limitations don’t invalidate the data, but they should be kept in mind when using ITN indicators for situational analysis or strategic planning in SNT.

Step-by-Step

Several options on how to access and use DHS data are presented on the DHS Data Overview and Preparation page. Please refer there for more details.

Here, we will apply Option 1 (pull pre-built indicators from the DHS API) and Option 2 (build from raw survey data). While Option 1 is the quickest approach, it has limitations—only pre-built indicators are available, with no customozation possible for aggregation or population stratification. To address this, Option 2, describes a fully flexible approach by using the rdhs package to access raw survey data and construct indicators via R.

To skip the step-by-step explanation, jump to the full code at the end of this page.

Step 1: Option 1 (Download Pre-Calculated Prevalence)

This approach retrieves pre-built malaria prevalence indicators from the DHS API at the admin1-level. It provides a fast way to retrieve prevalence estimates based on either rapid diagnostic test (RDT) or microscopy, disaggregated by region or residence. However, users cannot adjust the age bands or diagnostic method beyond what DHS has published in their reports or on the DHS API (also accessible via STATCompiler).

Step 1.1.1: Install or load required packages

In this step, we download the prevalence indicators using the specified DHS Indicator IDs. We then join these indicators with our shapefile of interest (either admin-1 or admin-2), which determines the geographic level of analysis. This merged dataset forms the basis for plotting results and conducting subnational analysis.

We start by loading the required packages and setting up for the full section. For Option 1, we use two custom functions: check_dhs_indicators() and download_dhs_indicators().

Show full set-up code

# install or load required packages
pacman::p_load(
  here,       # for handling relative file paths
  haven,      # for reading DHS .dta files and labelled data
  dplyr,      # for data wrangling
  stringr,    # for filtering U5MR rows using regex
  tibble,     # for rownames_to_column
  ggplot2,    # for visualizing U5MR maps
  sf,         # for working with shapefiles
  rio,        # for saving outputs in .csv and .rds formats
  survey,     # for complex survey analysis
  cowplot,    # for arranging plots
  DHS.rates   # for calculating under-five mortality rates
)

#' Check DHS Indicator List from API
#'
#' @param countryIds DHS country code(s), e.g., "EG"
#' @param indicatorIds specific indicator ID(s)
#' @param surveyIds survey ID(s)
#' @param surveyYear exact year
#' @param surveyYearStart start of year range
#' @param surveyYearEnd end of year range
#' @param surveyType DHS survey type (e.g., "DHS", "MIS")
#' @param surveyCharacteristicIds filter by survey characteristic ID
#' @param tagIds filter by tag ID
#' @param returnFields Fields to return
#'   (default: IndicatorId, Label, Definition)
#' @param perPage max results per page (default = 500)
#' @param page specific page to return (default = 1)
#' @param f format (default = "json")
#'
#' @return a data.frame of indicators
#' @export
check_dhs_indicators <- function(
  countryIds = NULL,
  indicatorIds = NULL,
  surveyIds = NULL,
  surveyYear = NULL,
  surveyYearStart = NULL,
  surveyYearEnd = NULL,
  surveyType = NULL,
  surveyCharacteristicIds = NULL,
  tagIds = NULL,
  returnFields = c(
    "IndicatorId", "Label", "Definition", "MeasurementType"
  ),
  perPage = NULL,
  page = NULL,
  f = "json"
) {
  # base URL
  base_url <-
    "https://api.dhsprogram.com/rest/dhs/indicators?"

  # build query string
  params <- list(
    countryIds = countryIds,
    indicatorIds = indicatorIds,
    surveyIds = surveyIds,
    surveyYear = surveyYear,
    surveyYearStart = surveyYearStart,
    surveyYearEnd = surveyYearEnd,
    surveyType = surveyType,
    surveyCharacteristicIds = surveyCharacteristicIds,
    tagIds = tagIds,
    returnFields = paste(returnFields, collapse = ","),
    perPage = perPage,
    page = page,
    f = f
  )

  # drop NULLs and encode
  query <- paste(
    purrr::compact(params) |>
      purrr::imap_chr(
        ~ paste0(
          .y, "=", URLencode(as.character(.x), reserved = TRUE)
        )
      ),
    collapse = "&"
  )

  # full URL
  full_url <- paste0(base_url, query)

  # fetch with progress bar
  response <- httr::GET(full_url, httr::progress())
  jsonlite::fromJSON(httr::content(
    response,
    as = "text",
    encoding = "UTF-8"
  ))$Data
}

#' Query DHS API Directly via URL Parameters
#'
#' builds and queries DHS API for indicator data using URL-based access
#' instead of rdhs package.
#'
#' @param countryIds Comma-separated DHS country code(s), e.g., "SL"
#' @param indicatorIds Comma-separated DHS indicator ID(s),
#'   e.g., "CM_ECMR_C_U5M"
#' @param surveyIds optional comma-separated survey ID(s), e.g.,
#' "SL2016DHS"
#' @param surveyYear optional exact survey year, e.g., "2016"
#' @param surveyYearStart optional survey year range start
#' @param surveyYearEnd optional survey year range end
#' @param breakdown one of: "national", "subnational", "background",
#' "all"
#' @param f format to return (default is "json")
#'
#' @return a data.frame containing the `Data` portion of the API
#' response.
#' @export
download_dhs_indicators <- function(
  countryIds,
  indicatorIds,
  surveyIds = NULL,
  surveyYear = NULL,
  surveyYearStart = NULL,
  surveyYearEnd = NULL,
  breakdown = "subnational",
  f = "json"
) {
  # base URL
  base_url <-
    "https://api.dhsprogram.com/rest/dhs/data?"

  # assemble query string
  query <- paste0(
    "breakdown=",
    breakdown,
    "&indicatorIds=",
    indicatorIds,
    "&countryIds=",
    countryIds,
    if (!is.null(surveyIds))
      paste0("&surveyIds=", surveyIds),
    if (!is.null(surveyYear))
      paste0("&surveyYear=", surveyYear),
    if (!is.null(surveyYearStart))
      paste0("&surveyYearStart=", surveyYearStart),
    if (!is.null(surveyYearEnd))
      paste0("&surveyYearEnd=", surveyYearEnd),
    "&lang=en&f=",
    f
  )

  full_url <- paste0(base_url, query)

  cli::cli_alert_info("Downloading DHS data...")

  response <- httr::GET(full_url, httr::progress())

  if (httr::http_error(response)) {
    stop("API request failed: ", httr::status_code(response))
  }

  content_raw <- httr::content(
    response, as = "text", encoding = "UTF-8"
  )
  data <- jsonlite::fromJSON(content_raw)$Data

  cli::cli_alert_success(
    "Download complete: {nrow(data)} records retrieved."
  )
  return(data)
}

Step 1.1.2: Download the relevant indicators

As shown in Step 1.2 of Option 1 in the DHS Data Overview and Preparation page, we use the check_dhs_indicators() function to explore available indicators and ensure that we are pulling the correct DHS indicator IDs for the 2016 Sierra Leone MIS. In this section, we focus on parasite prevalence among children and its associated confidence intervals, using both RDT and microscopy-based estimates.

The relevant indicator IDs are:

ML_PMAL_C_RDT: Malaria prevalence according to RDT
ML_PMAL_C_RDL: Malaria prevalence according to RDT—CI lower bound (-2SE)
ML_PMAL_C_RDU: Malaria prevalence according to RDT—CI upper bound (+2SE)
ML_PMAL_C_MSY: Malaria prevalence according to microscopy
ML_PMAL_C_MSL: Malaria prevalence according to microscopy—CI lower bound (-2SE)
ML_PMAL_C_MSU: Malaria prevalence according to microscopy—CI upper bound (+2SE)

Show full code

# RDT-based prevalence
rdt_mean <- download_dhs_indicators(
  countryIds = "SL",
  surveyYear = 2016,
  indicatorIds = "ML_PMAL_C_RDT",
  breakdown = "subnational"
) |>
  dplyr::mutate(
    dhs_indicator_id = "ML_PMAL_C_RDT",
    indicator = "Malaria prevalence (RDT)"
  ) |>
  dplyr::select(
    dhs_indicator_id,
    indicator,
    region_label = CharacteristicLabel,
    adm2_code = RegionId,
    mean_prev = Value
  )

rdt_low <- download_dhs_indicators(
  countryIds = "SL",
  surveyYear = 2016,
  indicatorIds = "ML_PMAL_C_RDL",
  breakdown = "subnational"
) |>
  dplyr::select(
    region_label = CharacteristicLabel,
    adm2_code = RegionId,
    low_prev = Value
  )

rdt_upp <- download_dhs_indicators(
  countryIds = "SL",
  surveyYear = 2016,
  indicatorIds = "ML_PMAL_C_RDU",
  breakdown = "subnational"
) |>
  dplyr::select(
    region_label = CharacteristicLabel,
    adm2_code = RegionId,
    upp_prev = Value
  )

# microscopy-based prevalence
mic_mean <- download_dhs_indicators(
  countryIds = "SL",
  surveyYear = 2016,
  indicatorIds = "ML_PMAL_C_MSY",
  breakdown = "subnational"
) |>
  dplyr::mutate(
    dhs_indicator_id = "ML_PMAL_C_MSY",
    indicator = "Malaria prevalence (Microscopy)"
  ) |>
  dplyr::select(
    dhs_indicator_id,
    indicator,
    region_label = CharacteristicLabel,
    adm2_code = RegionId,
    mean_prev = Value
  )

mic_low <- download_dhs_indicators(
  countryIds = "SL",
  surveyYear = 2016,
  indicatorIds = "ML_PMAL_C_MSL",
  breakdown = "subnational"
) |>
  dplyr::select(
    region_label = CharacteristicLabel,
    adm2_code = RegionId,
    low_prev = Value
  )

mic_upp <- download_dhs_indicators(
  countryIds = "SL",
  surveyYear = 2016,
  indicatorIds = "ML_PMAL_C_MSU",
  breakdown = "subnational"
) |>
  dplyr::select(
    region_label = CharacteristicLabel,
    adm2_code = RegionId,
    upp_prev = Value
  )

To adapt the code:

Lines 3, 21, 33, 46, 64, 76: Replace "SL" with your country’s DHS code (e.g., "NG" for Nigeria, "KE" for Kenya)
Lines 4, 22, 34, 47, 65, 77: Update the surveyYear if you’re using a different round (e.g., 2019)
Lines 5, 23, 35, 48, 66, 78: Ensure the correct Indicator ID is used as listed above
Lines 6, 24, 36, 49, 67, 79: Keep breakdown = "subnational" to get regional estimates (typically admin-1); use "national" if only national values are needed

Step 1.1.3a: Join prevalence indicators when a dedicated shapefile IS available

Now that the data is ready, we need to join it to a shapefile so we can visualize it on a map. In cases where a dedicated shapefile is available that aligns with the administrative boundaries used in the survey, the join process is simple. You can use the region code (e.g., adm2_code) directly, avoiding manual name harmonization done in Step 1.1.3b. If there is no survey-associated shapefile or you would like to use a specific non-survey shapefile, skip this Step and proceed directly to *Step 1.1.3b.

In reality, the 2016 Sierra Leone MIS does not have a matching shapefile (hence the approach in Step 1.1.3b), so the code shown here is simply to demonstrate how the process would work when region codes in the indicator data match those in the shapefile. See the ITN Metrics and Mortality Data pages for real cases where region code joins were used successfully.

Show full code

# get the DHS adm2 shapefile
sle_dhs_shp2 <- sf::read_sf(
  here::here(
    "01_foundational/1a_administrative_boundaries",
    "1ai_adm2",
    "sdr_mis_subnational_boundaries_adm2.shp"
  )
) |>
  dplyr::select(
    adm1 = OTHREGNA,
    adm2 = DHSREGEN,
    adm2_code = REG_ID
  ) |>
  # make adm1 and adm2 to upper case
  dplyr::mutate(
    adm1 = toupper(adm1),
    adm2 = toupper(adm2)
  )

# join pfpr both indicators
pfpr_indicators <- dplyr::bind_rows(
  pfpr_indicator_rdt,
  pfpr_indicator_mic
)

# join adm2 dhs shapefile to data only keeping adm2 indicators
pfpr_indicators_final <- pfpr_indicators |>
  dplyr::inner_join(sle_dhs_shp2, by = "adm2_code") |>
  # select only relevant columns
  dplyr::select(
    dhs_indicator_id,
    indicator,
    adm1,
    adm2,
    mean_prev,
    low_prev,
    upp_prev,
    mean_prev,
    low_prev,
    upp_prev,
  ) |>
  sf::st_as_sf()

# check indicators
sf::st_drop_geometry(pfpr_indicators_final)

To adapt the code:

Shapefile loading (Lines 3–10): If working at the admin-1 level, load the corresponding adm1 shapefile instead (e.g., 1ai_adm1). In that case, do not rename or select adm2 columns—keep only adm1 and the appropriate join code
Lines 11–18: When using admin-1, you can skip any code referencing adm2 variables (e.g., adm2 = DHSREGEN or mutate(adm2 = ...)). Focus on cleaning and renaming just the admin-1 fields
Line 21: RDT and microscopy data contain results at both admin-1 and admin-2 levels. You do not need to filter them beforehand. Instead, your choice of shapefile and join key (e.g., adm2_code) will implicitly filter the indicator data to match only the relevant administrative level
Line 28: Use inner_join() to retain only rows that match your shapefile’s geographic level. For admin-1, this would be by = "adm1_code"; for admin-2, use by = "adm2_code" as shown

Step 1.1.3b: Join prevalence indicators when a dedicated shapefile is NOT available

For the 2016 Sierra Leone MIS, there is no dedicated shapefile available that corresponds directly to the administrative boundaries used in that survey. As a result, we will use the Sierra Leone 2019 DHS administrative boundaries from the DHS Spatial Data Repository as a reference and manually harmonize the region names provided in the indicator data to match the shapefile.

Note

If a shapefile exists for the specific DHS/MIS country-year (e.g., a matching GE file or custom shapefile from the survey implementer), you can use the code from Step 1.1.3a instead, which shows how to join using region codes instead of harmonized names. However, if you are working with a custom shapefile approved by the SNT team, you may need to do name harmonization as shown in Step 1.1.3b.

Please revisit Step 1.4 of DHS Data Overview and Preparation for more guidance on using custom shapefiles when DHS/MIS boundaries are unavailable or not the preferred option.

We use the region_label_updated column as our cleaned and harmonized geographic name field. This column serves as the join key when merging the prevalence indicators with the shapefile. If we are working at the admin-2 level, we join region_label_updated to the adm2 column in the shapefile. If instead we are working at the admin-1 level, we would harmonize region_label_updated to match adm1 and join using region_label_updated = adm1. Because we use an inner join, only rows that have a match in the shapefile are retained. This means that, for example, admin-1 summary rows in the indicator data will be excluded if we are performing an admin-2 level join. This ensures the resulting dataset contains only relevant, ‘mappable’ units.

Show full code

# load the DHS `admin-2` shapefile
sle_dhs_shp2 <- sf::read_sf(
  here::here(
    "01_foundational/1a_administrative_boundaries",
    "1ai_adm2",
    "sdr_subnational_boundaries_adm2.shp"
  )
) |>
  dplyr::select(
    adm1 = OTHREGNA,
    adm2 = DHSREGEN,
    adm2_code = REG_ID
  ) |>
  # make adm1 and adm2 to upper case
  dplyr::mutate(
    adm1 = toupper(adm1),
    adm2 = toupper(adm2)
  )

# join rdt indicator with CIs
pfpr_indicator_rdt <- rdt_mean |>
  dplyr::left_join(rdt_low, by = "region_label") |>
  dplyr::left_join(rdt_upp, by = "region_label")

# join mic indicator with CIs
pfpr_indicator_mic <- mic_mean |>
  dplyr::left_join(mic_low, by = "region_label") |>
  dplyr::left_join(mic_upp, by = "region_label")

# combine RDT and microscopy indicators with CIs
pfpr_indicators <- dplyr::bind_rows(
  pfpr_indicator_rdt,
  pfpr_indicator_mic
) |>
  # manually clean up admin names
  dplyr::mutate(
    region_label_updated = dplyr::case_when(
      region_label == "..Kailahun" ~ "KAILAHUN",
      region_label == "..Kenema" ~ "KENEMA",
      region_label == "..Kono" ~ "KONO",
      region_label == "..Koinadugu (before 2017)" ~ "KOINADUGU",
      region_label == "..Tonkolili" ~ "TONKOLILI",
      region_label == "..Kambia" ~ "KAMBIA",
      region_label == "..Karene" ~ "KARENE",
      region_label == "..Bombali (before 2017)" ~ "BOMBALI",
      region_label == "..Port Loko (before 2017)" ~ "PORT LOKO",
      region_label == "..Bo" ~ "BO",
      region_label == "..Bonthe" ~ "BONTHE",
      region_label == "..Moyamba" ~ "MOYAMBA",
      region_label == "..Pujehun" ~ "PUJEHUN",
      region_label == "..Western Rural" ~ "WESTERN AREA RURAL",
      region_label == "..Western Urban" ~ "WESTERN AREA URBAN",
      region_label == "Eastern" ~ "EASTERN",
      region_label == "Northern (before 2017)" ~ "NORTHERN",
      region_label == "Western" ~ "WESTERN",
      region_label == "Southern" ~ "SOUTHERN",
      TRUE ~ NA
    )
  ) |>
  dplyr::filter(!is.na(region_label_updated))

# join shapefile to indicator data
pfpr_indicators_final <-
  pfpr_indicators |>
  dplyr::inner_join(sle_dhs_shp2, by = c("region_label_updated" = "adm2")) |>
  dplyr::select(
    dhs_indicator_id,
    indicator,
    adm1,
    adm2 = region_label_updated,
    mean_prev,
    low_prev,
    upp_prev,
    geometry
  ) |>
  sf::st_as_sf()

# preview the joined data (without geometry)
sf::st_drop_geometry(pfpr_indicators_final)

Output

   dhs_indicator_id                       indicator  adm1               adm2
1     ML_PMAL_C_RDT        Malaria prevalence (RDT)  EAST           KAILAHUN
2     ML_PMAL_C_RDT        Malaria prevalence (RDT)  EAST             KENEMA
3     ML_PMAL_C_RDT        Malaria prevalence (RDT)  EAST               KONO
4     ML_PMAL_C_RDT        Malaria prevalence (RDT) NORTH          KOINADUGU
5     ML_PMAL_C_RDT        Malaria prevalence (RDT) NORTH          TONKOLILI
6     ML_PMAL_C_RDT        Malaria prevalence (RDT) NORTH             KAMBIA
7     ML_PMAL_C_RDT        Malaria prevalence (RDT) NORTH            BOMBALI
8     ML_PMAL_C_RDT        Malaria prevalence (RDT) NORTH          PORT LOKO
9     ML_PMAL_C_RDT        Malaria prevalence (RDT) SOUTH                 BO
10    ML_PMAL_C_RDT        Malaria prevalence (RDT) SOUTH             BONTHE
11    ML_PMAL_C_RDT        Malaria prevalence (RDT) SOUTH            MOYAMBA
12    ML_PMAL_C_RDT        Malaria prevalence (RDT) SOUTH            PUJEHUN
13    ML_PMAL_C_RDT        Malaria prevalence (RDT)  WEST WESTERN AREA RURAL
14    ML_PMAL_C_RDT        Malaria prevalence (RDT)  WEST WESTERN AREA URBAN
15    ML_PMAL_C_MSY Malaria prevalence (Microscopy)  EAST           KAILAHUN
16    ML_PMAL_C_MSY Malaria prevalence (Microscopy)  EAST             KENEMA
17    ML_PMAL_C_MSY Malaria prevalence (Microscopy)  EAST               KONO
18    ML_PMAL_C_MSY Malaria prevalence (Microscopy) NORTH          KOINADUGU
19    ML_PMAL_C_MSY Malaria prevalence (Microscopy) NORTH          TONKOLILI
20    ML_PMAL_C_MSY Malaria prevalence (Microscopy) NORTH             KAMBIA
21    ML_PMAL_C_MSY Malaria prevalence (Microscopy) NORTH            BOMBALI
22    ML_PMAL_C_MSY Malaria prevalence (Microscopy) NORTH          PORT LOKO
23    ML_PMAL_C_MSY Malaria prevalence (Microscopy) SOUTH                 BO
24    ML_PMAL_C_MSY Malaria prevalence (Microscopy) SOUTH             BONTHE
25    ML_PMAL_C_MSY Malaria prevalence (Microscopy) SOUTH            MOYAMBA
26    ML_PMAL_C_MSY Malaria prevalence (Microscopy) SOUTH            PUJEHUN
27    ML_PMAL_C_MSY Malaria prevalence (Microscopy)  WEST WESTERN AREA RURAL
28    ML_PMAL_C_MSY Malaria prevalence (Microscopy)  WEST WESTERN AREA URBAN
   mean_prev low_prev upp_prev
1       67.0     59.8     74.3
2       59.3     48.9     69.6
3       49.5     39.6     59.4
4       78.1     72.2     84.0
5       68.3     61.3     75.3
6       59.4     45.8     73.1
7       47.7     37.1     58.2
8       69.8     62.7     76.9
9       57.1     49.6     64.5
10      46.8     39.1     54.6
11      60.6     51.6     69.6
12      69.2     63.0     75.4
13      33.5     24.5     42.5
14       3.8      2.0      5.5
15      45.0     39.8     50.2
16      37.7     29.4     46.0
17      37.5     27.0     47.9
18      57.9     49.7     66.1
19      55.7     48.1     63.4
20      48.3     38.4     58.3
21      37.6     28.9     46.3
22      58.5     49.8     67.2
23      39.7     29.9     49.5
24      26.1     21.0     31.3
25      39.9     31.4     48.4
26      46.8     39.2     54.4
27      34.9     25.6     44.2
28       6.3      2.1     10.5

To adapt the code:

Shapefile loading (Lines 3–10): This example uses the admin-2 shapefile (1ai_adm2). If your indicators are reported at the admin-1 level instead, update the path to load 1ai_adm1 and modify the column selections to match admin-1 fields (e.g., adm1 = DHSREGNA, adm1_code = REG_ID), excluding any adm2 fields.
Lines 21–23, 26–28: RDT and microscopy indicators are joined with their respective confidence intervals using region_label as the key. Ensure these match the format used in your input data.
Lines 36–59: The region_label values from the MIS 2016 data are manually harmonized using case_when() to produce a cleaned field called region_label_updated. Only those regions with a match in the shapefile should be retained, so this step is essential to ensure join compatibility.
Line 65: The inner_join() ensures that only subnational units that exist in both the indicator dataset and shapefile are retained. Because we are using an admin-2 shapefile, any admin-1 summary rows (like "Eastern" or "Southern") will be dropped automatically during this step.
Lines 66–74: Adjust the select() statement to include only the columns relevant to your analysis. If you’re working with just RDT or just microscopy, you may omit fields related to the other. If working at a different administrative level, ensure the adm1 or adm2 assignments reflect your join key correctly.
General note: Always inspect the output of sf::st_drop_geometry() to confirm that the join worked as expected and that no key subnational units were lost in the process

Step 1: Option 2 (Calculate Prevalence from Raw DHS Data)

In this option, we will calculate prevalence ourselves from DHS files we have already downloaded, for example, via Option 2a or 2b on the DHS Overview and Data Preparation page.

Step 1.2.1: Load the relevant DHS data

# import the KR (children's) data
sle_dhs_pr <- readRDS(
  here::here("1.6_health_systems/1.6a_dhs/raw/SLPR73FL.rds")
) |>
  # make location column
  dplyr::mutate(
    adm1 = haven::as_factor(hv024) |>
      toupper(),
    adm2 = haven::as_factor(shdist) |>
      toupper()
  )

To adapt the code:

Line 3: Replace PR file path with your country’s DHS dataset (e.g., NGPR7AFL.rds for Nigeria)
Lines 6–8: Update the administrative variable mappings if your country uses different DHS variable codes for admin-1 (hv024) and admin-2 (shdist or sdist) regions

If using standard DHS naming, no other changes should be needed.

Step 1.2.2: Extract individual records

In this step, we identify individual-level records for children aged 6–59 months who were eligible for malaria testing during the survey. Using the PR file, we construct binary indicators for whether a child was tested for malaria via RDT or microscopy, as well as whether they tested positive. These indicators are derived using DHS standard variables and eligibility criteria: the child must have slept in the household the previous night (hv103), be aged 6–59 months (hc1), and have a valid test result (hml35 for RDT and hml32 for microscopy).

Show full code

# extract pfpr from individual records
sle_dhs_pr2 <- sle_dhs_pr |>
  dplyr::mutate(
    # tested for malaria via RDT
    tested_rdt = dplyr::case_when(
      # child was present last night (hv103 == 1),
      # has a mother in the household (hv042 == 1),
      # is between 6–59 months (hc1),
      # and has a valid RDT result (0 = negative, 1 = positive)
      hv103 == 1 & hv042 == 1 & hc1 >= 6 & hc1 <= 59 &
        hml35 %in% c(0, 1) ~ 1,
      # child meets criteria but no valid RDT result recorded
      hv103 == 1 & hv042 == 1 & hc1 >= 6 & hc1 <= 59 ~ 0,
      TRUE ~ NA_real_ # all others set to missing
    ),

    # microscopy testing status
    tested_mic = dplyr::case_when(
      # child was present last night (hv103 == 1),
      # has a mother in the household (hv042 == 1),
      # is aged 6–59 months (hc1),
      # and has a microscopy result: 0 = negative,
      # 1 = positive, 6 = invalid (still considered tested)
      hv103 == 1 & hv042 == 1 & hc1 >= 6 & hc1 <= 59 &
        hml32 %in% c(0, 1, 6) ~
        1,
      # child is eligible but no valid microscopy result available
      hv103 == 1 & hv042 == 1 & hc1 >= 6 & hc1 <= 59 ~ 0,
      TRUE ~ NA_real_ # all others set to missing
    ),

    # RDT positive
    rdt_pos = dplyr::case_when(
      # positive RDT result (hml35 == 1) for eligible child
      hv103 == 1 & hv042 == 1 & hc1 >= 6 & hc1 <= 59 &
        hml35 == 1 ~ 1,
      # negative RDT result (hml35 == 0) for eligible child
      hv103 == 1 & hv042 == 1 & hc1 >= 6 & hc1 <= 59 &
        hml35 == 0 ~ 0,
      TRUE ~ NA_real_ # all others set to missing
    ),

    # microscopy positive
    mic_pos = dplyr::case_when(
      # positive microscopy result (hml32 == 1) for eligible child
      hv103 == 1 & hv042 == 1 & hc1 >= 6 & hc1 <= 59 &
        hml32 == 1 ~ 1,
      # negative or invalid microscopy result (0 or 6)
      hv103 == 1 & hv042 == 1 & hc1 >= 6 & hc1 <= 59 &
        hml32 %in% c(0, 6) ~ 0,
      TRUE ~ NA_real_ # all others set to missing
    ),
    survey_weight = hv005 / 1000000
  ) |>
  dplyr::select(
    cluster_id = hv021,
    stratum_id = hv022,
    survey_weight,
    adm1,
    adm2,
    tested_rdt,
    tested_mic,
    rdt_pos,
    mic_pos
  )

To adapt the code:

Modify only if relevant parameters were not created exactly as in earlier steps.

Step 1.2.3: Calculate malaria prevalence (RDT and Microscopy) at district level

In this step, we calculate district-level malaria prevalence using survey-weighted estimates, restricted to children who were actually tested. We create two separate survey design objects: one for children tested via RDT, and another for those tested via microscopy. For each design, we apply svyby() to estimate the proportion of children who tested positive (rdt_pos and mic_pos), grouped by adm1 and adm2. Confidence intervals are included using the ci option in svyby(). The final output is a table showing the weighted prevalence estimates and their corresponding lower and upper bounds by district.

Show full code

# RDT design
des_rdt <- survey::svydesign(
  ids = ~cluster_id,
  strata = ~stratum_id,
  weights = ~survey_weight,
  data = dplyr::filter(sle_dhs_pr2, tested_rdt == 1),
  nest = TRUE
)

# microscopy design
des_mic <- survey::svydesign(
  ids = ~cluster_id,
  strata = ~stratum_id,
  weights = ~survey_weight,
  data = dplyr::filter(sle_dhs_pr2, tested_mic == 1),
  nest = TRUE
)

# RDT prevalence by adm2
prev_rdt <- survey::svyby(
  ~rdt_pos,
  ~ adm1 + adm2,
  design = des_rdt,
  FUN = survey::svymean,
  vartype = c("ci"), # add CI
  keep.names = F
) |>
  dplyr::mutate(
    pct = round(rdt_pos * 100, 1),
    ci_low = round(ci_l * 100, 1),
    ci_upp = round(ci_u * 100, 1)
  ) |>
  dplyr::arrange(adm1) |>
  dplyr::mutate(indicator = "Malaria prevalence (RDT)") |>
  dplyr::select(adm2, indicator, pct, ci_low, ci_upp)

# microscopy prevalence by adm2
prev_mic <- survey::svyby(
  ~mic_pos,
  ~ adm1 + adm2,
  design = des_mic,
  FUN = survey::svymean,
  vartype = c("ci"), # add CI
  keep.names = F
) |>
  dplyr::mutate(
    pct = round(mic_pos * 100, 1),
    ci_low = round(ci_l * 100, 1),
    ci_upp = round(ci_u * 100, 1)
  ) |>
  dplyr::arrange(adm1) |>
  dplyr::mutate(indicator = "Malaria prevalence (Microscopy)") |>
  dplyr::select(adm2, indicator, pct, ci_low, ci_upp)

# bind both together
prev_final <- dplyr::bind_rows(prev_rdt, prev_mic)

To adapt the code:

Modify only if relevant parameters were not created exactly as in earlier steps.

Step 1.2.4: Join prevalence indicators to a shapefile

Now that we have a final indicator dataset, we can join it with the MIS shapefile using adm2 as keys. Before doing so, we ensure that admin names are harmonized across both datasets.

For example, in the MIS data, adm2 are labeled as WEST AREA RURAL and WEST AREA URBAN, whereas the shapefile uses WESTERN AREA RURAL and WESTERN AREA URBAN. To ensure a successful join, we clean and recode these names accordingly before merging. Your case may differ—always inspect and harmonize naming between datasets to avoid mismatches during joins

When Name-Based Joins Get Tricky…

When merging by names becomes complex or non-trivial (like in the example below), visit the section on Merging Spatial Vectors (Shapefile) with Tabular data for best practices and detailed guidance.

Show the code

# load the DHS `admin-2` shapefile
sle_dhs_shp2 <- sf::read_sf(
  here::here(
    "01_foundational/1a_administrative_boundaries",
    "1ai_adm2",
    "sdr_subnational_boundaries_adm2.shp"
  )
) |>
  dplyr::select(
    adm1 = OTHREGNA,
    adm2 = DHSREGEN,
    adm2_code = REG_ID
  ) |>
  # make adm1 and adm2 to upper case
  dplyr::mutate(
    adm1 = toupper(adm1),
    adm2 = toupper(adm2)
  )

# bind both together
prev_final <- prev_final |>
  # clean up adm2 names to match shapefile
  dplyr::mutate(
    adm2 = dplyr::case_when(
      adm2 == "WEST AREA RURAL" ~ "WESTERN AREA RURAL",
      adm2 == "WEST AREA URBAN" ~ "WESTERN AREA URBAN",
      TRUE ~ adm2
    )
  ) |>
  # join to shapefile
  dplyr::left_join(sle_dhs_shp2, by = c("adm2")) |>
  sf::st_as_sf()

sf::st_drop_geometry(prev_final)

Output

                 adm2                       indicator  pct ci_low ci_upp  adm1
1            KAILAHUN        Malaria prevalence (RDT) 67.0   60.4   73.6  EAST
2              KENEMA        Malaria prevalence (RDT) 59.3   49.9   68.6  EAST
3                KONO        Malaria prevalence (RDT) 49.5   40.3   58.7  EAST
4             BOMBALI        Malaria prevalence (RDT) 47.7   39.7   55.6 NORTH
5              KAMBIA        Malaria prevalence (RDT) 59.4   46.4   72.5 NORTH
6           KOINADUGU        Malaria prevalence (RDT) 78.1   72.4   83.9 NORTH
7           PORT LOKO        Malaria prevalence (RDT) 69.8   63.4   76.2 NORTH
8           TONKOLILI        Malaria prevalence (RDT) 68.3   63.0   73.6 NORTH
9                  BO        Malaria prevalence (RDT) 57.1   52.8   61.3 SOUTH
10             BONTHE        Malaria prevalence (RDT) 46.8   39.3   54.4 SOUTH
11            MOYAMBA        Malaria prevalence (RDT) 60.6   51.8   69.4 SOUTH
12            PUJEHUN        Malaria prevalence (RDT) 69.2   63.2   75.1 SOUTH
13 WESTERN AREA RURAL        Malaria prevalence (RDT) 33.5   25.1   41.9  WEST
14 WESTERN AREA URBAN        Malaria prevalence (RDT)  3.8    2.2    5.4  WEST
15           KAILAHUN Malaria prevalence (Microscopy) 45.0   39.9   50.1  EAST
16             KENEMA Malaria prevalence (Microscopy) 37.7   30.6   44.8  EAST
17               KONO Malaria prevalence (Microscopy) 37.5   27.1   47.8  EAST
18            BOMBALI Malaria prevalence (Microscopy) 37.6   31.5   43.7 NORTH
19             KAMBIA Malaria prevalence (Microscopy) 48.3   38.7   58.0 NORTH
20          KOINADUGU Malaria prevalence (Microscopy) 57.9   49.9   65.8 NORTH
21          PORT LOKO Malaria prevalence (Microscopy) 58.5   51.0   66.1 NORTH
22          TONKOLILI Malaria prevalence (Microscopy) 55.7   50.1   61.4 NORTH
23                 BO Malaria prevalence (Microscopy) 39.7   32.3   47.1 SOUTH
24             BONTHE Malaria prevalence (Microscopy) 26.1   21.1   31.2 SOUTH
25            MOYAMBA Malaria prevalence (Microscopy) 39.9   31.6   48.2 SOUTH
26            PUJEHUN Malaria prevalence (Microscopy) 46.8   39.5   54.1 SOUTH
27 WESTERN AREA RURAL Malaria prevalence (Microscopy) 34.9   26.3   43.4  WEST
28 WESTERN AREA URBAN Malaria prevalence (Microscopy)  6.3    2.4   10.2  WEST

To adapt the code:

Shapefile loading (Lines 3–10): This example uses the admin-2 shapefile (1ai_adm2). If your indicators are reported at the admin-1 level instead, update the path to load 1ai_adm1 and modify the column selections to match admin-1 fields (e.g., adm1 = DHSREGNA, adm1_code = REG_ID), excluding any adm2 fields
Clean up administrative names in your indicator dataset to match the shapefile exactly. Adjust the logic in case_when() based on your admin naming conventions

It is important to note that in this example, we merge using adm2 alone. However, in cases where admin-2 names are not unique (e.g., the same name appears under different admin-1 regions), it is safer to merge using both adm1 and adm2, assuming both fields have been cleaned and standardized. This helps ensure accurate joins and prevent unintended mismatches.

Step 2: Visualize Subnational Indicator Data

Once the malaria prevalence estimates (from both RDT and microscopy) have been calculated and joined with a shapefile, we can create maps to explore geographic variation in parasite prevalence. In the example below, we categorize the prevalence values into 10% bins and visualize them using ggplot2.

Show full code

# categorize prevalence percentages
prev_final <- prev_final |>
  dplyr::mutate(
    prev_cat = cut(
      pct,
      breaks = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100),
      labels = c(
        "<10",
        "10 to <20",
        "20 to <30",
        "30 to <40",
        "40 to <50",
        "50 to <60",
        "60 to <70",
        "70 to <80",
        "80 to <90",
        "90 to 100"
      ),
      include.lowest = TRUE,
      right = FALSE
    )
  )

# visualize RDT-based prevalence
map_rdt <- prev_final |>
  dplyr::filter(indicator == "Malaria prevalence (RDT)") |>
  ggplot2::ggplot() +
  ggplot2::geom_sf(
    ggplot2::aes(fill = prev_cat),
    color = "black",
    size = 2,
    show.legend = TRUE
  ) +
  ggplot2::scale_fill_manual(
    name = "RDT Prevalence (%)",
    values = c(
      "<10" = "#9E0142",
      "10 to <20" = "#D53E4F",
      "20 to <30" = "#F46D43",
      "30 to <40" = "#FDAE61",
      "40 to <50" = "#FEE08B",
      "50 to <60" = "#E6F598",
      "60 to <70" = "#ABDDA4",
      "70 to <80" = "#66C2A5",
      "80 to <90" = "#3288BD",
      "90 to 100" = "#5E4FA2"
    ),
    drop = FALSE
  ) +
  ggplot2::labs(
    title = "Malaria Prevalence by RDT (Admin-1)",
    caption = "Source: 2016 Sierra Leone MIS"
  ) +
  ggplot2::theme_void()

# microscopy map
map_mic <- prev_final |>
  dplyr::filter(indicator != "Malaria prevalence (RDT)") |>
  ggplot2::ggplot() +
  ggplot2::geom_sf(
    ggplot2::aes(fill = prev_cat),
    color = "black",
    size = 2,
    show.legend = TRUE
  ) +
  ggplot2::scale_fill_manual(
    name = "Microscopy Prevalence (%)",
    values = c(
      "<10" = "#9E0142",
      "10 to <20" = "#D53E4F",
      "20 to <30" = "#F46D43",
      "30 to <40" = "#FDAE61",
      "40 to <50" = "#FEE08B",
      "50 to <60" = "#E6F598",
      "60 to <70" = "#ABDDA4",
      "70 to <80" = "#66C2A5",
      "80 to <90" = "#3288BD",
      "90 to 100" = "#5E4FA2"
    ),
    drop = FALSE
  ) +
  ggplot2::labs(
    title = "Malaria Prevalence by Microscopy (Admin-1)",
    caption = "Source: 2016 Sierra Leone MIS"
  ) +
  ggplot2::theme_void()

# display maps side-by-side with shared legend
cowplot::plot_grid(
  map_rdt + ggplot2::theme(legend.position = "none"),
  map_mic + ggplot2::theme(legend.position = "none"),
  cowplot::get_legend(map_rdt),
  ncol = 3,
  rel_widths = c(1, 1, 0.3)
)

Output

To adapt the code:

This plotting example uses variable names from Option 2. If you are using the code from Option 1: Line 2: replace prev_final with pfpr_indicators_final; Line 5: replace pct with mean_prev.
Lines 4–17 (Binning prev_cat): Adjust cut() breaks as needed, for instance use meaningful custom categorization (prevalence endemicity levels), use quantiles, or continuous scales (scale_fill_viridis_c()) if preferred.
Indicator column: This example uses a single mean_prev column for both RDT and microscopy. Update if your data uses separate columns.
Lines 26, 59 (Filtering): Modify filter(indicator == ...) as needed to match your indicator labels.
Lines 51–53, 83–85 (Titles and captions): Update labs() with the correct country, year, survey type, and admin level.
Shapefile level: This uses admin-2. If using admin-1, adjust your shapefile and join key accordingly.
Lines 90–95 (Legend): The shared legend uses cowplot. Skip this if mapping only one indicator.

Now, let us save these plots for future reference.

# save RDT map
ggplot2::ggsave(
  plot = map_rdt,
  filename = here::here(
    "03_output/3a_figures/malaria_prev_rdt_adm2.png"
  ),
  width = 12,
  height = 9,
  dpi = 300
)

# save microscopy map
ggplot2::ggsave(
  plot = map_mic,
  filename = here::here(
    "03_output/3a_figures/malaria_prev_mic_adm2.png"
  ),
  width = 12,
  height = 9,
  dpi = 300
)

To adapt the code:

Lines 4, 15: Change the output path and filenames for your plots as needed.

Step 3: Save Processed Malaria Prevalence Indicators

Save the extracted prevalence data for future usage and reference. These outputs will be used in subsequent steps for stratification, visualization, and integration with other indicators.

# define save directory
save_path <- here::here("1.6_health_systems/1.6a_dhs")

# save final joined malaria prevalence data
rio::export(
  prev_final |> sf::st_drop_geometry(),
  here::here(
    save_path, "processed", "final_malaria_prev_sle_adm2.csv"
  )
)

# save final joined malaria prevalence data
rio::export(
  prev_final,
  here::here(
    save_path, "processed", "final_malaria_prev_sle_adm2.rds"
  )
)

To adapt the code:

Lines 2, 7, 13: Change the output path and filenames for your saved data as needed.

Summary

This section outlines two methods for estimating malaria prevalence among children aged 6–59 months. Option 1 applies existing DHS-coded variables that follow standard definitions for RDT and microscopy results, offering a quick and reliable way to generate prevalence estimates aligned with DHS reporting. Option 2 builds the same indicators manually from the raw data, allowing for clearer visibility into inclusion criteria and the flexibility to modify assumptions as needed. Using both approaches strengthens reproducibility, enables customization, and helps validate consistency across indicator definitions.

Full code

Find the full code script for extracting prevalence of malaria infection from DHS/MIS data below.

Show full code

################################################################################
################ ~ Prevalence of malaria infection full code ~ #################
################################################################################

### Step 1: Option 1 (Download Pre-Calculated Prevalence) ----------------------

#### Step 1.1.1: Install or load required packages -----------------------------

# install or load required packages
pacman::p_load(
  here,       # for handling relative file paths
  haven,      # for reading DHS .dta files and labelled data
  dplyr,      # for data wrangling
  stringr,    # for filtering U5MR rows using regex
  tibble,     # for rownames_to_column
  ggplot2,    # for visualizing U5MR maps
  sf,         # for working with shapefiles
  rio,        # for saving outputs in .csv and .rds formats
  survey,     # for complex survey analysis
  cowplot,    # for arranging plots
  DHS.rates   # for calculating under-five mortality rates
)

#' Check DHS Indicator List from API
#'
#' @param countryIds DHS country code(s), e.g., "EG"
#' @param indicatorIds specific indicator ID(s)
#' @param surveyIds survey ID(s)
#' @param surveyYear exact year
#' @param surveyYearStart start of year range
#' @param surveyYearEnd end of year range
#' @param surveyType DHS survey type (e.g., "DHS", "MIS")
#' @param surveyCharacteristicIds filter by survey characteristic ID
#' @param tagIds filter by tag ID
#' @param returnFields Fields to return
#'   (default: IndicatorId, Label, Definition)
#' @param perPage max results per page (default = 500)
#' @param page specific page to return (default = 1)
#' @param f format (default = "json")
#'
#' @return a data.frame of indicators
#' @export
check_dhs_indicators <- function(
  countryIds = NULL,
  indicatorIds = NULL,
  surveyIds = NULL,
  surveyYear = NULL,
  surveyYearStart = NULL,
  surveyYearEnd = NULL,
  surveyType = NULL,
  surveyCharacteristicIds = NULL,
  tagIds = NULL,
  returnFields = c(
    "IndicatorId", "Label", "Definition", "MeasurementType"
  ),
  perPage = NULL,
  page = NULL,
  f = "json"
) {
  # base URL
  base_url <-
    "https://api.dhsprogram.com/rest/dhs/indicators?"

  # build query string
  params <- list(
    countryIds = countryIds,
    indicatorIds = indicatorIds,
    surveyIds = surveyIds,
    surveyYear = surveyYear,
    surveyYearStart = surveyYearStart,
    surveyYearEnd = surveyYearEnd,
    surveyType = surveyType,
    surveyCharacteristicIds = surveyCharacteristicIds,
    tagIds = tagIds,
    returnFields = paste(returnFields, collapse = ","),
    perPage = perPage,
    page = page,
    f = f
  )

  # drop NULLs and encode
  query <- paste(
    purrr::compact(params) |>
      purrr::imap_chr(
        ~ paste0(
          .y, "=", URLencode(as.character(.x), reserved = TRUE)
        )
      ),
    collapse = "&"
  )

  # full URL
  full_url <- paste0(base_url, query)

  # fetch with progress bar
  response <- httr::GET(full_url, httr::progress())
  jsonlite::fromJSON(httr::content(
    response,
    as = "text",
    encoding = "UTF-8"
  ))$Data
}

#' Query DHS API Directly via URL Parameters
#'
#' builds and queries DHS API for indicator data using URL-based access
#' instead of rdhs package.
#'
#' @param countryIds Comma-separated DHS country code(s), e.g., "SL"
#' @param indicatorIds Comma-separated DHS indicator ID(s),
#'   e.g., "CM_ECMR_C_U5M"
#' @param surveyIds optional comma-separated survey ID(s), e.g.,
#' "SL2016DHS"
#' @param surveyYear optional exact survey year, e.g., "2016"
#' @param surveyYearStart optional survey year range start
#' @param surveyYearEnd optional survey year range end
#' @param breakdown one of: "national", "subnational", "background",
#' "all"
#' @param f format to return (default is "json")
#'
#' @return a data.frame containing the `Data` portion of the API
#' response.
#' @export
download_dhs_indicators <- function(
  countryIds,
  indicatorIds,
  surveyIds = NULL,
  surveyYear = NULL,
  surveyYearStart = NULL,
  surveyYearEnd = NULL,
  breakdown = "subnational",
  f = "json"
) {
  # base URL
  base_url <-
    "https://api.dhsprogram.com/rest/dhs/data?"

  # assemble query string
  query <- paste0(
    "breakdown=",
    breakdown,
    "&indicatorIds=",
    indicatorIds,
    "&countryIds=",
    countryIds,
    if (!is.null(surveyIds))
      paste0("&surveyIds=", surveyIds),
    if (!is.null(surveyYear))
      paste0("&surveyYear=", surveyYear),
    if (!is.null(surveyYearStart))
      paste0("&surveyYearStart=", surveyYearStart),
    if (!is.null(surveyYearEnd))
      paste0("&surveyYearEnd=", surveyYearEnd),
    "&lang=en&f=",
    f
  )

  full_url <- paste0(base_url, query)

  cli::cli_alert_info("Downloading DHS data...")

  response <- httr::GET(full_url, httr::progress())

  if (httr::http_error(response)) {
    stop("API request failed: ", httr::status_code(response))
  }

  content_raw <- httr::content(
    response, as = "text", encoding = "UTF-8"
  )
  data <- jsonlite::fromJSON(content_raw)$Data

  cli::cli_alert_success(
    "Download complete: {nrow(data)} records retrieved."
  )
  return(data)
}

#### Step 1.1.2: Download the relevant indicators ------------------------------

# RDT-based prevalence
rdt_mean <- download_dhs_indicators(
  countryIds = "SL",
  surveyYear = 2016,
  indicatorIds = "ML_PMAL_C_RDT",
  breakdown = "subnational"
) |>
  dplyr::mutate(
    dhs_indicator_id = "ML_PMAL_C_RDT",
    indicator = "Malaria prevalence (RDT)"
  ) |>
  dplyr::select(
    dhs_indicator_id,
    indicator,
    region_label = CharacteristicLabel,
    adm2_code = RegionId,
    mean_prev = Value
  )

rdt_low <- download_dhs_indicators(
  countryIds = "SL",
  surveyYear = 2016,
  indicatorIds = "ML_PMAL_C_RDL",
  breakdown = "subnational"
) |>
  dplyr::select(
    region_label = CharacteristicLabel,
    adm2_code = RegionId,
    low_prev = Value
  )

rdt_upp <- download_dhs_indicators(
  countryIds = "SL",
  surveyYear = 2016,
  indicatorIds = "ML_PMAL_C_RDU",
  breakdown = "subnational"
) |>
  dplyr::select(
    region_label = CharacteristicLabel,
    adm2_code = RegionId,
    upp_prev = Value
  )

# microscopy-based prevalence
mic_mean <- download_dhs_indicators(
  countryIds = "SL",
  surveyYear = 2016,
  indicatorIds = "ML_PMAL_C_MSY",
  breakdown = "subnational"
) |>
  dplyr::mutate(
    dhs_indicator_id = "ML_PMAL_C_MSY",
    indicator = "Malaria prevalence (Microscopy)"
  ) |>
  dplyr::select(
    dhs_indicator_id,
    indicator,
    region_label = CharacteristicLabel,
    adm2_code = RegionId,
    mean_prev = Value
  )

mic_low <- download_dhs_indicators(
  countryIds = "SL",
  surveyYear = 2016,
  indicatorIds = "ML_PMAL_C_MSL",
  breakdown = "subnational"
) |>
  dplyr::select(
    region_label = CharacteristicLabel,
    adm2_code = RegionId,
    low_prev = Value
  )

mic_upp <- download_dhs_indicators(
  countryIds = "SL",
  surveyYear = 2016,
  indicatorIds = "ML_PMAL_C_MSU",
  breakdown = "subnational"
) |>
  dplyr::select(
    region_label = CharacteristicLabel,
    adm2_code = RegionId,
    upp_prev = Value
  )

#### Step 1.1.3a: Join prevalence indicators when a dedicated shapefile IS available

# get the DHS adm2 shapefile
sle_dhs_shp2 <- sf::read_sf(
  here::here(
    "01_foundational/1a_administrative_boundaries",
    "1ai_adm2",
    "sdr_mis_subnational_boundaries_adm2.shp"
  )
) |>
  dplyr::select(
    adm1 = OTHREGNA,
    adm2 = DHSREGEN,
    adm2_code = REG_ID
  ) |>
  # make adm1 and adm2 to upper case
  dplyr::mutate(
    adm1 = toupper(adm1),
    adm2 = toupper(adm2)
  )

# join pfpr both indicators
pfpr_indicators <- dplyr::bind_rows(
  pfpr_indicator_rdt,
  pfpr_indicator_mic
)

# join adm2 dhs shapefile to data only keeping adm2 indicators
pfpr_indicators_final <- pfpr_indicators |>
  dplyr::inner_join(sle_dhs_shp2, by = "adm2_code") |>
  # select only relevant columns
  dplyr::select(
    dhs_indicator_id,
    indicator,
    adm1,
    adm2,
    mean_prev,
    low_prev,
    upp_prev,
    mean_prev,
    low_prev,
    upp_prev,
  ) |>
  sf::st_as_sf()

# check indicators
sf::st_drop_geometry(pfpr_indicators_final)

#### Step 1.1.3b: Join prevalence indicators when a dedicated shapefile is NOT available

# load the DHS `admin-2` shapefile
sle_dhs_shp2 <- sf::read_sf(
  here::here(
    "01_foundational/1a_administrative_boundaries",
    "1ai_adm2",
    "sdr_subnational_boundaries_adm2.shp"
  )
) |>
  dplyr::select(
    adm1 = OTHREGNA,
    adm2 = DHSREGEN,
    adm2_code = REG_ID
  ) |>
  # make adm1 and adm2 to upper case
  dplyr::mutate(
    adm1 = toupper(adm1),
    adm2 = toupper(adm2)
  )

# join rdt indicator with CIs
pfpr_indicator_rdt <- rdt_mean |>
  dplyr::left_join(rdt_low, by = "region_label") |>
  dplyr::left_join(rdt_upp, by = "region_label")

# join mic indicator with CIs
pfpr_indicator_mic <- mic_mean |>
  dplyr::left_join(mic_low, by = "region_label") |>
  dplyr::left_join(mic_upp, by = "region_label")

# combine RDT and microscopy indicators with CIs
pfpr_indicators <- dplyr::bind_rows(
  pfpr_indicator_rdt,
  pfpr_indicator_mic
) |>
  # manually clean up admin names
  dplyr::mutate(
    region_label_updated = dplyr::case_when(
      region_label == "..Kailahun" ~ "KAILAHUN",
      region_label == "..Kenema" ~ "KENEMA",
      region_label == "..Kono" ~ "KONO",
      region_label == "..Koinadugu (before 2017)" ~ "KOINADUGU",
      region_label == "..Tonkolili" ~ "TONKOLILI",
      region_label == "..Kambia" ~ "KAMBIA",
      region_label == "..Karene" ~ "KARENE",
      region_label == "..Bombali (before 2017)" ~ "BOMBALI",
      region_label == "..Port Loko (before 2017)" ~ "PORT LOKO",
      region_label == "..Bo" ~ "BO",
      region_label == "..Bonthe" ~ "BONTHE",
      region_label == "..Moyamba" ~ "MOYAMBA",
      region_label == "..Pujehun" ~ "PUJEHUN",
      region_label == "..Western Rural" ~ "WESTERN AREA RURAL",
      region_label == "..Western Urban" ~ "WESTERN AREA URBAN",
      region_label == "Eastern" ~ "EASTERN",
      region_label == "Northern (before 2017)" ~ "NORTHERN",
      region_label == "Western" ~ "WESTERN",
      region_label == "Southern" ~ "SOUTHERN",
      TRUE ~ NA
    )
  ) |>
  dplyr::filter(!is.na(region_label_updated))

# join shapefile to indicator data
pfpr_indicators_final <-
  pfpr_indicators |>
  dplyr::inner_join(sle_dhs_shp2, by = c("region_label_updated" = "adm2")) |>
  dplyr::select(
    dhs_indicator_id,
    indicator,
    adm1,
    adm2 = region_label_updated,
    mean_prev,
    low_prev,
    upp_prev,
    geometry
  ) |>
  sf::st_as_sf()

# preview the joined data (without geometry)
sf::st_drop_geometry(pfpr_indicators_final)

### Step 1: Option 2 (Calculate Prevalence from Raw DHS Data) ------------------

#### Step 1.2.1: Load the relevant DHS data ------------------------------------

# import the KR (children's) data
sle_dhs_pr <- readRDS(
  here::here("1.6_health_systems/1.6a_dhs/raw/SLPR73FL.rds")
) |>
  # make location column
  dplyr::mutate(
    adm1 = haven::as_factor(hv024) |>
      toupper(),
    adm2 = haven::as_factor(shdist) |>
      toupper()
  )

#### Step 1.2.2: Extract individual records ------------------------------------

# extract pfpr from individual records
sle_dhs_pr2 <- sle_dhs_pr |>
  dplyr::mutate(
    # tested for malaria via RDT
    tested_rdt = dplyr::case_when(
      # child was present last night (hv103 == 1),
      # has a mother in the household (hv042 == 1),
      # is between 6–59 months (hc1),
      # and has a valid RDT result (0 = negative, 1 = positive)
      hv103 == 1 & hv042 == 1 & hc1 >= 6 & hc1 <= 59 &
        hml35 %in% c(0, 1) ~ 1,
      # child meets criteria but no valid RDT result recorded
      hv103 == 1 & hv042 == 1 & hc1 >= 6 & hc1 <= 59 ~ 0,
      TRUE ~ NA_real_ # all others set to missing
    ),

    # microscopy testing status
    tested_mic = dplyr::case_when(
      # child was present last night (hv103 == 1),
      # has a mother in the household (hv042 == 1),
      # is aged 6–59 months (hc1),
      # and has a microscopy result: 0 = negative,
      # 1 = positive, 6 = invalid (still considered tested)
      hv103 == 1 & hv042 == 1 & hc1 >= 6 & hc1 <= 59 &
        hml32 %in% c(0, 1, 6) ~
        1,
      # child is eligible but no valid microscopy result available
      hv103 == 1 & hv042 == 1 & hc1 >= 6 & hc1 <= 59 ~ 0,
      TRUE ~ NA_real_ # all others set to missing
    ),

    # RDT positive
    rdt_pos = dplyr::case_when(
      # positive RDT result (hml35 == 1) for eligible child
      hv103 == 1 & hv042 == 1 & hc1 >= 6 & hc1 <= 59 &
        hml35 == 1 ~ 1,
      # negative RDT result (hml35 == 0) for eligible child
      hv103 == 1 & hv042 == 1 & hc1 >= 6 & hc1 <= 59 &
        hml35 == 0 ~ 0,
      TRUE ~ NA_real_ # all others set to missing
    ),

    # microscopy positive
    mic_pos = dplyr::case_when(
      # positive microscopy result (hml32 == 1) for eligible child
      hv103 == 1 & hv042 == 1 & hc1 >= 6 & hc1 <= 59 &
        hml32 == 1 ~ 1,
      # negative or invalid microscopy result (0 or 6)
      hv103 == 1 & hv042 == 1 & hc1 >= 6 & hc1 <= 59 &
        hml32 %in% c(0, 6) ~ 0,
      TRUE ~ NA_real_ # all others set to missing
    ),
    survey_weight = hv005 / 1000000
  ) |>
  dplyr::select(
    cluster_id = hv021,
    stratum_id = hv022,
    survey_weight,
    adm1,
    adm2,
    tested_rdt,
    tested_mic,
    rdt_pos,
    mic_pos
  )

#### Step 1.2.3: Calculate malaria prevalence (RDT and Microscopy) at district level

# RDT design
des_rdt <- survey::svydesign(
  ids = ~cluster_id,
  strata = ~stratum_id,
  weights = ~survey_weight,
  data = dplyr::filter(sle_dhs_pr2, tested_rdt == 1),
  nest = TRUE
)

# microscopy design
des_mic <- survey::svydesign(
  ids = ~cluster_id,
  strata = ~stratum_id,
  weights = ~survey_weight,
  data = dplyr::filter(sle_dhs_pr2, tested_mic == 1),
  nest = TRUE
)

# RDT prevalence by adm2
prev_rdt <- survey::svyby(
  ~rdt_pos,
  ~ adm1 + adm2,
  design = des_rdt,
  FUN = survey::svymean,
  vartype = c("ci"), # add CI
  keep.names = F
) |>
  dplyr::mutate(
    pct = round(rdt_pos * 100, 1),
    ci_low = round(ci_l * 100, 1),
    ci_upp = round(ci_u * 100, 1)
  ) |>
  dplyr::arrange(adm1) |>
  dplyr::mutate(indicator = "Malaria prevalence (RDT)") |>
  dplyr::select(adm2, indicator, pct, ci_low, ci_upp)

# microscopy prevalence by adm2
prev_mic <- survey::svyby(
  ~mic_pos,
  ~ adm1 + adm2,
  design = des_mic,
  FUN = survey::svymean,
  vartype = c("ci"), # add CI
  keep.names = F
) |>
  dplyr::mutate(
    pct = round(mic_pos * 100, 1),
    ci_low = round(ci_l * 100, 1),
    ci_upp = round(ci_u * 100, 1)
  ) |>
  dplyr::arrange(adm1) |>
  dplyr::mutate(indicator = "Malaria prevalence (Microscopy)") |>
  dplyr::select(adm2, indicator, pct, ci_low, ci_upp)

# bind both together
prev_final <- dplyr::bind_rows(prev_rdt, prev_mic)

#### Step 1.2.4: Join prevalence indicators to a shapefile ---------------------

# load the DHS `admin-2` shapefile
sle_dhs_shp2 <- sf::read_sf(
  here::here(
    "01_foundational/1a_administrative_boundaries",
    "1ai_adm2",
    "sdr_subnational_boundaries_adm2.shp"
  )
) |>
  dplyr::select(
    adm1 = OTHREGNA,
    adm2 = DHSREGEN,
    adm2_code = REG_ID
  ) |>
  # make adm1 and adm2 to upper case
  dplyr::mutate(
    adm1 = toupper(adm1),
    adm2 = toupper(adm2)
  )

# bind both together
prev_final <- prev_final |>
  # clean up adm2 names to match shapefile
  dplyr::mutate(
    adm2 = dplyr::case_when(
      adm2 == "WEST AREA RURAL" ~ "WESTERN AREA RURAL",
      adm2 == "WEST AREA URBAN" ~ "WESTERN AREA URBAN",
      TRUE ~ adm2
    )
  ) |>
  # join to shapefile
  dplyr::left_join(sle_dhs_shp2, by = c("adm2")) |>
  sf::st_as_sf()

sf::st_drop_geometry(prev_final)

### Step 2: Visualize Subnational Indicator Data -------------------------------

# categorize prevalence percentages
prev_final <- prev_final |>
  dplyr::mutate(
    prev_cat = cut(
      pct,
      breaks = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100),
      labels = c(
        "<10",
        "10 to <20",
        "20 to <30",
        "30 to <40",
        "40 to <50",
        "50 to <60",
        "60 to <70",
        "70 to <80",
        "80 to <90",
        "90 to 100"
      ),
      include.lowest = TRUE,
      right = FALSE
    )
  )

# visualize RDT-based prevalence
map_rdt <- prev_final |>
  dplyr::filter(indicator == "Malaria prevalence (RDT)") |>
  ggplot2::ggplot() +
  ggplot2::geom_sf(
    ggplot2::aes(fill = prev_cat),
    color = "black",
    size = 2,
    show.legend = TRUE
  ) +
  ggplot2::scale_fill_manual(
    name = "RDT Prevalence (%)",
    values = c(
      "<10" = "#9E0142",
      "10 to <20" = "#D53E4F",
      "20 to <30" = "#F46D43",
      "30 to <40" = "#FDAE61",
      "40 to <50" = "#FEE08B",
      "50 to <60" = "#E6F598",
      "60 to <70" = "#ABDDA4",
      "70 to <80" = "#66C2A5",
      "80 to <90" = "#3288BD",
      "90 to 100" = "#5E4FA2"
    ),
    drop = FALSE
  ) +
  ggplot2::labs(
    title = "Malaria Prevalence by RDT (Admin-1)",
    caption = "Source: 2016 Sierra Leone MIS"
  ) +
  ggplot2::theme_void()

# microscopy map
map_mic <- prev_final |>
  dplyr::filter(indicator != "Malaria prevalence (RDT)") |>
  ggplot2::ggplot() +
  ggplot2::geom_sf(
    ggplot2::aes(fill = prev_cat),
    color = "black",
    size = 2,
    show.legend = TRUE
  ) +
  ggplot2::scale_fill_manual(
    name = "Microscopy Prevalence (%)",
    values = c(
      "<10" = "#9E0142",
      "10 to <20" = "#D53E4F",
      "20 to <30" = "#F46D43",
      "30 to <40" = "#FDAE61",
      "40 to <50" = "#FEE08B",
      "50 to <60" = "#E6F598",
      "60 to <70" = "#ABDDA4",
      "70 to <80" = "#66C2A5",
      "80 to <90" = "#3288BD",
      "90 to 100" = "#5E4FA2"
    ),
    drop = FALSE
  ) +
  ggplot2::labs(
    title = "Malaria Prevalence by Microscopy (Admin-1)",
    caption = "Source: 2016 Sierra Leone MIS"
  ) +
  ggplot2::theme_void()

# display maps side-by-side with shared legend
cowplot::plot_grid(
  map_rdt + ggplot2::theme(legend.position = "none"),
  map_mic + ggplot2::theme(legend.position = "none"),
  cowplot::get_legend(map_rdt),
  ncol = 3,
  rel_widths = c(1, 1, 0.3)
)

# save RDT map
ggplot2::ggsave(
  plot = map_rdt,
  filename = here::here(
    "03_output/3a_figures/malaria_prev_rdt_adm2.png"
  ),
  width = 12,
  height = 9,
  dpi = 300
)

# save microscopy map
ggplot2::ggsave(
  plot = map_mic,
  filename = here::here(
    "03_output/3a_figures/malaria_prev_mic_adm2.png"
  ),
  width = 12,
  height = 9,
  dpi = 300
)

### Step 3: Save Processed Malaria Prevalence Indicators -----------------------

# define save directory
save_path <- here::here("1.6_health_systems/1.6a_dhs")

# save final joined malaria prevalence data
rio::export(
  prev_final |> sf::st_drop_geometry(),
  here::here(
    save_path, "processed", "final_malaria_prev_sle_adm2.csv"
  )
)

# save final joined malaria prevalence data
rio::export(
  prev_final,
  here::here(
    save_path, "processed", "final_malaria_prev_sle_adm2.rds"
  )
)