Parasite prevalence, the proportion of a population that is infected with malaria parasites, is a measure of malaria morbidity. In SNT, parasite prevalence can be used to understand historical trends, for epidemiological stratification, and to identify eligible and priority areas for interventions.
Prevalence is measured in cross-sectional national household surveys such as the Malaria Indicator Surveys (MIS), and may also be measured in Demographic and Health Surveys (DHS) or Multiple Indicator Cluster Surveys (MICS). These surveys measure prevalence in children, usually those under 5 years of age but sometimes also those 5-9 years, by testing the sampled children for malaria infection via microscopy or rapid diagnostic test (RDT).
This page is focused on extracting prevalence of malaria infection in children as reported in MIS/DHS surveys.
Objectives
Access household survey data on prevalence of malaria infection at the adm1 level
Inspect, visualize, and validate prevalence indicators
Compile and save clean, subnational summaries of prevalence indicators for integration into SNT workflows
Data Needs
Core Indicators for Prevalence
Malaria prevalence can be measured using various diagnostic methods, of which up to two are used in household surveys: Rapid Diagnostic Tests (RDT) and microscopy. These methods are applied to children sampled in the household survey, typically those under five years of age. Each diagnostic method yields a slightly different estimate of prevalence due to differences in what they detect, how long those signals remain detectable in the bloodstream, the sensitivity and specificity of the test itself, and the ability of the human reader to interpret the test result accurately.
Prevalence by RDT refers to the proportion of children who tested positive for malaria using a rapid diagnostic test. These tests detect malaria antigens (such as HRP2/3 or LDH), which can remain in the bloodstream for several weeks after treatment. To calculate RDT-based prevalence in DHS/MIS data, the numerator is the number of children with a positive RDT result (hml35 == 1), and the denominator is the number of children with a valid RDT result (hml35 %in% c(0,1)).
Prevalence by microscopy refers to the proportion of children who tested positive for malaria parasites through blood smear microscopy. This method is more specific to current infection, as it detects actual parasites in the blood, but requires a trained microscopist to search for and positively identify parasites in the sample. To calculate microscopy-based prevalence in DHS/MIS data, the numerator is the number of children with a positive microscopy result (hml32 == 1), and the denominator is the number of children with a valid microscopy result (hml32 %in% c(0,1)).
Interpreting and Using Survey Data on Prevalence
It is normal to obtain slightly different prevalence values when considering the microscopy test result vs RDT. Since the antigens detected by RDTs can remain in the bloodstream for several weeks after an infection is cleared by effective antimalarials, or naturally cleared, generally prevalence measured by RDT is higher than prevalence measured by microscopy.
If both prevalence indicators are available, each should be summarized at the subnational level and reviewed side by side. Differences between them should be interpreted in light of what each test measures, as well as any contextual knowledge on the quality of implementation, such as the availability of trained microscopists. Generally the SNT team will have insight into whether there were particular difficulties encountered during the survey, either with sampling, laboratory testing, and/or data management, that should influence the interpretation of survey results. The SNT team should be presented with both indicators and decide which version is most suitable for decision-making.
Step-by-Step
Several options on how to access and use DHS data are presented on the DHS Data Overview and Preparation page. Please refer there for more details.
Here, we will apply Option 1 (pull standard indicators directly from DHS) and Option 2 (build from raw data via API). While Option 1 is the quickest approach, it has limitations—only predefined indicators are available, with no control over aggregation or customization. To address this, we also offer Option 2, which provides full flexibility by using the rdhs package to access microdata and construct indicators directly.
To skip the step-by-step explanation, jump to the full code at the end of this page.
This approach uses pre-calculated malaria prevalence indicators from the DHS API at the subnational level. It provides a fast way to retrieve prevalence estimates based on either RDT or microscopy, disaggregated by region or residence. However, users cannot adjust the age bands or diagnostic method beyond what DHS has published.
Step 1.1.1: Install or load required packages
In this step, we download subnational indicator data using the specified DHS Indicator IDs. We then join these indicators with our shapefile of interest (either admin-1 or admin-2), which determines the geographic level of analysis. This merged dataset forms the basis for plotting results and conducting subnational analysis.
We start by loading the required packages and setting up for the full section. For Option 1, we use two custom functions: check_dhs_indicators() and download_dhs_indicators().
# install or load required packagespacman::p_load( here, # for handling relative file paths haven, # for reading DHS .dta files and labelled data dplyr, # for data wrangling stringr, # for filtering U5MR rows using regex tibble, # for rownames_to_column ggplot2, # for visualizing U5MR maps sf, # for working with shapefiles rio, # for saving outputs in .csv and .rds formats DHS.rates # for calculating under-five mortality rates)#' Check DHS Indicator List from API#'#' @param countryIds DHS country code(s), e.g., "EG"#' @param indicatorIds specific indicator ID(s)#' @param surveyIds survey ID(s)#' @param surveyYear exact year#' @param surveyYearStart start of year range#' @param surveyYearEnd end of year range#' @param surveyType DHS survey type (e.g., "DHS", "MIS")#' @param surveyCharacteristicIds filter by survey characteristic ID#' @param tagIds filter by tag ID#' @param returnFields Fields to return#' (default: IndicatorId, Label, Definition)#' @param perPage max results per page (default = 500)#' @param page specific page to return (default = 1)#' @param f format (default = "json")#'#' @return a data.frame of indicators#' @exportcheck_dhs_indicators <-function(countryIds =NULL,indicatorIds =NULL,surveyIds =NULL,surveyYear =NULL,surveyYearStart =NULL,surveyYearEnd =NULL,surveyType =NULL,surveyCharacteristicIds =NULL,tagIds =NULL,returnFields =c("IndicatorId", "Label", "Definition", "MeasurementType" ),perPage =NULL,page =NULL,f ="json") {# base URL base_url <-"https://api.dhsprogram.com/rest/dhs/indicators?"# build query string params <-list(countryIds = countryIds,indicatorIds = indicatorIds,surveyIds = surveyIds,surveyYear = surveyYear,surveyYearStart = surveyYearStart,surveyYearEnd = surveyYearEnd,surveyType = surveyType,surveyCharacteristicIds = surveyCharacteristicIds,tagIds = tagIds,returnFields =paste(returnFields, collapse =","),perPage = perPage,page = page,f = f )# drop NULLs and encode query <-paste( purrr::compact(params) |> purrr::imap_chr(~paste0( .y, "=", URLencode(as.character(.x), reserved =TRUE) ) ),collapse ="&" )# full URL full_url <-paste0(base_url, query)# fetch with progress bar response <- httr::GET(full_url, httr::progress()) jsonlite::fromJSON(httr::content( response,as ="text",encoding ="UTF-8" ))$Data}#' Query DHS API Directly via URL Parameters#'#' builds and queries DHS API for indicator data using URL-based access#' instead of rdhs package.#'#' @param countryIds Comma-separated DHS country code(s), e.g., "SL"#' @param indicatorIds Comma-separated DHS indicator ID(s),#' e.g., "CM_ECMR_C_U5M"#' @param surveyIds optional comma-separated survey ID(s), e.g.,#' "SL2016DHS"#' @param surveyYear optional exact survey year, e.g., "2016"#' @param surveyYearStart optional survey year range start#' @param surveyYearEnd optional survey year range end#' @param breakdown one of: "national", "subnational", "background",#' "all"#' @param f format to return (default is "json")#'#' @return a data.frame containing the `Data` portion of the API#' response.#' @exportdownload_dhs_indicators <-function( countryIds, indicatorIds,surveyIds =NULL,surveyYear =NULL,surveyYearStart =NULL,surveyYearEnd =NULL,breakdown ="subnational",f ="json") {# base URL base_url <-"https://api.dhsprogram.com/rest/dhs/data?"# assemble query string query <-paste0("breakdown=", breakdown,"&indicatorIds=", indicatorIds,"&countryIds=", countryIds,if (!is.null(surveyIds))paste0("&surveyIds=", surveyIds),if (!is.null(surveyYear))paste0("&surveyYear=", surveyYear),if (!is.null(surveyYearStart))paste0("&surveyYearStart=", surveyYearStart),if (!is.null(surveyYearEnd))paste0("&surveyYearEnd=", surveyYearEnd),"&lang=en&f=", f ) full_url <-paste0(base_url, query) cli::cli_alert_info("Downloading DHS data...") response <- httr::GET(full_url, httr::progress())if (httr::http_error(response)) {stop("API request failed: ", httr::status_code(response)) } content_raw <- httr::content( response, as ="text", encoding ="UTF-8" ) data <- jsonlite::fromJSON(content_raw)$Data cli::cli_alert_success("Download complete: {nrow(data)} records retrieved." )return(data)}
Step 1.1.2: Download the relevant indicators
As shown in Step 1.2 of Option 1 in the DHS Data Overview and Preparation page, we use the check_dhs_indicators() function to explore available indicators and ensure that we are pulling the correct DHS indicator IDs for the 2016 Sierra Leone MIS. In this section, we focus on parasite prevalence among children and its associated confidence intervals, using both RDT and microscopy-based estimates.
The relevant indicator IDs are:
ML_PMAL_C_RDT: Malaria prevalence according to RDT
ML_PMAL_C_RDL: Malaria prevalence according to RDT—CI lower bound (-2SE)
ML_PMAL_C_RDU: Malaria prevalence according to RDT—CI upper bound (+2SE)
ML_PMAL_C_MSY: Malaria prevalence according to microscopy
ML_PMAL_C_MSL: Malaria prevalence according to microscopy—CI lower bound (-2SE)
ML_PMAL_C_MSU: Malaria prevalence according to microscopy—CI upper bound (+2SE)
Lines 3, 21, 33, 46, 64, 76: Replace "SL" with your country’s DHS code (e.g., "NG" for Nigeria, "KE" for Kenya)
Lines 4, 22, 34, 47, 65, 77: Update the surveyYear if you’re using a different round (e.g., 2019)
Lines 5, 23, 35, 48, 66, 78: Ensure the correct Indicator ID is used as listed above
Lines 6, 24, 36, 49, 67, 79: Keep breakdown = "subnational" to get regional estimates (typically admin-1); use "national" if only national values are needed
Step 1.1.3a: Join prevalence indicators when a dedicated shapefile IS available
Now that the data is ready, we need to join it to a shapefile so we can visualize it on a map. In cases where a dedicated shapefile is available that aligns with the administrative boundaries used in the survey, the join process is simple. You can use the region code (e.g., adm2_code) directly, avoiding manual name harmonization done in Step 1.1.3b. If there is no survey-associated shapefile or you would like to use a specific non-survey shapefile, skip this Step and proceed directly to *Step 1.1.3b.
In reality, the 2016 Sierra Leone MIS does not have a matching shapefile (hence the approach in Step 1.1.3b), so the code shown here is simply to demonstrate how the process would work when region codes in the indicator data match those in the shapefile. See the ITN Metrics and Mortality Data pages for real cases where region code joins were used successfully.
# get the DHS adm2 shapefilesle_dhs_shp2 <- sf::read_sf( here::here("01_foundational/1a_administrative_boundaries","1ai_adm2","sdr_mis_subnational_boundaries_adm2.shp" )) |> dplyr::select(adm1 = OTHREGNA,adm2 = DHSREGEN,adm2_code = REG_ID ) |># make adm1 and adm2 to upper case dplyr::mutate(adm1 =toupper(adm1),adm2 =toupper(adm2) )# join pfpr both indicatorspfpr_indicators <- dplyr::bind_rows( pfpr_indicator_rdt, pfpr_indicator_mic)# join adm2 dhs shapefile to data only keeping adm2 indicatorspfpr_indicators_final <- pfpr_indicators |> dplyr::inner_join(sle_dhs_shp2, by ="adm2_code") |># select only relevant columns dplyr::select( dhs_indicator_id, indicator, adm1, adm2, mean_prev, low_prev, upp_prev, mean_prev, low_prev, upp_prev, ) |> sf::st_as_sf()# check indicatorssf::st_drop_geometry(pfpr_indicators_final)
To adapt the code:
Shapefile loading (Lines 3–10): If working at the admin-1 level, load the corresponding adm1 shapefile instead (e.g., 1ai_adm1). In that case, do not rename or select adm2 columns—keep only adm1 and the appropriate join code
Lines 11–18: When using admin-1, you can skip any code referencing adm2 variables (e.g., adm2 = DHSREGEN or mutate(adm2 = ...)). Focus on cleaning and renaming just the admin-1 fields
Line 21: RDT and microscopy data contain results at both admin-1 and admin-2 levels. You do not need to filter them beforehand. Instead, your choice of shapefile and join key (e.g., adm2_code) will implicitly filter the indicator data to match only the relevant administrative level
Line 28: Use inner_join() to retain only rows that match your shapefile’s geographic level. For admin-1, this would be by = "adm1_code"; for admin-2, use by = "adm2_code" as shown
Step 1.1.3b: Join prevalence indicators when a dedicated shapefile is NOT available
For the 2016 Sierra Leone MIS, there is no dedicated shapefile available that corresponds directly to the administrative boundaries used in that survey. As a result, we will use the Sierra Leone 2019 DHS administrative boundaries from the DHS Spatial Data Repository as a reference and manually harmonize the region names provided in the indicator data to match the shapefile.
Note
If a shapefile exists for the specific DHS/MIS country-year (e.g., a matching GE file or custom shapefile from the survey implementer), you can use the code from Step 1.1.3a instead, which shows how to join using region codes instead of harmonized names. However, if you are working with a custom shapefile approved by the SNT team, you may need to do name harmonization as shown in Step 1.1.3b.
Please revisit Step 1.4 of DHS Data Overview and Preparation for more guidance on using custom shapefiles when DHS/MIS boundaries are unavailable or not the preferred option.
We use the region_label_updated column as our cleaned and harmonized geographic name field. This column serves as the join key when merging the prevalence indicators with the shapefile. If we are working at the admin-2 level, we join region_label_updated to the adm2 column in the shapefile. If instead we are working at the admin-1 level, we would harmonize region_label_updated to match adm1 and join using region_label_updated = adm1. Because we use an inner join, only rows that have a match in the shapefile are retained. This means that, for example, admin-1 summary rows in the indicator data will be excluded if we are performing an admin-2 level join. This ensures the resulting dataset contains only relevant, mappable units.
# load the DHS `admin-2` shapefilesle_dhs_shp2 <- sf::read_sf( here::here("01_foundational/1a_administrative_boundaries","1ai_adm2","sdr_subnational_boundaries_adm2.shp" )) |> dplyr::select(adm1 = OTHREGNA,adm2 = DHSREGEN,adm2_code = REG_ID ) |># make adm1 and adm2 to upper case dplyr::mutate(adm1 =toupper(adm1),adm2 =toupper(adm2) )# join rdt indicator with CIspfpr_indicator_rdt <- rdt_mean |> dplyr::left_join(rdt_low, by ="region_label") |> dplyr::left_join(rdt_upp, by ="region_label")# join mic indicator with CIspfpr_indicator_mic <- mic_mean |> dplyr::left_join(mic_low, by ="region_label") |> dplyr::left_join(mic_upp, by ="region_label")# combine RDT and microscopy indicators with CIspfpr_indicators <- dplyr::bind_rows( pfpr_indicator_rdt, pfpr_indicator_mic) |># manually clean up admin names dplyr::mutate(region_label_updated = dplyr::case_when( region_label =="..Kailahun"~"KAILAHUN", region_label =="..Kenema"~"KENEMA", region_label =="..Kono"~"KONO", region_label =="..Koinadugu (before 2017)"~"KOINADUGU", region_label =="..Tonkolili"~"TONKOLILI", region_label =="..Kambia"~"KAMBIA", region_label =="..Karene"~"KARENE", region_label =="..Bombali (before 2017)"~"BOMBALI", region_label =="..Port Loko (before 2017)"~"PORT LOKO", region_label =="..Bo"~"BO", region_label =="..Bonthe"~"BONTHE", region_label =="..Moyamba"~"MOYAMBA", region_label =="..Pujehun"~"PUJEHUN", region_label =="..Western Rural"~"WESTERN AREA RURAL", region_label =="..Western Urban"~"WESTERN AREA URBAN", region_label =="Eastern"~"EASTERN", region_label =="Northern (before 2017)"~"NORTHERN", region_label =="Western"~"WESTERN", region_label =="Southern"~"SOUTHERN",TRUE~NA ) ) |> dplyr::filter(!is.na(region_label_updated))# join shapefile to indicator datapfpr_indicators_final <- pfpr_indicators |> dplyr::inner_join(sle_dhs_shp2, by =c("region_label_updated"="adm2")) |> dplyr::select( dhs_indicator_id, indicator, adm1,adm2 = region_label_updated, mean_prev, low_prev, upp_prev, geometry ) |> sf::st_as_sf()# preview the joined data (without geometry)sf::st_drop_geometry(pfpr_indicators_final)
Output
dhs_indicator_id indicator adm1 adm2
1 ML_PMAL_C_RDT Malaria prevalence (RDT) EAST KAILAHUN
2 ML_PMAL_C_RDT Malaria prevalence (RDT) EAST KENEMA
3 ML_PMAL_C_RDT Malaria prevalence (RDT) EAST KONO
4 ML_PMAL_C_RDT Malaria prevalence (RDT) NORTH KOINADUGU
5 ML_PMAL_C_RDT Malaria prevalence (RDT) NORTH TONKOLILI
6 ML_PMAL_C_RDT Malaria prevalence (RDT) NORTH KAMBIA
7 ML_PMAL_C_RDT Malaria prevalence (RDT) NORTH BOMBALI
8 ML_PMAL_C_RDT Malaria prevalence (RDT) NORTH PORT LOKO
9 ML_PMAL_C_RDT Malaria prevalence (RDT) SOUTH BO
10 ML_PMAL_C_RDT Malaria prevalence (RDT) SOUTH BONTHE
11 ML_PMAL_C_RDT Malaria prevalence (RDT) SOUTH MOYAMBA
12 ML_PMAL_C_RDT Malaria prevalence (RDT) SOUTH PUJEHUN
13 ML_PMAL_C_RDT Malaria prevalence (RDT) WEST WESTERN AREA RURAL
14 ML_PMAL_C_RDT Malaria prevalence (RDT) WEST WESTERN AREA URBAN
15 ML_PMAL_C_MSY Malaria prevalence (Microscopy) EAST KAILAHUN
16 ML_PMAL_C_MSY Malaria prevalence (Microscopy) EAST KENEMA
17 ML_PMAL_C_MSY Malaria prevalence (Microscopy) EAST KONO
18 ML_PMAL_C_MSY Malaria prevalence (Microscopy) NORTH KOINADUGU
19 ML_PMAL_C_MSY Malaria prevalence (Microscopy) NORTH TONKOLILI
20 ML_PMAL_C_MSY Malaria prevalence (Microscopy) NORTH KAMBIA
21 ML_PMAL_C_MSY Malaria prevalence (Microscopy) NORTH BOMBALI
22 ML_PMAL_C_MSY Malaria prevalence (Microscopy) NORTH PORT LOKO
23 ML_PMAL_C_MSY Malaria prevalence (Microscopy) SOUTH BO
24 ML_PMAL_C_MSY Malaria prevalence (Microscopy) SOUTH BONTHE
25 ML_PMAL_C_MSY Malaria prevalence (Microscopy) SOUTH MOYAMBA
26 ML_PMAL_C_MSY Malaria prevalence (Microscopy) SOUTH PUJEHUN
27 ML_PMAL_C_MSY Malaria prevalence (Microscopy) WEST WESTERN AREA RURAL
28 ML_PMAL_C_MSY Malaria prevalence (Microscopy) WEST WESTERN AREA URBAN
mean_prev low_prev upp_prev
1 67.0 59.8 74.3
2 59.3 48.9 69.6
3 49.5 39.6 59.4
4 78.1 72.2 84.0
5 68.3 61.3 75.3
6 59.4 45.8 73.1
7 47.7 37.1 58.2
8 69.8 62.7 76.9
9 57.1 49.6 64.5
10 46.8 39.1 54.6
11 60.6 51.6 69.6
12 69.2 63.0 75.4
13 33.5 24.5 42.5
14 3.8 2.0 5.5
15 45.0 39.8 50.2
16 37.7 29.4 46.0
17 37.5 27.0 47.9
18 57.9 49.7 66.1
19 55.7 48.1 63.4
20 48.3 38.4 58.3
21 37.6 28.9 46.3
22 58.5 49.8 67.2
23 39.7 29.9 49.5
24 26.1 21.0 31.3
25 39.9 31.4 48.4
26 46.8 39.2 54.4
27 34.9 25.6 44.2
28 6.3 2.1 10.5
To adapt the code:
Shapefile loading (Lines 3–10): This example uses the admin-2 shapefile (1ai_adm2). If your indicators are reported at the admin-1 level instead, update the path to load 1ai_adm1 and modify the column selections to match admin-1 fields (e.g., adm1 = DHSREGNA, adm1_code = REG_ID), excluding any adm2 fields.
Lines 21–23, 26–28: RDT and microscopy indicators are joined with their respective confidence intervals using region_label as the key. Ensure these match the format used in your input data.
Lines 36–59: The region_label values from the MIS 2016 data are manually harmonized using case_when() to produce a cleaned field called region_label_updated. Only those regions with a match in the shapefile should be retained, so this step is essential to ensure join compatibility.
Line 65: The inner_join() ensures that only subnational units that exist in both the indicator dataset and shapefile are retained. Because we are using an admin-2 shapefile, any admin-1 summary rows (like "Eastern" or "Southern") will be dropped automatically during this step.
Lines 66–74: Adjust the select() statement to include only the columns relevant to your analysis. If you’re working with just RDT or just microscopy, you may omit fields related to the other. If working at a different administrative level, ensure the adm1 or adm2 assignments reflect your join key correctly.
General note: Always inspect the output of sf::st_drop_geometry() to confirm that the join worked as expected and that no key subnational units were lost in the process
Step 1: Option 2 (Calculate Prevalence from Raw DHS Data)
# import the KR (children's) datasle_dhs_pr <-readRDS( here::here("1.6_health_systems/1.6a_dhs/raw/SLPR7HFL.rds")) |># make location column dplyr::mutate(adm1 = haven::as_factor(v024) |>toupper(),adm2 = haven::as_factor(sdist) |>toupper() )
To adapt the code:
Line 3: Replace PR file path with your country’s DHS dataset (e.g., NGPR7AFL.rds for Nigeria)
Lines 6–8: Update the administrative variable mappings if your country uses different DHS variable codes for admin-1 (hv024) and admin-2 (shdist or sdist) regions
If using standard DHS naming, no other changes should be needed.
Step 1.2.2: Extract individual records
In this step, we identify individual-level records for children aged 6–59 months who were eligible for malaria testing during the survey. Using the PR file, we construct binary indicators for whether a child was tested for malaria via RDT or microscopy, as well as whether they tested positive. These indicators are derived using DHS standard variables and eligibility criteria: the child must have slept in the household the previous night (hv103), be aged 6–59 months (hc1), and have a valid test result (hml35 for RDT and hml32 for microscopy).
# extract pfpr from individual recordssle_dhs_pr2 <- sle_dhs_pr |> dplyr::mutate(# tested for malaria via RDTtested_rdt = dplyr::case_when(# child was present last night (hv103 == 1),# has a mother in the household (hv042 == 1),# is between 6–59 months (hc1),# and has a valid RDT result (0 = negative, 1 = positive) hv103 ==1& hv042 ==1& hc1 >=6& hc1 <=59& hml35 %in%c(0, 1) ~1,# child meets criteria but no valid RDT result recorded hv103 ==1& hv042 ==1& hc1 >=6& hc1 <=59~0,TRUE~NA_real_# all others set to missing ),# microscopy testing statustested_mic = dplyr::case_when(# child was present last night (hv103 == 1),# has a mother in the household (hv042 == 1),# is aged 6–59 months (hc1),# and has a microscopy result: 0 = negative,# 1 = positive, 6 = invalid (still considered tested) hv103 ==1& hv042 ==1& hc1 >=6& hc1 <=59& hml32 %in%c(0, 1, 6) ~1,# child is eligible but no valid microscopy result available hv103 ==1& hv042 ==1& hc1 >=6& hc1 <=59~0,TRUE~NA_real_# all others set to missing ),# RDT positiverdt_pos = dplyr::case_when(# positive RDT result (hml35 == 1) for eligible child hv103 ==1& hv042 ==1& hc1 >=6& hc1 <=59& hml35 ==1~1,# negative RDT result (hml35 == 0) for eligible child hv103 ==1& hv042 ==1& hc1 >=6& hc1 <=59& hml35 ==0~0,TRUE~NA_real_# all others set to missing ),# microscopy positivemic_pos = dplyr::case_when(# positive microscopy result (hml32 == 1) for eligible child hv103 ==1& hv042 ==1& hc1 >=6& hc1 <=59& hml32 ==1~1,# negative or invalid microscopy result (0 or 6) hv103 ==1& hv042 ==1& hc1 >=6& hc1 <=59& hml32 %in%c(0, 6) ~0,TRUE~NA_real_# all others set to missing ),survey_weight = hv005 /1000000 ) |> dplyr::select(cluster_id = hv021,stratum_id = hv022, survey_weight, adm1, adm2, tested_rdt, tested_mic, rdt_pos, mic_pos )
To adapt the code:
Modify only if relevant parameters were not created exactly as in earlier steps.
Step 1.2.3: Calculate malaria prevalence (RDT and Microscopy) at district level
In this step, we calculate district-level malaria prevalence using survey-weighted estimates, restricted to children who were actually tested. We create two separate survey design objects: one for children tested via RDT, and another for those tested via microscopy. For each design, we apply svyby() to estimate the proportion of children who tested positive (rdt_pos and mic_pos), grouped by adm1 and adm2. Confidence intervals are included using the ci option in svyby(). The final output is a table showing the weighted prevalence estimates and their corresponding lower and upper bounds by district.
Modify only if relevant parameters were not created exactly as in earlier steps.
Step 1.2.4: Join prevalence indicators to a shapefile
Now that we have a final indicator dataset, we can join it with the MIS shapefile using adm2 as keys. Before doing so, we ensure that admin names are harmonized across both datasets.
For example, in the MIS data, adm2 are labeled as WEST AREA RURAL and WEST AREA URBAN, whereas the shapefile uses WESTERN AREA RURAL and WESTERN AREA URBAN. To ensure a successful join, we clean and recode these names accordingly before merging. Your case may differ—always inspect and harmonize naming between datasets to avoid mismatches during joins
# load the DHS `admin-2` shapefilesle_dhs_shp2 <- sf::read_sf( here::here("01_foundational/1a_administrative_boundaries","1ai_adm2","sdr_subnational_boundaries_adm2.shp" )) |> dplyr::select(adm1 = OTHREGNA,adm2 = DHSREGEN,adm2_code = REG_ID ) |># make adm1 and adm2 to upper case dplyr::mutate(adm1 =toupper(adm1),adm2 =toupper(adm2) )# bind both togetherprev_final <- prev_final |># clean up adm2 names to match shapefile dplyr::mutate(adm2 = dplyr::case_when( adm2 =="WEST AREA RURAL"~"WESTERN AREA RURAL", adm2 =="WEST AREA URBAN"~"WESTERN AREA URBAN",TRUE~ adm2 ) ) |># join to shapefile dplyr::left_join(sle_dhs_shp2, by =c("adm2")) |> sf::st_as_sf()sf::st_drop_geometry(prev_final)
Output
adm2 indicator pct ci_low ci_upp adm1
1 KAILAHUN Malaria prevalence (RDT) 67.0 60.4 73.6 EAST
2 KENEMA Malaria prevalence (RDT) 59.3 49.9 68.6 EAST
3 KONO Malaria prevalence (RDT) 49.5 40.3 58.7 EAST
4 BOMBALI Malaria prevalence (RDT) 47.7 39.7 55.6 NORTH
5 KAMBIA Malaria prevalence (RDT) 59.4 46.4 72.5 NORTH
6 KOINADUGU Malaria prevalence (RDT) 78.1 72.4 83.9 NORTH
7 PORT LOKO Malaria prevalence (RDT) 69.8 63.4 76.2 NORTH
8 TONKOLILI Malaria prevalence (RDT) 68.3 63.0 73.6 NORTH
9 BO Malaria prevalence (RDT) 57.1 52.8 61.3 SOUTH
10 BONTHE Malaria prevalence (RDT) 46.8 39.3 54.4 SOUTH
11 MOYAMBA Malaria prevalence (RDT) 60.6 51.8 69.4 SOUTH
12 PUJEHUN Malaria prevalence (RDT) 69.2 63.2 75.1 SOUTH
13 WESTERN AREA RURAL Malaria prevalence (RDT) 33.5 25.1 41.9 WEST
14 WESTERN AREA URBAN Malaria prevalence (RDT) 3.8 2.2 5.4 WEST
15 KAILAHUN Malaria prevalence (Microscopy) 45.0 39.9 50.1 EAST
16 KENEMA Malaria prevalence (Microscopy) 37.7 30.6 44.8 EAST
17 KONO Malaria prevalence (Microscopy) 37.5 27.1 47.8 EAST
18 BOMBALI Malaria prevalence (Microscopy) 37.6 31.5 43.7 NORTH
19 KAMBIA Malaria prevalence (Microscopy) 48.3 38.7 58.0 NORTH
20 KOINADUGU Malaria prevalence (Microscopy) 57.9 49.9 65.8 NORTH
21 PORT LOKO Malaria prevalence (Microscopy) 58.5 51.0 66.1 NORTH
22 TONKOLILI Malaria prevalence (Microscopy) 55.7 50.1 61.4 NORTH
23 BO Malaria prevalence (Microscopy) 39.7 32.3 47.1 SOUTH
24 BONTHE Malaria prevalence (Microscopy) 26.1 21.1 31.2 SOUTH
25 MOYAMBA Malaria prevalence (Microscopy) 39.9 31.6 48.2 SOUTH
26 PUJEHUN Malaria prevalence (Microscopy) 46.8 39.5 54.1 SOUTH
27 WESTERN AREA RURAL Malaria prevalence (Microscopy) 34.9 26.3 43.4 WEST
28 WESTERN AREA URBAN Malaria prevalence (Microscopy) 6.3 2.4 10.2 WEST
To adapt the code:
Shapefile loading (Lines 3–10): This example uses the admin-2 shapefile (1ai_adm2). If your indicators are reported at the admin-1 level instead, update the path to load 1ai_adm1 and modify the column selections to match admin-1 fields (e.g., adm1 = DHSREGNA, adm1_code = REG_ID), excluding any adm2 fields
Clean up administrative names in your indicator dataset to match the shapefile exactly. Adjust the logic in case_when() based on your admin naming conventions
It is important to note that in this example, we merge using adm2 alone. However, in cases where admin-2 names are not unique (e.g., the same name appears under different admin-1 regions), it is safer to merge using both adm1 and adm2, assuming both fields have been cleaned and standardized. This helps ensure accurate joins and prevent unintended mismatches.
Step 2: Visualize Subnational Indicator Data
Once the malaria prevalence data (from both RDT and microscopy) has been downloaded and joined with a shapefile, we can create maps to explore geographic variation in parasite prevalence. In the example below, we categorize the prevalence values into 10% bins and visualize them using ggplot2.
# categorize prevalence percentagesprev_final <- prev_final |> dplyr::mutate(prev_cat =cut( pct,breaks =c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100),labels =c("<10","10 to <20","20 to <30","30 to <40","40 to <50","50 to <60","60 to <70","70 to <80","80 to <90","90 to 100" ),include.lowest =TRUE,right =FALSE ) )# visualize RDT-based prevalencemap_rdt <- prev_final |> dplyr::filter(indicator =="Malaria prevalence (RDT)") |> ggplot2::ggplot() + ggplot2::geom_sf( ggplot2::aes(fill = prev_cat),color ="black",size =2,show.legend =TRUE ) + ggplot2::scale_fill_manual(name ="RDT Prevalence (%)",values =c("<10"="#9E0142","10 to <20"="#D53E4F","20 to <30"="#F46D43","30 to <40"="#FDAE61","40 to <50"="#FEE08B","50 to <60"="#E6F598","60 to <70"="#ABDDA4","70 to <80"="#66C2A5","80 to <90"="#3288BD","90 to 100"="#5E4FA2" ),drop =FALSE ) + ggplot2::labs(title ="Malaria Prevalence by RDT (Admin-1)",caption ="Source: 2016 Sierra Leone MIS" ) + ggplot2::theme_void()# microscopy mapmap_mic <- prev_final |> dplyr::filter(indicator !="Malaria prevalence (RDT)") |> ggplot2::ggplot() + ggplot2::geom_sf( ggplot2::aes(fill = prev_cat),color ="black",size =2,show.legend =TRUE ) + ggplot2::scale_fill_manual(name ="Microscopy Prevalence (%)",values =c("<10"="#9E0142","10 to <20"="#D53E4F","20 to <30"="#F46D43","30 to <40"="#FDAE61","40 to <50"="#FEE08B","50 to <60"="#E6F598","60 to <70"="#ABDDA4","70 to <80"="#66C2A5","80 to <90"="#3288BD","90 to 100"="#5E4FA2" ),drop =FALSE ) + ggplot2::labs(title ="Malaria Prevalence by Microscopy (Admin-1)",caption ="Source: 2016 Sierra Leone MIS" ) + ggplot2::theme_void()# display maps side-by-side with shared legendcowplot::plot_grid( map_rdt + ggplot2::theme(legend.position ="none"), map_mic + ggplot2::theme(legend.position ="none"), cowplot::get_legend(map_rdt),ncol =3,rel_widths =c(1, 1, 0.3))
Output
To adapt the code:
This plotting example uses variable names from Option 2. If you are using the code from Option 1: Line 2: replace prev_final with pfpr_indicators_final; Line 5: replace pct with mean_prev.
Lines 4–17 (Binning prev_cat): Adjust cut() breaks if your data isn’t in 0–100% format. Use quantiles or continuous scales (scale_fill_viridis_c()) if preferred.
Indicator column: This example uses a single mean_prev column for both RDT and microscopy. Update if your data uses separate columns.
Lines 26, 59 (Filtering): Modify filter(indicator == ...) as needed to match your indicator labels.
Lines 51–53, 83–85 (Titles and captions): Update labs() with the correct country, year, survey type, and admin level.
Shapefile level: This uses admin-2. If using admin-1, adjust your shapefile and join key accordingly.
Lines 90–95 (Legend): The shared legend uses cowplot. Skip this if mapping only one indicator.
Now, let us save these plots for future reference.
Lines 4, 15: Change the output path and filenames for your plots as needed.
Step 3: Save Processed Malaria Prevalence Indicators
Save the extracted prevalence data for future usage and reference. These outputs will be used in subsequent steps for stratification, visualization, and integration with other indicators.
# define save directorysave_path <- here::here("1.6_health_systems/1.6a_dhs")# save final joined malaria prevalence datario::export( prev_final |> sf::st_drop_geometry(), here::here( save_path, "processed", "final_malaria_prev_sle_adm2.csv" ))# save final joined malaria prevalence datario::export( prev_final, here::here( save_path, "processed", "final_malaria_prev_sle_adm2.rds" ))
To adapt the code:
Lines 2, 7, 13: Change the output path and filenames for your saved data as needed.
Summary
This section outlines two methods for estimating malaria prevalence among children aged 6–59 months. Option 1 applies existing DHS-coded variables that follow standard definitions for RDT and microscopy results, offering a quick and reliable way to generate prevalence estimates aligned with DHS reporting. Option 2 builds the same indicators manually from the raw data, allowing for clearer visibility into inclusion criteria and the flexibility to modify assumptions as needed. Using both approaches strengthens reproducibility, enables customization, and helps validate consistency across indicator definitions.
Full code
Find the full code script for extracting prevalence of malaria infection from DHS/MIS data below.