Case management is a foundational intervention to reduce malaria morbidity and prevent malaria deaths. In the context of SNT, understanding care-seeking behavior and assessing the quality of care are critical for identifying districts where improvements are needed and for interpreting routine surveillance data.
Care-seeking rates can be derived from data from national surveys such as the Demographic and Health Surveys (DHS), Malaria Indicator Surveys (MIS), and Multiple Indicator Cluster Surveys (MICS). This page describes how to estimate treatment-seeking rates for malaria from DHS or MIS national household surveys.
Objectives
Understand how care-seeking rates can be estimated from national household surveys
Extract a denominator of recent malaria cases in children from survey data
Estimate treatment-seeking rates from recent malaria cases in children
Estimate treatment-seeking rates in years without surveys
Visualize treatment-seeking rates by admin unit and year
Key information to take into account in the context of SNT
General information on DHS surveys and how to access data is available on the DHS Overview page. When estimating treatment-seeking rates for SNT, we should consider the following points:
Where do I find household survey data on treatment-seeking?
While some types of survey data can be directly accessed via API, the API does not provide information on in which sector care was sought. Instead, you will need to work with downloaded DHS/MIS data files. Access survey data using the instructions on the DHS Program website or ask the SNT team for survey data files. Instructions for downloading via the rdhs package are also available on the DHS Overview page here.
At What administrative level should we estimate treatment-seeking?
On this page, we estimate treatment-seeking at the admin-1 level, as DHS and MIS are representative at the national and admin-1 level. This results in all districts within the same admin-1 receiving the same estimate. Estimates can be generated for smaller administrative units using Bayesian geospatial modeling (e.g., INLA).
How is treatment-seeking measured in household surveys?
In DHS/MIS surveys, questions relating to children’s health are included in the questionnaire for women. Mothers are questioned about their children under the age of 5:
Has your child had a fever at any time in the last two weeks?
(If yes) Have you sought advice or treatment for fever from any source?
(If yes) Where did you seek advice or treatment?
The treatment-seeking rate in a sector (public sector, private sector, or no care-seeking) is then defined as the proportion of children with recent fever who sought treatment in that sector. It is assumed that the treatment rate for recent fever and treatment rate for malaria fever is the same.
In MIS and some DHS, children under 5 are tested for malaria infection with a rapid diagnostic test (RDT). Some SNT teams have chosen to only consider children who had both a recent fever and a positive RDT at the time of survey when calculating treatment rate. This is because RDT-detected antigens remain in the blood for several weeks after treatment with antimalarials, and an RDT-postive child at time of survey was likely to have been also RDT-positive when they were febrile, making their fever more likely to have been caused by malaria.
To identify RDT-postive recently-febrile children, the individual child must be cross-referenced between the children file, which contains the RDT information, and the women’s file, which contains the recent fever and treatment-seeking information. This code library page contains code for both options (considering and not considerig the RDT result) on denominators. Consult your SNT team on which option to use.
How are the public and private sectors defined?
DHS/MIS uses specific codes to indicate various types of public and private sector sources of care, both medical and non-medical. The specific source each variable indicates is largely country-specific (see a general guide on pages 92–93 of the standard DHS recode PDF here). Consult individual survey recode files for country-specific information and verify with your SNT team that you are including and excluding the correct treatment source types in your definitions of public and private sectors.
Consult SNT team
Confirm which sources of care to include for the public and private sectors, as definitions may vary by country. For example, should pharmacies or traditional healers be included?
How should we handle years without survey data?
For SNT, we may need annual estimates of treatment-seeking. Because DHS/MIS surveys are not conducted every year, this page includes the following options for estimating treatment-seeking rates in non-survey year:
Use data from a given survey year until a new survey becomes available, or apply the most recent survey data available
Interpolate data linearly between survey years
Step-by-Step
This section covers the estimation of the proportion of malaria fevers that seek treatment in the public sector, private sector, or do not seek treatment. It will guide you through importing the data, selecting and computing new variables,estimating the treatment-seeking rates while accounting for the survey sampling plan, and finally, visualizing and saving the results for future use.
To skip the step-by-step explanation, Jump to the full code at the end of this page.
Step 1: Set Up Packages and Functions
We will install and load the necessary packages for this section. Then, we will define helper functions needed for the analysis.
To estimate treatment-seeking rates, we use DHS or MIS datasets—primarily the PR and KR files. This step assumes that the relevant datasets are already available, either through prior download from the DHS website or provided by the SNT team.
🔗 Not yet prepared your DHS data?
If you haven’t yet accessed or prepared the DHS datasets (e.g., PR, KR), we recommend starting with the DHS Data Preparation page, which provides a step-by-step guide for downloading, organizing, and understanding DHS files used in SNT workflows.
Step 1.1: Install and Load Libraries
We now load the packages we’ll need for working with survey data, installing any that are missing.
# Install pacman only if it's not already installedif (!requireNamespace("pacman", quietly =TRUE)) {install.packages("pacman")}# install or load relevant packagespacman::p_load( rdhs, # for downloading dhs data tidyverse, # core tidy tools (includes dplyr, tidyr, readr, etc.) haven, # read and write various data formats (stata) httr, # download files from web (GET, write_disk) rio, # importing data here, # relative file pathways data.table,# is an excellent extension of the data.frame class fs, # handle file paths purrr, # iterate over lists (map, walk, etc.) rlang, # for tidy evaluation and error messaging survey, # analyzing data from complex surveys sf, # Support for simple feature access, a standardized way# to encode and analyze spatial vector data tmap, # drawing thematic maps knitr # Make table)
Step 1.2: Define Helper Functions
We start by defining helper functions that import data, create a complex survey design object, compute survey-weighted proportions, and estimate proportions.
The svydesign.fun function creates a complex survey design object from a DHS dataset by specifying the primary sampling unit (id), stratification (strate), and sampling weights (wt). This structure is required to ensure correct variance estimation when analyzing complex survey data with the survey package.
The result.prop function estimates the survey-weighted mean of a binary outcome (var), grouped by a second variable (var1). It returns means and 95% confidence intervals by group.
Finally, the estim_prop function wraps the entire workflow: it first creates a survey design object from the input dataset, and then computes survey-weighted proportions for a given indicator (col), stratified by specified grouping variables (such as region or year). This function helps streamline repetitive survey analysis tasks. It takes a dataframe, a column name for the indicator of interest, and grouping variables (like adm1 or year).
Step 2: Extract Children with Recent (Malaria) Fever
Now that we have defined our helper functions, we will start by importing the dataset, then extracting the denominator for measuring treatment-seeking: the children with recent malaria fever. To estimate the care-seeking rates, we evaluate two options for determining the denominator. The first option includes all children who had fever in the two weeks preceding the survey, assuming all reported fevers are attributable to malaria. This approach ensures a larger sample size and applies to all survey types.
The second option narrows the denominator to children who reported fever and tested positive by RDT, focusing on fevers likely caused by malaria. While this method increases diagnostic specificity, it reduces the sample size since some recently-febrile children were not RDT-postive at time of survey. Furthermore, this approach is limited to surveys that include diagnostic testing.
We will show below how to execute each of these two options, then carry forward the first option for the rest of the code.
Consult SNT team
Discuss with the SNT team which denominator estimation option makes the most sense for the country context and availability of survey data, including diagnostic testing. Since one of the considerations may be sample size, we recommend you proceed through Steps 2.1 and 2.2 to first count the number of sampled children in each option and bring this information to the discussion.
Step 2.1: Select the children under 5 with fever in the 2 weeks preceding the survey
We begin by identifying the children under 5 who had a fever in the two weeks preceding the survey.
We read the KR file from the working directory, select the relevant variables, and create new admin-1 level variables to account for variations in naming across surveys. We assign statistical weights (wt) using the v005 variable (divided by one million), define the stratification (strate) and cluster identifiers (id), and create the ml_fever variable to flag children with fever (1) or without (0). Finally, we filter the dataset to include only living children (b5==1) under the age of five (age < 60 months).
As an output of this step, we generate a summary table showing the number of children with fever by region (admin-1), which will be used in later analysis and mapping.
# Import and process DHS KR files for 2013 and 2019mis_kr_2016 <-readRDS(here::here("1.6_health_systems/1.6a_dhs/raw/SLKR73FL.rds")) |> haven::zap_labels() |># remove labels to make merging easier dplyr::select( caseid, v001, v002, v003, v007, v012, v022, v021, v024, v101, v005, b5, b8, b19, h22, h32a:h32y )dhs_kr_2019 <-readRDS(here::here("1.6_health_systems/1.6a_dhs/raw/SLKR7AFL.rds")) |> haven::zap_labels() |># remove labels to make merging easier dplyr::select( caseid, v001, v002, v003, v007, v012, v022, v021, v024, v101, v005, b5, b8, b19, h22, h32a:h32y )# Combine both survey roundsdhs_kr <- dplyr::bind_rows(mis_kr_2016, dhs_kr_2019)# Process each dataset in the listdhs <- dhs_kr |># Create helper variables: admin-1, weights, strata,# cluster ID, year, fever indicator dplyr::mutate(adm1 =ifelse(is.na(v024), v101, v024),wt = v005 /1000000, strate = v022, id = v021,year = v007,ml_fever =ifelse(h22 ==1, 1, 0)) |># Recode admin-1 region codes to consistent readable region names (year-dependent) dplyr::mutate(adm1 = dplyr::case_when( adm1 ==1~"eastern", adm1 ==2~"northern", adm1 ==3& year ==2019~"north western", adm1 ==3& year ==2016~"southern", adm1 ==4& year ==2019~"southern", adm1 ==4& year ==2016~"western", adm1 ==5~"western",TRUE~NA_character_ )) |># Filter to include only living children under age 5 with valid fever responses dplyr::filter( b5 ==1& b19 <60& h22 !=8)# File with only children with feverdhs_fever <- dhs |> dplyr::filter(ml_fever ==1)# Filter to fever cases and summarize by admin-1data_output = dhs |> dplyr::filter(ml_fever ==1) |> dplyr::group_by(adm1) |> dplyr::summarise(ml_fever =sum(ml_fever, na.rm = T))# Display summary table with number of fevers by admin-1knitr::kable(data_output, caption ="Number of fevers by admin-1 level")
Output
Number of fevers by admin-1 level
adm1
ml_fever
eastern
708
north western
277
northern
883
southern
923
western
324
To adapt the code:
Lines 4 & 9: Replace “SLKR61FL.rds” and “SLKR7AFL.rds” with your country’s .rds files (e.g. “NGKR7HFL.rds” for Nigeria 2018).
Lines 4 & 10: Keep haven::zap_labels() to avoid label conflicts when merging across years.
Lines 6–8 & 11–13: Ensure the selected variables (b5, b19, h22, h32a:h32y) are present and named consistently across years.
Lines 17–25: Update adm1 region mapping to reflect your country’s region codes and how they change across survey years.
Line 28: Confirm filtering criteria (b5 == 1, b19 < 60, h22 != 8) match the coding logic of your dataset.
In most cases, this script will run as-is for any DHS/MIS KR file with standard variable naming and regional coding.
Step 2.2: Select children with fever in the past 2 weeks who are also RDT-positive
We identify children under five who had both a recent fever and a positive RDT result by combining two sources of information.
First, we use the fever data extracted in Step 2.1, which includes children who had a fever in the two weeks preceding the survey.
Next, we extract RDT data from the PR file, which contains information on household members. We filter for children aged 6–59 months who were selected for the hemoglobin test, slept in the household the night before the survey, and tested positive by RDT.
We then merge the fever and RDT-positive datasets by household and survey identifiers to create a unified table.
The code below performs these steps and ends by summarizing and displaying the number of RDT-positive recently febrile children by admin-1 level.
# Import and process MIS PR files for 2013 and 2019mis_pr_2016 <-readRDS(here::here("1.6_health_systems/1.6a_dhs/raw/SLPR73FL.rds")) |> haven::zap_labels() |># remove labels to make merging easier dplyr::select(v001 = hv001, v002 = hv002, v007 = hv007, hv103, hv042, hc1, rdt = hml35)dhs_pr_2019 <-readRDS(here::here("1.6_health_systems/1.6a_dhs/raw/SLPR7AFL.rds")) |> haven::zap_labels() |># remove labels to make merging easier dplyr::select(v001 = hv001, v002 = hv002, v007 = hv007, hv103, hv042, hc1, rdt = hml35)# Combine both survey roundsdhs_pr <- dplyr::bind_rows(mis_pr_2016, mis_pr_2016)# From the PR file, extract children under 5 who are eligible and RDT positivedata_pr <- dhs_pr |> dplyr::filter( hv042 ==1& hv103 ==1& hc1 %in%c(6:59) & rdt ==1) |> dplyr::select(-hv042, -hv103, -hc1)# Merge fever data with RDT-positive data by cluster, household, and survey yeardata_pr_kr <- dhs_fever |> dplyr::left_join( data_pr, by =c("v001", "v002", "v007"),relationship ="many-to-many") |> dplyr::filter(rdt ==1) |> dplyr::select(-dplyr::starts_with(".id"))# Group by admin-1 and sum number of RDT-positive childrendata_output_rdt = data_pr_kr |> dplyr::group_by(adm1) |> dplyr::summarise(rdt =sum(rdt, na.rm = T))# Display the result in a formatted tableknitr::kable(data_output_rdt,caption ="Number of RDT-positive fevers by admin-1")
Output
Number of RDT-positive fevers by admin-1
adm1
rdt
eastern
356
northern
805
southern
542
western
113
To adapt the code:
Line 28: Delete the keep function if you are sure that all your surveys include information on RDT+
Step 3: Assign Recent Fevers to Source of Treatment
In this step, we apply Option 1 for the recent fever denominator (all recent fevers) to classify care-seeking among febrile children by identifying whether caregivers sought care or advice from public or private health sectors, or did not seek care. If your SNT team chose to use Option 2 (RDT-positive recent fevers only), refer to the section To adapt the code below for guidance on modifying the code for Option 2.
The code below calculates the number of care sources visited in each sector (public, private, or no care) to classify each child’s care-seeking. Because a child may access multiple sources within a sector, the counts can exceed one. To standardize this, the code converts each sector’s count to a binary indicator (1 if care was sought, 0 otherwise). If a child accessed both public and private care, the code classifies the child under the public sector.
It assigns each child’s care-seeking behavior to a sector (public or private) based on the guidelines from the DHS-8 Guide to DHS Statistics. To understand how each variable contributes to the public and private, you can refer to the official DHS Guide:
Type fever in the search bar
Click on Prevalence, Diagnosis, and Prompt Treatment of Children with Fever
Scroll down to view the variable definitions
Consult SNT team
Each country has specific characteristics, so consult the SNT team to confirm which sources should be included. For example, some surveys exclude traditional practitioners (h32t), pharmacies (h32k), stores (h32s), or other sources.
This code assumes that if a child sought care in multiple sectors (e.g., both private and public), we count the care as being in the public sector. This approach ensures that the sum of the percentages (public + private + no_treat) equals 1.Otherwise,the total may exceed 1. Verify this approach with your SNT team.
dhs_care_seeking <- dhs |># Filter by key factors (child had a fever in last two weeks: h22) dplyr::filter(ml_fever ==1) |># Assigning each child to public, private and no care-seeking dplyr::mutate(ml_fever_pub =rowSums( dplyr::across(c( h32a, h32b, h32c, h32d, h32e, h32f, h32g, h32h, h32i )), na.rm =TRUE),ml_fever_priv =rowSums( dplyr::across(c( h32j, h32k, h32l, h32m, h32n, h32o, h32p, h32q, h32r, h32s, h32t, h32u, h32v, h32w, h32x )), na.rm =TRUE),pub =ifelse(ml_fever_pub >0, 1, 0),priv =ifelse(ml_fever_priv >0, 1, 0),pub =ifelse(pub ==1& priv ==1, 1, pub),no_treat =ifelse(h32y ==0, 0, 1) )
To adapt the code:
Line 1: If option 2 is chosen by the SNT team, replace the dataset dhs with data_pr_kr from Step 2.2
Line 10-11 and 17-19: Review the variable definitions in ml_fever_pub and ml_fever_priv on the DHS website. Remove any variables that don’t reflect your country’s classification of public or private care, and confirm the variable names match the survey
Line 22: Comment out or revise this line if your team decides not to count children who sought care in both sectors as public only
Step 4: Estimate Treatment-Seeking Rates by Sector at Admin-1 Level
We have created the indicators pub, priv and no_treat in Step 3 to classify care-seeking for each individual child. Now we will calculate the proportion of children who sought care in the private sector, in the public sector and those who did not seek care at all, at admin-1 level.
When working with survey data, remember that we need to apply weights to ensure the data accurately represent the population, especially when participants are selected with different probabilities as in cluster or stratified surveys.
Although several R functions can apply weights, most cannot account for the full complexity of the sampling design. For this reason, it is best to use the survey package, which is specifically designed to handle complex survey designs and weighting, while also supporting simpler weighting applications. Another package, srvyr, works with survey objects and is similar to dplyr for data arrays by extending dplyr’s verbs to handle complex sampling plans. In this example, we use the survey package.
The code below uses the table of individual children fever and their care-seeking outcomes from Step 3, stratified by admin1 and survey year, and calculates the proportion of children seeking care in each sector (public, private, or not at all). Using the estim_prop function we defined in Step 1.2, it applies survey weights and handles survey design to ensure valid estimates.
# Define indicatorsvars <-c("pub", "priv", "no_treat")# Handle single PSU strataoptions(survey.lonely.psu ="adjust")# Process each dataset in dhs filedata_fever_care <- dhs_care_seeking |># Add a year column dplyr::mutate(year = v007) |># Compute proportions for each indicatorestim_prop(col = vars, by =c("adm1", "year")) |># rename columns, and format dplyr::rename(adm1 =1, year =2) |> dplyr::select(adm1, year, pub, priv, no_treat) |> dplyr::mutate(pub =round(pub, 2),priv =round(priv, 2),no_treat =round(no_treat, 2),year =as.numeric(year) )# Reset row namerownames(data_fever_care) <-NULL
To adapt the code:
Line 8: If you choose option 2, modify dhs_care_seeking name to match the one created in Step 3
Step 5: Visualize Treatment-Seeking Rates by Sector
We are now ready to visualize treatment-seeking rates by sector. To start, we’ll import the shapefile for the corresponding administrative level (admin-1 in this case). For guidance on working with shapefiles, refer to the Shapefiles page.
We will use ggplot2 to create visualizations. Another option is tmap.
Show the code
# Import the shapefile using st_readshp <-readRDS( here::here("01_foundational/1a_administrative_boundaries","1ai_adm3", "sle_adm1_shp.rds")) |> sf::st_as_sf()# Pivot the table in long format to have all indicators in a single columndata_care_shp <- data_fever_care |> tidyr::pivot_longer(cols =c("pub", "priv", "no_treat"),names_to ="Sectors",values_to ="values") |> dplyr::mutate(Sectors = dplyr::recode(Sectors,"pub"="Public care seeking","priv"="Private care seeking","no_treat"="No care seeking" ) )# Merge shapefile with care-seeking datedhs_shp_region <- shp |> dplyr::left_join(data_care_shp, by =c("adm1"),relationship ="many-to-many")# Create a map to viualize the indicatorsg <- dhs_shp_region |> dplyr::filter(year ==2019) |> ggplot2::ggplot() + ggplot2::geom_sf(aes(fill = values), color ="black", size =0.2) + ggplot2::facet_wrap(~ Sectors, nrow =1) + ggplot2::scale_fill_distiller(palette ="RdYlBu",limits =c(0, 1),breaks =seq(0, 1, by =0.1),direction =1,na.value ="transparent",name =NULL ) + ggplot2::theme_void() + ggplot2::theme(legend.position ="right",legend.title = ggplot2::element_text(size =10),legend.text = ggplot2::element_text(size =8),panel.grid = ggplot2::element_blank(),strip.background = ggplot2::element_blank(),strip.text = ggplot2::element_text(face ="bold", size =11, margin = ggplot2::margin(b =10)),strip.placement ="outside",plot.title = ggplot2::element_text(hjust =0.5, size =14) ) + ggplot2::labs(title ="Proportion of recent fevers seeking care in each sector\n\n")# plotg# Save to PNGggplot2::ggsave(filename = here::here("03_output/3a_figures","prop_recent_fevers_seeking_care_in_each_sector.png"),plot = g, width =10, height =6, dpi =300)
Output
To adapt the code:
Line 2: Replace this shapefile with your country’s shapefile and the chosen administrative level
Line 27: Select the year you wish to view
Validate with the SNT team
The SNT team needs to review the output maps to ensure that the care-seeking estimates align with local realities and programmatic knowledge. Even though the estimates are based on survey data, this method involves several approximations, for example:
The same care-seeking value is applied to all districts within an admin-1, which may mask important sub-regional variation in access to care.
DHS/MIS surveys may not capture all relevant sources of care (e.g., informal providers or faith-based facilities), and their classification may differ across surveys or countries.
Assumptions like counting a child who sought both public and private care only under the public sector affect the final proportions and need contextual review.
The definition of fever or malaria fever is uncertain and subject to bias.
These factors can combine to create outputs that are inconsistent with country experience and field knowledge. Given the uncertainties and approximations made during the calculations above, the SNT team may decide to modify the outputs of the treatment-seeking analysis:
For example, during the validation process, one country’s SNT team identified that the proportion of care-seeking in the private sector was too high. They asked the analysis team to reduce this value by dividing the private sector proportion by a factor of 2. The analysis team then regenerated the maps to reflect the adjustment and used the adjusted values in later SNT steps.
Step 6: Estimate Values for Years Without Surveys
We calculated the proportion of children under five who had fever in the two weeks preceding each survey and who sought care in the public or private sectors, or did not seek any care, for the years with available survey data. To estimate values for years without surveys, we present two approaches:
Use data from a given survey year until a new survey becomes available, or apply the most recent survey data available
Linear interpolation: This method estimates values between two known data points by assuming a straight line between them
Other approaches are also possible. For example, it may be a good idea to use contextual knowledge of when policy changes occurred to inform when changes in treatment-seeking rate should also occur.
Since Sierra Leone changed its admin-1 units between survey years, we will instead use Guinea’s survey data from 2018 and 2021 in this example exercise. We will limit the analysis to children who had fever in the two weeks preceding the survey, without considering their RDT test result.
The dataset generated in Step 5, applied to Guinea’s surveys, is used as input in this step.
Consult SNT team
Ask the SNT team which method they prefer for estimating treatment-seeking rates in years without survey data. Each method has trade-offs. Using data from a given survey year until a new survey becomes available assumes treatment-seeking rates remain constant between surveys. The linear interpolation method assumes a gradual change over time rather than abrupt shifts. The SNT team should choose based on how they believe care-seeking behavior changed between survey years.
Step 6.1: Use Data from the Most Recent Survey Until a New One Becomes Available
This code imports treatment-seeking data by admin-1 and year, fills in missing years (e.g., 2019, 2020, 2022, 2023) by carrying forward the most recent available survey values, reshapes the data into long format to organize indicators by sector, and generates line plots to visualize trends in the public sector, private sector, and no care-seeking both nationally and for a specific admin-1.
# Import Guinea treatment seeking rate filedata_inter <-read.csv(here::here("1.6_health_systems/misc/GN_treatment_seeking.csv"))# Define the target years you want in the final datasettarget_years <-c(2018, 2019, 2020, 2021, 2022, 2023)# Expand the dataset for all admin × year combinationsdf_full <-expand.grid(adm1 =unique(data_inter$adm1),year = target_years) |> dplyr::left_join(data_inter, by =c("adm1", "year")) |> dplyr::arrange(adm1, year)# Fill missing values using last observation carried forward within each admindf_filled <- df_full |> dplyr::group_by(adm1) |> tidyr::fill(pub, priv, no_treat, .direction ="down") |> dplyr::ungroup()# Pivot the table in long format to have all indicators in a single columnmedfever_dhs_fitted_all_long <- df_filled |> tidyr::pivot_longer(cols =c("pub", "priv", "no_treat"),names_to ="Sectors",values_to ="Proportion" ) |> dplyr::mutate(Sectors = dplyr::recode(Sectors,"pub"="Public care seeking","priv"="Private care seeking","no_treat"="No care seeking" ) )# Visualizing indicators using ggplot2p <- medfever_dhs_fitted_all_long |>ggplot2::ggplot( ggplot2::aes(x = year, y = Proportion,group = adm1, color =factor(adm1))) + ggplot2::geom_line(linewidth =1.2, lineend ="round") + ggplot2::scale_color_brewer(palette ="Spectral") + ggplot2::facet_wrap(~ Sectors, ncol =1) + ggplot2::theme_minimal() + ggplot2::labs(title ="Care Seeking Trends (2018-2023)\n",x ="Year",y ="Proportion",color ="Admin-1") + ggplot2::theme(text = ggplot2::element_text(family ="sans"),plot.title = ggplot2::element_text(hjust =0.5, face ="bold", size =16),strip.text = ggplot2::element_text(face ="bold", size =12),panel.grid.minor = ggplot2::element_blank(),panel.border = ggplot2::element_rect(color ="black", fill =NA, size =0.5),axis.title = ggplot2::element_text(size =12, face ="bold"),legend.position ="bottom" )# Print outputp# Visualizing indicators using ggplot2p1 <- medfever_dhs_fitted_all_long |>dplyr::filter(adm1 =="Boke") |>ggplot2::ggplot( ggplot2::aes(x = year, y = Proportion, color = Sectors)) + ggplot2::geom_line(size =1.2, lineend ="round") + ggplot2::scale_color_manual(values =c("No care seeking"="#E41A1C","Private care seeking"="#4DAF4A","Public care seeking"="#377EB8")) + ggplot2::facet_wrap(~ Sectors, ncol =1) + ggplot2::theme_minimal() + ggplot2::labs(title ="Care Seeking Trends for Boke (2018-2023)",x ="Year",y ="Proportion",color ="Sectors") + ggplot2::theme(text = ggplot2::element_text(family ="sans"),plot.title = ggplot2::element_text(hjust =0.5, face ="bold", size =16),strip.text = ggplot2::element_text(face ="bold", size =12),panel.grid.minor = ggplot2::element_blank(),panel.border = ggplot2::element_rect(color ="black", fill =NA, size =0.5),axis.title = ggplot2::element_text(size =12, face ="bold"),legend.position ="bottom" )# Print outputp1# Save to PNGggplot2::ggsave(filename = here::here("03_output/3a_figures","care_seeking_trends_2018-2023.png"),plot = p, width =10, height =6, dpi =300)# Save to PNGggplot2::ggsave(filename = here::here("03_output/3a_figures","boke_care_seeking_trends_2018-2023.png"),plot = p1, width =10, height =6, dpi =300)
Output
To adapt the code:
Line 2: Change the name of the file to be used
Line 5: Add or reduce years according to your years without surveys
Step 6.2: Linear interpolation
The approx() function performs linear interpolation, which means it fills in values between two known data points by drawing a straight line between them. This is useful when you want to estimate values for years without survey data, based only on the years where surveys were conducted.
For example, if a survey was done in 2018 and another in 2021, approx() can estimate the values for 2019 and 2020 by assuming that the change between 2018 and 2021 happened at a constant rate. However, it needs at least two survey points to do this. Otherwise, it can’t draw a line. Also, it won’t estimate values outside the survey years unless you explicitly allow extrapolation.
This approach is helpful when you want a simple and transparent way to fill in gaps without making assumptions about longer-term trends.
# Define target years for interpolationtarget_years <-c(2019, 2020, 2022, 2023)# Initialize an empty list to store interpolated resultsinterpolated_data <-list()# Loop through each admin-1 (adm1)for (reg inunique(data_inter$adm1)) {# Subset data for the current region reg_data <- data_inter[data_inter$adm1 == reg, ]# Interpolate each variable (pub, priv, no.treat) using approx() interp_pub <-approx(reg_data$year, reg_data$pub,xout = target_years, rule =2)$y interp_priv <-approx(reg_data$year, reg_data$priv,xout = target_years, rule =2)$y interp_notrt <-approx(reg_data$year, reg_data$no_treat,xout = target_years, rule =2)$y# Combine into a data frame for this region reg_result <-data.frame(adm1 = reg,year = target_years,pub =round(interp_pub, 2),priv =round(interp_priv, 2),no_treat =round(interp_notrt, 2) )# Append to the list interpolated_data[[reg]] <- reg_result}# Combine all regions into a single data framemedfever_dhs_interp_finale <- dplyr::bind_rows(interpolated_data) |># bind all years in one file dplyr::bind_rows(data_inter)# Pivot the table in long format to have all indicators in a single columnmedfever_dhs_inter_long <- medfever_dhs_interp_finale |> tidyr::pivot_longer(cols =c("pub", "priv", "no_treat"),names_to ="Sectors",values_to ="Proportion") |> dplyr::mutate(Sectors = dplyr::recode(Sectors,"pub"="Public care seeking","priv"="Private care seeking","no_treat"="No care seeking" ) )# Visualizing indicators using ggplot2p2 <- medfever_dhs_inter_long |> ggplot2::ggplot( ggplot2::aes(x = year, y = Proportion,group = adm1, color =factor(adm1))) + ggplot2::geom_line(size =1.2, lineend ="round") + ggplot2::scale_color_brewer(palette ="Spectral") + ggplot2::facet_wrap(~ Sectors, ncol =1) + ggplot2::theme_minimal() + ggplot2::labs(title ="Care Seeking Trends (2018-2023)",x ="Year",y ="Proportion",color ="Admin-1") + ggplot2::theme(text = ggplot2::element_text(family ="sans"),plot.title = ggplot2::element_text(hjust =0.5, face ="bold", size =16),strip.text = ggplot2::element_text(face ="bold", size =12),panel.grid.minor = ggplot2::element_blank(),panel.border = ggplot2::element_rect(color ="black", fill =NA, size =0.5),axis.title = ggplot2::element_text(size =12, face ="bold"),legend.position ="bottom" )# Print outputp2# Save to PNGggplot2::ggsave(filename = here::here("03_output/3a_figures","care_seeking_trends_2018-2023.png"),plot = p, width =10, height =6, dpi =300)
Output
To adapt the code:
Line 2: Add or reduce years according to your years without surveys
Line 13,17,20: Delete rule=2 if you don’t include any lines beyond the survey points and don’t require extrapolation
Step 7: Save the Output
We have now estimated the three indicators at the admin-1 level and will save the files in the appropriate format for later use, such as for epidemiological stratification.
Make sure you check your save_path for the location of your output CSV file.
We chose to save all CSV outputs (no estimates for years between surveys, keeping the same value until the next survey, and linear interpolation) in this directory to include estimation files for years without surveys.
You may opt to save your CSV files earlier, depending on what is most convenient for your workflow.
# set up output pathsave_path <- here::here("1.6_health_systems/1.6a_dhs/processed")# save to csvrio::export(data_fever_care, here::here(save_path, "care_seeking_data.csv"))# save keep until next survey year to csvrio::export(medfever_dhs_fitted_all_long, here::here(save_path, "medfever_dhs_linear.csv"))# save linear interpolation to csvrio::export(medfever_dhs_inter_long, here::here(save_path, "medfever_dhs_interp.csv"))
To adapt the code:
Line 2: Update the filename path to match your folder structure
Line 6 onwards: You can modify your output extension type, for example writexl::write_xlsx() for xlsx
Summary
We calculated treatment-seeking rates for children who sought care from public and private sector sources, as well as those who did not seek care. We performed this analysis under two options: children with reported fever in the two weeks preceding the survey, and children with reported fever who also tested positive via RDT. The SNT team is advised to select the preferred scenario for subsequent analysis.
We estimated values for non-survey years using two methodological approaches: use data from a given survey year until a new survey becomes available and linear interpolation. We recommend that the SNT team review and determine the most appropriate method for imputing these missing data points.
The full code, reduced for ease of reference, is provided at the bottom of this section. You can reuse it as is or modify it for your configuration by updating the file paths and survey type.