Dev Site — You are viewing the development build. Go to Main Site

  • English
  • Français
  1. 3. Stratification
  2. 3.1 Epidemiological Stratification
  3. Risk categorization REMOVE?
  • Code library for subnational tailoring
    English version
  • 1. Getting Started
    • 1.1 About and Contact Information
    • 1.2 For Everyone
    • 1.3 For the SNT Team
    • 1.4 For Analysts
    • 1.5 Producing High-Quality Outputs
  • 2. Data Assembly and Management
    • 2.1 Working with Shapefiles
      • Spatial data overview
      • Basic shapefile use and visualization
      • Shapefile management and customization
      • Merging shapefiles with tabular data
    • 2.2 Health Facilities Data
      • Fuzzy matching of names across datasets
      • Health facility coordinates and point data
    • 2.3 Routine Surveillance Data
      • Routine data extraction
      • DHIS2 data preprocessing
      • Determining active and inactive status
      • Contextual considerations
      • Missing data detection methods
      • Health facility reporting rate
      • Data coherency checks
      • Outlier detection methods
      • Imputation methods
      • Final database
    • 2.4 Stock Data
      • LMIS
    • 2.5 Population Data
      • National population data
      • WorldPop population raster
    • 2.6 National Household Survey Data
      • DHS data overview and preparation
      • Prevalence of malaria infection
      • All-cause child mortality
      • Treatment-seeking rates
      • ITN ownership, access, and usage
      • Wealth quintiles analysis
    • 2.7 Entomological Data
      • Entomological data
    • 2.8 Climate and Environmental Data
      • Climate and environment data extraction from raster
    • 2.9 Modeled Data
      • Generating spatial modeled estimates
      • Working with geospatial model estimates
      • Modeled estimates of malaria mortality and proxies
      • Modeled estimates of entomological indicators
  • 3. Stratification
    • 3.1 Epidemiological Stratification
      • Incidence overview and crude incidence
      • Incidence adjustment 1: incomplete testing
      • Incidence adjustment 2: incomplete reporting
      • Incidence adjustment 3: treatment-seeking
      • Incidence stratification
      • Prevalence and mortality stratification
      • Combined risk categorization
      • Risk categorization REMOVE?
      • Risk categorization REMOVE?
    • 3.2 Stratification of Determinants of Malaria Transmission
      • Seasonality
      • Access to Care
  • 4. Review of Past Interventions
    • 4.1 Case Management
    • 4.2 Routine Interventions
    • 4.3 Campaign Interventions
    • 4.4 Other Interventions
  • 5. Targeting of Interventions
  • 6. Retrospective Analysis
    • 6.1: Trend analysis
  • 7. Urban Microstratification

On this page

  • Overview
  • Step-by-Step Instructions
    • Step 1
  • Full code
  • Output
  1. 3. Stratification
  2. 3.1 Epidemiological Stratification
  3. Risk categorization REMOVE?

Risk categorization REMOVE?

Overview

Given the uncertainties of the three common metrics of transmission and disease burden (incidence, prevalence and all-cause mortality), and the different dimensions of malaria transmission that they represent, countries may choose to combine prevalence, incidence, and or mortality categories to develop a composite risk map. There are several approaches that can be applied to develop a composite metric. A simple two-stage approach is presented here, but should be adapted to country context.

Step 1: Scores are assigned in ascending order to the prevalence and incidence categories based on the number of strata used per metric. For example, for prevalence, scores of 1, 2, 3, 4, 5, 6 for a prevalence of <1, 1-5%, 5-10%, 10-35%, 35-50% or >50% respectively; for incidence, scores of 1 to 7 for <1, 1-50, 50-100, 100-250, 250-500, 500-750 and >750 per 1000 people at risk per year. These scores are then summed per operational unit, and the sum of the scores is reclassified in quartiles to obtain areas of “Lowest”, “Moderate”, “High” and “Very high” morbidity as per incidence and prevalence.

Objectives
  • TBD

Step-by-Step Instructions

To skip the step-by-step explanation, jump to the full code at the end of this page.

Step 1

first calculate adjusted cases1. we apply the observed tpr to the presumed cases to derive additional number of cases if all presumed cases were confirmed

  • R
  • Next we combine incidence and prevalence to generate first risk stratification
  • Python
  • Stata

Loading required libraries

library(dplyr)
library(purrr)

Combining all 4 incidence estimates into 1 dataframe

# Use crude incidence dataset and add the 3 other incidence adjustments
all_inc <- crude_inc %>%
      left_join(adj_inc1, by = c("adm1", "adm2", "adm3", "year")) %>%
      left_join(adj_inc2, by = c("adm1", "adm2", "adm3", "year")) %>%
      left_join(adj_inc3, by = c("adm1", "adm2", "adm3", "year"))

# Now create categories using cut_off values for each incidence
# Create an object for the incidence variables

vars <- c("crudeinc", "adjinc1", "adjinc2", "adjinc3")

# Define breaks and labels for the categories
cut_offs <- c(-Inf, 5, 50, 100, 250, 450, 750, Inf)
cut_lab <- c("<5", "5-50", "50-100", "100-250", "250-450", "450-750", ">750")

# create numerical values for the categories
all_inc <- all_inc %>%
  mutate(across(all_of(vars), ~ cut(., breaks = cut_offs,
                                             labels = cut_lab),
                 .names = "{.col}_cat"))

# Categorization of prevalence estimates. This assumes we have prevalence estimates for multiple years, just as we have for incidence

vars_prev <- c("prev_y1", "prev_y2", "prev_y3")

# Define breaks and labels for prevalence estimates
cut_offs <- c(-Inf, 1, 5, 10, 20, 35, 50, Inf)
cut_lab <- c("<1", "1-5", "5-10", "10-20", "20-35", "35-50", ">50")

# create numerical values for the categories
prev_df <- prev_df %>%
  mutate(across(all_of(vars_prev), ~ cut(., breaks = cut_offs,
                                                   labels = cut_lab),
                 .names = "{.col}_cat"))

Step 2: Once the first set of strata based on prevalence and incidence scores has been obtained, new scores are assigned to them from 1 (low) to 4 (high). At this stage, the categories for all-cause mortality in children under 5 are also scored in ascending order of 1, 2, 3 or 4 for mortality <1, 1-6, 6-9.5, and >9.5 deaths per 1000 live births. These mortality categories are the ones used in the Africa regional map for the malaria vaccine allocation (66) but can be changed depending on context. The mortality score is then added to the combined prevalence and incidence score obtained in stage 1, and the sum of the scores are reclassified in quartiles to obtain areas of “Lowest”, “Moderate”, “High” and “Very high” morbidity and mortality

# First combine incidence and prevalence datasets and filter for most current year
year = "most current year" # place holder for most current year

inc_prev <- all_inc %>%
      left_join(prev_df, by = c("adm1", "adm2", "adm3", "year")) %>%
  dplyr::filter(year == year)

# Conducting first risk stratification for the most current year by summing strata values in inc and prev
# let's get the risk stratification combine each of the incidence estimates to the prevalence (assuming prev_y3 is the most current prevalence estimats)

inc_prev <- inc_prev %>%
  mutate(sumcat_crude = crudeinc_cat + prev_y3_cat,
         sumcat_adj1 = adjinc1_cat + prev_y3_cat,
         sumcat_adj2 = adjinc2_cat + prev_y3_cat,
         sumcat_adj3 = adjinc3_cat + prev_y3_cat)

# Next we will recode the values of the sumcat variables into meaningful thresholds based on the range of values from the sumcat variables. The figures provided here are for illustrative purposes

inc_prev <- inc_prev %>%
  mutate( # generating recode values for crude estimates
    sumcat_crude_rec = case_match(
      sumcat_crude %in% 6:7 ~ 1,
      sumcat_crude == 8 ~ 2,
      sumcat_crude == 9 ~ 3,
      sumcat_crude == 10 ~ 4,
      sumcat_crude %in% 11:13 ~ 5,
      TRUE ~ NA_real_
    ),
    # generating recode values for adjusted 1 estimates
    sumcat_adj1_rec = case_when(
      sumcat_adj1 %in% 6:7 ~ 1,
      sumcat_adj1 == 8 ~ 2,
      sumcat_adj1 == 9 ~ 3,
      sumcat_adj1 == 10 ~ 4,
      sumcat_adj1 %in% 11:13 ~ 5,
      TRUE ~ NA_real_
    ),
    # generating recode values for adjusted 2 estimates
    sumcat_adj2_rec = case_when(
      sumcat_adj2 %in% 6:7 ~ 1,
      sumcat_adj2 == 8 ~ 2,
      sumcat_adj2 == 9 ~ 3,
      sumcat_adj2 == 10 ~ 4,
      sumcat_adj2 %in% 11:13 ~ 5,
      TRUE ~ NA_real_
    ),

    # generating recode values for adjusted 3 estimates
    sumcat_adj3_rec = case_when(
      sumcat_adj3 %in% 6:7 ~ 1,
      sumcat_adj3 == 8 ~ 2,
      sumcat_adj3 == 9 ~ 3,
      sumcat_adj3 == 10 ~ 4,
      sumcat_adj3 %in% 11:13 ~ 5,
      TRUE ~ NA_real_
    )
  )

# define labels and apply it to the recoded values
# define labels
labels <- c(
  "1" = "Very low",
  "2" = "Low",
  "3" = "Moderate",
  "4" = "High",
  "5" = "Very High"
)

# apply to the recoded values
inc_prev <- inc_prev %>%
  mutate(
    sumcat_crude_rec = factor(sumcat_crude_rec, levels = 1:5, labels = labels),
    sumcat_adj1_rec  = factor(sumcat_adj1_rec, levels = 1:5, labels = labels),
    sumcat_adj2_rec  = factor(sumcat_adj2_rec, levels = 1:5, labels = labels),
    sumcat_adj3_rec  = factor(sumcat_adj3_rec, levels = 1:5, labels = labels)
  )

Next steps of codes plot each of the risk strata on a map at the appropriate adm level. in this code we plot it at admin 3 level.

this first set of codes plots each map separately

# first join the final dataset with the adm3 shapefile

inc_prev_shp <- adm3_sf %>%
  left_join(inc_prev, by = c("adm1", "adm2", "adm3"))

# Plot for each of the sum cat variables
# Map for Crude incidence stratification

ggplot(inc_prev_shp) +
  geom_sf(aes(fill = sumcat_crude_rec), color = "gray80", size = 0.2) +  # inner borders
  geom_sf(data = adm1_sf, fill = NA, color = "black", size = 0.3) + # adm1 borders
  scale_fill_manual(
    name = "Strata",
    values = c(
      "Very low" = "#c6dbef",
      "Low" = "#6baed6",
      "Moderate" = "#fdd0a2",
      "High" = "#e6550d",
      "Very High" = "#de2d26"
    ),
    drop = FALSE
  ) +
  theme_minimal() +
  labs(
    title = "Strata - Prev+Crudeinc",
    subtitle = "Country",
    fill = "Strata"
  ) +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    legend.title = element_text(size = 10, face = "bold"),
    legend.text = element_text(size = 9)
  )

# Map for Adjusted incidence1 stratification

ggplot(inc_prev_shp) +
  geom_sf(aes(fill = sumcat_adj1_rec), color = "gray80", size = 0.2) +  # inner borders
  geom_sf(data = adm1_sf, fill = NA, color = "black", size = 0.3) + # adm1 borders
  scale_fill_manual(
    name = "Strata",
    values = c(
      "Very low" = "#c6dbef",
      "Low" = "#6baed6",
      "Moderate" = "#fdd0a2",
      "High" = "#e6550d",
      "Very High" = "#de2d26"
    ),
    drop = FALSE
  ) +
  theme_minimal() +
  labs(
    title = "Strata - Prev+Adjinc1",
    subtitle = "Country",
    fill = "Strata"
  ) +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    legend.title = element_text(size = 10, face = "bold"),
    legend.text = element_text(size = 9)
  )


# Map for Adjusted incidence 2 stratification

ggplot(inc_prev_shp) +
  geom_sf(aes(fill = sumcat_adj2_rec), color = "gray80", size = 0.2) +  # inner borders
  geom_sf(data = adm1_sf, fill = NA, color = "black", size = 0.3) + # adm1 borders
  scale_fill_manual(
    name = "Strata",
    values = c(
      "Very low" = "#c6dbef",
      "Low" = "#6baed6",
      "Moderate" = "#fdd0a2",
      "High" = "#e6550d",
      "Very High" = "#de2d26"
    ),
    drop = FALSE
  ) +
  theme_minimal() +
  labs(
    title = "Strata - Prev+Adjinc2",
    subtitle = "Country",
    fill = "Strata"
  ) +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    legend.title = element_text(size = 10, face = "bold"),
    legend.text = element_text(size = 9)
  )


# Map for Adjusted incidence 3 stratification

ggplot(inc_prev_shp) +
  geom_sf(aes(fill = sumcat_adj3_rec), color = "gray80", size = 0.2) +  # inner borders
  geom_sf(data = adm1_sf, fill = NA, color = "black", size = 0.3) + # adm1 borders
  scale_fill_manual(
    name = "Strata",
    values = c(
      "Very low" = "#c6dbef",
      "Low" = "#6baed6",
      "Moderate" = "#fdd0a2",
      "High" = "#e6550d",
      "Very High" = "#de2d26"
    ),
    drop = FALSE
  ) +
  theme_minimal() +
  labs(
    title = "Strata - Prev+Adjinc3",
    subtitle = "Country",
    fill = "Strata"
  ) +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    legend.title = element_text(size = 10, face = "bold"),
    legend.text = element_text(size = 9)
  )

The alternative will be to show the maps on a facet grid

# first reshape data into long format
inc_prev_shp_long <- inc_prev_shp %>%
  pivot_longer(
    cols = c(sumcat_crude_rec, sumcat_adj1_rec, sumcat_adj2_rec, sumcat_adj3_rec),
    names_to = "indicator",
    values_to = "strata"
  ) %>%
  mutate(indicator = recode(indicator,
                            sumcat_crude_rec = "Prev+Crude",
                            sumcat_adj1_rec = "Prev+Adjinc1",
                            sumcat_adj2_rec = "Prev+Adjinc2",
                            sumcat_adj3_rec = "Prev+Adjinc3"))

# Next we plot all maps with facets
ggplot(inc_prev_shp_long) +
  geom_sf(aes(fill = strata), color = "gray80", size = 0.2) +
  geom_sf(data = adm1_sf, fill = NA, color = "black", size = 0.3) +
  scale_fill_manual(
    values = c(
      "Very low" = "#c6dbef",
      "Low" = "#6baed6",
      "Moderate" = "#fdd0a2",
      "High" = "#e6550d",
      "Very High" = "#de2d26"
    ),
    name = "Strata",
    drop = FALSE
  ) +
  facet_wrap(~indicator) +
  theme_minimal() +
  labs(
    title = "Strata by Indicator (Prevalence + Incidence)",
    subtitle = "Country",
    fill = "Strata"
  ) +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 9)
  )

Full code

  • R
  • Python
  • Stata

Output

  • R
  • Python
  • Stata
 

©2025 Applied Health Analytics for Delivery and Innovation. All rights reserved