Combined risk categorization

Overview

Given the uncertainties in the three common metrics of transmission and disease burden (incidence, prevalence, and all-cause mortality) and the different dimensions of malaria transmission that they represent, countries may choose to combine prevalence, incidence, and/or mortality categories to develop a composite risk map. Several approaches can be used to develop a composite metric. A simple two-stage approach is presented here; it should be adapted to the country context.

Objectives
  1. Categorize all incidence estimates based on agreed country cut-offs. The cut-offs should be adapted to the range of values in the data.
  2. Categorize prevalence estimates based on agreed country cut-offs. The cut-offs should be adapted to the range of values in the data.
  3. Assign numerical values to the various categories.
  4. Combine the numerical values of incidence and prevalence to generate an epidemiological risk stratification.

Step-by-Step Instructions

To skip the step-by-step explanation, jump to the full code at the end of this page.

Step 1: Create categories for prevalence data

The categories used here follow WHO normative guidance, but countries can revise them based on their context and data.

# Categorize prevalence estimates. This assumes prevalence estimates are available
# for multiple years, as for incidence.
library(dplyr)

vars_prev <- c("prev_y1", "prev_y2", "prev_y3") # adjust to match the column names in the prevalence dataset

# Define breaks and labels for prevalence estimates (%)
cut_offs <- c(-Inf, 1, 5, 10, 20, 35, 50, Inf)
cut_lab <- c("<1", "1-5", "5-10", "10-20", "20-35", "35-50", ">50")

# Create a categorical (factor) column for each prevalence column.
# cut() closes intervals on the right by default, so a value of exactly 1 is labelled "<1";
# set right = FALSE if lower bounds should be inclusive instead.
prev_df <- prev_df %>%
  mutate(across(all_of(vars_prev),
                ~ cut(., breaks = cut_offs, labels = cut_lab),
                .names = "{.col}_cat"))
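
How `cut()` treats values that sit exactly on a cut-off is easy to overlook. The sketch below uses made-up prevalence values (not taken from any dataset) to compare the default right-closed intervals with `right = FALSE`. Which convention is appropriate is a country decision.

```r
# Illustrative prevalence values (%), chosen to sit on or near the cut-offs
prev_example <- c(0.4, 1, 4.9, 5, 50, 62)

cut_offs <- c(-Inf, 1, 5, 10, 20, 35, 50, Inf)
cut_lab  <- c("<1", "1-5", "5-10", "10-20", "20-35", "35-50", ">50")

# Default: intervals are closed on the right, e.g. (1, 5]
right_closed <- cut(prev_example, breaks = cut_offs, labels = cut_lab)

# Alternative: intervals are closed on the left, e.g. [1, 5)
left_closed <- cut(prev_example, breaks = cut_offs, labels = cut_lab, right = FALSE)

# Side-by-side comparison of the two conventions
data.frame(prev_example, right_closed, left_closed)
```

With the default, a prevalence of exactly 1% is labelled "<1" and 5% is labelled "1-5"; with `right = FALSE` both move up one category, and 50% is labelled ">50".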

Step 1.1: Join prevalence data to incidence data

Next, we combine the incidence and prevalence datasets.

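# Left join: every row of prev_df is kept and the matching rows of ann_inc_data are attached.
# Both datasets must contain the key columns listed in `by`.
inc_prev <- prev_df %>%
  left_join(ann_inc_data, by = c("adm1", "adm2", "adm3", "year"))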
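
Before joining, it can help to check that the key columns agree across the two tables, because a left join silently fills unmatched rows with `NA`. A minimal sketch with two toy tables: the values are invented, `make_key` is a hypothetical helper, and the key column names follow the join above.

```r
# Toy incidence and prevalence tables with one spelling mismatch in adm3
inc_toy <- data.frame(adm1 = "R1", adm2 = "D1",
                      adm3 = c("Alpha", "Beta", "Gamma"),
                      year = 2023, crudeinc = c(120, 340, 15))
prev_toy <- data.frame(adm1 = "R1", adm2 = "D1",
                       adm3 = c("Alpha", "Beta", "Gama"),
                       year = 2023, prev_y3 = c(12, 31, 2))

# Hypothetical helper: collapse the join keys into one string per row
make_key <- function(df) paste(df$adm1, df$adm2, df$adm3, df$year, sep = "|")

# Keys present in one table but not in the other
only_in_prev <- setdiff(make_key(prev_toy), make_key(inc_toy))
only_in_inc  <- setdiff(make_key(inc_toy), make_key(prev_toy))

only_in_prev
only_in_inc
```

Any keys listed by this check would need to be harmonized before the join, otherwise those units end up with missing incidence or prevalence values.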

Step 2: Create numerical values for prevalence and incidence categories

Scores are assigned in ascending order to the prevalence and incidence categories, based on the number of strata used per metric. For example, prevalence categories of <1%, 1-5%, 5-10%, 10-35%, 35-50%, and >50% receive scores of 1 to 6; incidence categories of <1, 1-50, 50-100, 100-250, 250-500, 500-750, and >750 per 1000 people at risk per year receive scores of 1 to 7.

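# Convert each category column to its integer score (1 = lowest category).
# This assumes the *_cat columns are factors with levels in ascending order, as produced by cut().
# as.integer() keeps the score of a category fixed even when some categories are absent from the data.
inc_prev <- inc_prev %>%
      mutate(across(ends_with("_cat"), ~ as.integer(.),
                    .names = "{.col}_num"))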
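
One detail matters when turning categories into scores: the score of a category should not depend on which categories happen to be present in the data. The sketch below, on invented values, contrasts reading the integer codes of a factor built with `cut()` against re-factoring the column, which drops empty levels.

```r
cut_offs <- c(-Inf, 1, 5, 10, 20, 35, 50, Inf)
cut_lab  <- c("<1", "1-5", "5-10", "10-20", "20-35", "35-50", ">50")

# Invented prevalence values with no unit below 5%
prev_cat <- cut(c(7, 15, 42), breaks = cut_offs, labels = cut_lab)

# Integer codes follow the full set of levels: "5-10" is the third category
score_fixed <- as.integer(prev_cat)

# Re-factoring drops the unused levels, so "5-10" becomes category 1
score_shifted <- as.numeric(factor(prev_cat))

data.frame(prev_cat, score_fixed, score_shifted)
```

If the category columns are stored as text rather than factors, they would first need to be converted with `factor(x, levels = ...)` using the full, ordered list of labels.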

Step 3: Sum the scores for incidence and prevalence to generate the first risk stratification

The scores are then summed per operational unit, and the sum is reclassified into five classes to obtain areas of “Lowest”, “Low”, “Moderate”, “High”, and “Very high” morbidity according to incidence and prevalence.

The SNT team should agree on the most appropriate year to use for the stratification exercise.

# First we filter for the year agreed for the stratification

strat_year <- "appropriate year" # placeholder for the agreed year, usually the most recent one

# The variable is not named `year`, because filter(year == year) would compare
# the column with itself and keep every row
inc_prev <- inc_prev %>%
  arrange(adm1, adm2, adm3, year) %>%
  dplyr::filter(year == strat_year)

# First risk stratification for the selected year: sum the incidence and prevalence scores.
# Each incidence estimate is combined with prevalence (assuming prev_y3 is the most recent prevalence estimate)

inc_prev <- inc_prev %>%
  mutate(sumcat_crude = crudeinc_cat_num + prev_y3_cat_num,
         sumcat_adj1 = adjinc1_cat_num + prev_y3_cat_num,
         sumcat_adj2 = adjinc2_cat_num + prev_y3_cat_num,
         sumcat_adj3 = adjinc3_cat_num + prev_y3_cat_num)

# Next we recode the sumcat variables into strata using thresholds based on their observed range.
# The thresholds here are for illustrative purposes; sums outside the listed ranges become NA

inc_prev <- inc_prev %>%
  mutate( # generating recode values for crude estimates
    sumcat_crude_rec = case_when(
      sumcat_crude %in% 6:7 ~ 1,
      sumcat_crude == 8 ~ 2,
      sumcat_crude == 9 ~ 3,
      sumcat_crude == 10 ~ 4,
      sumcat_crude %in% 11:13 ~ 5,
      TRUE ~ NA_real_
    ),
    # generating recode values for adjusted 1 estimates
    sumcat_adj1_rec = case_when(
      sumcat_adj1 %in% 6:7 ~ 1,
      sumcat_adj1 == 8 ~ 2,
      sumcat_adj1 == 9 ~ 3,
      sumcat_adj1 == 10 ~ 4,
      sumcat_adj1 %in% 11:13 ~ 5,
      TRUE ~ NA_real_
    ),
    # generating recode values for adjusted 2 estimates
    sumcat_adj2_rec = case_when(
      sumcat_adj2 %in% 6:7 ~ 1,
      sumcat_adj2 == 8 ~ 2,
      sumcat_adj2 == 9 ~ 3,
      sumcat_adj2 == 10 ~ 4,
      sumcat_adj2 %in% 11:13 ~ 5,
      TRUE ~ NA_real_
    ),

    # generating recode values for adjusted 3 estimates
    sumcat_adj3_rec = case_when(
      sumcat_adj3 %in% 6:7 ~ 1,
      sumcat_adj3 == 8 ~ 2,
      sumcat_adj3 == 9 ~ 3,
      sumcat_adj3 == 10 ~ 4,
      sumcat_adj3 %in% 11:13 ~ 5,
      TRUE ~ NA_real_
    )
  )

# define labels and apply it to the recoded values
# define labels
labels <- c(
  "1" = "Very low",
  "2" = "Low",
  "3" = "Moderate",
  "4" = "High",
  "5" = "Very High"
)

# apply to the recoded values
inc_prev <- inc_prev %>%
  mutate(
    sumcat_crude_rec = factor(sumcat_crude_rec, levels = 1:5, labels = labels),
    sumcat_adj1_rec  = factor(sumcat_adj1_rec, levels = 1:5, labels = labels),
    sumcat_adj2_rec  = factor(sumcat_adj2_rec, levels = 1:5, labels = labels),
    sumcat_adj3_rec  = factor(sumcat_adj3_rec, levels = 1:5, labels = labels)
  )
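
As a worked example of the arithmetic, a unit with an incidence score of 5 and a prevalence score of 4 has a summed score of 9, which falls in the “Moderate” class under the illustrative thresholds above. The same recoding can be written compactly with `cut()`. This is a sketch on invented scores, and `recode_sum` is a hypothetical helper name.

```r
strata_labels <- c("Very low", "Low", "Moderate", "High", "Very High")

# Hypothetical helper reproducing the illustrative thresholds:
# 6-7 -> Very low, 8 -> Low, 9 -> Moderate, 10 -> High, 11-13 -> Very High
recode_sum <- function(x) {
  cut(x, breaks = c(5, 7, 8, 9, 10, 13), labels = strata_labels)
}

# Invented incidence and prevalence scores for five units
inc_score  <- c(2, 4, 5, 6, 7)
prev_score <- c(1, 3, 4, 4, 6)
sum_score  <- inc_score + prev_score

strata <- recode_sum(sum_score)
data.frame(inc_score, prev_score, sum_score, strata)

# Check the observed range before fixing thresholds: sums outside 6-13 become NA
range(sum_score)
table(strata, useNA = "ifany")
```

The first toy unit has a sum of 3, which lies below the lowest threshold and is therefore left unclassified. Tabulating the sums first helps to choose thresholds that cover every unit.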

Step 3.1: Create scores for mortality data and join them to the incidence-prevalence data

At this stage, we create categories for all-cause mortality in children under 5 and score them in ascending order as 1, 2, 3, or 4 for mortality of <1, 1-6, 6-9.5, and >9.5 deaths per 1000 live births. These mortality categories are the ones used in the Africa regional map for malaria vaccine allocation, but they can be changed depending on context.

# Create categories for mortality data.
# Comparisons are used instead of %in% so that non-integer values are classified correctly

u5mr_data <- u5mr_data %>%
  mutate(u5mr_cat = case_when(
    is.na(u5mr) ~ NA_real_, # keep missing values missing
    u5mr < 1 ~ 1,
    u5mr <= 6 ~ 2,          # 1 to 6
    u5mr <= 9.5 ~ 3,        # above 6 to 9.5
    TRUE ~ 4                # above 9.5
  ))

# Join the incidence-prevalence data to the mortality data (every row of u5mr_data is kept)

inc_prev_mort <- u5mr_data %>%
  left_join(inc_prev, by = c("adm1", "adm2", "adm3"))
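
Mortality rates are usually not whole numbers, and a membership test such as `x %in% 1:6` only matches the integers 1 to 6. The sketch below, on invented rates, scores the categories described above by counting how many thresholds a value passes. Assigning values of exactly 6 and 9.5 to the lower category is an assumption made for this example.

```r
# Invented under-5 mortality values, including non-integers and a missing value
u5mr <- c(0.5, 3.2, 6, 7.8, 9.5, 12.1, NA)

# Membership in an integer sequence misses non-integer values
in_integer_range <- 3.2 %in% 1:6
in_integer_range

# A sequence with a non-integer end point stops at the last whole step
6:9.5

# Score = 1 + number of thresholds passed; missing values stay missing
u5mr_cat <- 1 + (u5mr >= 1) + (u5mr > 6) + (u5mr > 9.5)

data.frame(u5mr, u5mr_cat)
```

Here 3.2 is not found in `1:6`, and `6:9.5` expands to 6, 7, 8, 9, so a rate of 7.8 would not be matched either. The threshold count classifies every value and leaves `NA` unclassified.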

Step 3.2: Rescore the first strata and add the mortality scores to generate the final risk stratification

Once the first set of strata based on prevalence and incidence scores has been obtained, new scores from 1 (low) to 4 (high) are assigned to them by combining the “Lowest” and “Low” classes. The mortality categories were already scored in the previous step. The mortality score is then added to the rescored prevalence-incidence stratum from the first stage, and the sum is reclassified into quartiles to obtain areas of “Low”, “Moderate”, “High”, and “Very high” morbidity and mortality.

Here we assume the SNT team has agreed to use the stratification based on incidence adjustment 3 as the input for the final stratification.

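# Rescore the first strata by combining the two lowest categories.
# sumcat_adj3_rec is a labelled factor at this point, so it is converted back to its integer codes (1 to 5) first

inc_prev_mort <- inc_prev_mort %>%
  mutate(
    strata1_code = as.integer(sumcat_adj3_rec),
    sumcat_adj3_rec_num = case_when(
      strata1_code %in% 1:2 ~ 1,
      strata1_code == 3 ~ 2,
      strata1_code == 4 ~ 3,
      strata1_code == 5 ~ 4,
      TRUE ~ NA_real_ # unclassified units stay missing
    ))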


# Final risk score: add the mortality score to the rescored first stratification

inc_prev_mort <- inc_prev_mort %>%
  mutate(
         sumcat2 = sumcat_adj3_rec_num + u5mr_cat)

# Next we recode sumcat2 into strata using thresholds based on its range (2 to 8 with the scores above). The thresholds are for illustrative purposes

inc_prev_mort <- inc_prev_mort %>%
  mutate(
    sumcat2_rec = case_when(
      sumcat2 %in% 2:3 ~ 1,
      sumcat2 %in% 4:5 ~ 2,
      sumcat2 == 6 ~ 3,
      sumcat2 == 7 ~ 4,
      sumcat2 == 8 ~ 5,
      TRUE ~ NA_real_
    ))

# define labels and apply it to the recoded values
# define labels
labels <- c(
  "1" = "Very low",
  "2" = "Low",
  "3" = "Moderate",
  "4" = "High",
  "5" = "Very High"
)

# apply to the recoded values
inc_prev_mort <- inc_prev_mort %>%
  mutate(
    sumcat2_rec = factor(sumcat2_rec, levels = 1:5, labels = labels)
  )
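
The second stage can be traced by hand on a few units. The sketch below uses invented first-stage strata and mortality scores; the variable names are illustrative only.

```r
strata_labels <- c("Very low", "Low", "Moderate", "High", "Very High")

# Invented first-stage strata and mortality scores for four units
first_strata <- factor(c("Very low", "Low", "High", "Very High"),
                       levels = strata_labels)
u5mr_cat <- c(1, 2, 3, 4)

# Work on the integer codes (1-5); the labelled factor itself does not compare equal to numbers
first_code <- as.integer(first_strata)

# Merge the two lowest classes: 1-2 -> 1, 3 -> 2, 4 -> 3, 5 -> 4
first_rescored <- pmax(first_code - 1L, 1L)

# Final summed score; its possible range is 2 (1 + 1) to 8 (4 + 4)
sumcat2 <- first_rescored + u5mr_cat

data.frame(first_strata, first_rescored, u5mr_cat, sumcat2)
```

Because both components run from 1 to 4, the final sum can only take values from 2 to 8, which is the range the final thresholds need to cover.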

Step 4: Mapping first and second risk stratification

The next code blocks plot each of the risk strata on a map at the appropriate administrative level. In this example, we plot at admin 3 level.

This first set of code plots each map separately.

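library(sf)
library(ggplot2)

# First join the stratification results to the adm3 shapefile.
# The sf object goes on the left so that the result stays an sf object and every polygon is kept

strat1_maps <- adm3_sf %>%
  left_join(inc_prev, by = c("adm1", "adm2", "adm3"))

# Plot each of the recoded sumcat variables
# Map for crude incidence stratification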

ggplot(strat1_maps) +
  geom_sf(aes(fill = sumcat_crude_rec), color = "gray80", size = 0.2) +  # inner borders
  geom_sf(data = adm1_sf, fill = NA, color = "black", size = 0.3) + # adm1 borders
  scale_fill_manual(
    name = "Strata",
    values = c(
      "Very low" = "#c6dbef",
      "Low" = "#6baed6",
      "Moderate" = "#fdd0a2",
      "High" = "#e6550d",
      "Very High" = "#de2d26"
    ),
    drop = FALSE
  ) +
  theme_minimal() +
  labs(
    title = "Strata - Prev+Crudeinc",
    subtitle = "Country",
    fill = "Strata"
  ) +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    legend.title = element_text(size = 10, face = "bold"),
    legend.text = element_text(size = 9)
  )

# Map for Adjusted incidence1 stratification

ggplot(strat1_maps) +
  geom_sf(aes(fill = sumcat_adj1_rec), color = "gray80", size = 0.2) +  # inner borders
  geom_sf(data = adm1_sf, fill = NA, color = "black", size = 0.3) + # adm1 borders
  scale_fill_manual(
    name = "Strata",
    values = c(
      "Very low" = "#c6dbef",
      "Low" = "#6baed6",
      "Moderate" = "#fdd0a2",
      "High" = "#e6550d",
      "Very High" = "#de2d26"
    ),
    drop = FALSE
  ) +
  theme_minimal() +
  labs(
    title = "Strata - Prev+Adjinc1",
    subtitle = "Country",
    fill = "Strata"
  ) +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    legend.title = element_text(size = 10, face = "bold"),
    legend.text = element_text(size = 9)
  )


# Map for Adjusted incidence 2 stratification

ggplot(strat1_maps) +
  geom_sf(aes(fill = sumcat_adj2_rec), color = "gray80", size = 0.2) +  # inner borders
  geom_sf(data = adm1_sf, fill = NA, color = "black", size = 0.3) + # adm1 borders
  scale_fill_manual(
    name = "Strata",
    values = c(
      "Very low" = "#c6dbef",
      "Low" = "#6baed6",
      "Moderate" = "#fdd0a2",
      "High" = "#e6550d",
      "Very High" = "#de2d26"
    ),
    drop = FALSE
  ) +
  theme_minimal() +
  labs(
    title = "Strata - Prev+Adjinc2",
    subtitle = "Country",
    fill = "Strata"
  ) +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    legend.title = element_text(size = 10, face = "bold"),
    legend.text = element_text(size = 9)
  )


# Map for Adjusted incidence 3 stratification

ggplot(strat1_maps) +
  geom_sf(aes(fill = sumcat_adj3_rec), color = "gray80", size = 0.2) +  # inner borders
  geom_sf(data = adm1_sf, fill = NA, color = "black", size = 0.3) + # adm1 borders
  scale_fill_manual(
    name = "Strata",
    values = c(
      "Very low" = "#c6dbef",
      "Low" = "#6baed6",
      "Moderate" = "#fdd0a2",
      "High" = "#e6550d",
      "Very High" = "#de2d26"
    ),
    drop = FALSE
  ) +
  theme_minimal() +
  labs(
    title = "Strata - Prev+Adjinc3",
    subtitle = "Country",
    fill = "Strata"
  ) +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    legend.title = element_text(size = 10, face = "bold"),
    legend.text = element_text(size = 9)
  )

Step 4.1: Mapping first risk stratification

Alternatively, the maps can be shown together on a facet grid.
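
One possible way to build such a grid is to reshape the strata columns to long format and use `facet_wrap()`. The sketch below is not the site's own implementation: it runs on a toy geometry of two squares with invented strata, and `adm3_toy`, `strata_toy`, and `make_square` are hypothetical names. With real data, the adm3 shapefile and the stratification table would take their place.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(sf)

# Toy geometry: two adjacent unit squares standing in for adm3 polygons
make_square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}
adm3_toy <- st_sf(adm3 = c("A", "B"),
                  geometry = st_sfc(make_square(0), make_square(1)))

# Invented strata for the four incidence variants
strata_levels <- c("Very low", "Low", "Moderate", "High", "Very High")
strata_toy <- data.frame(
  adm3 = c("A", "B"),
  sumcat_crude_rec = c("Very low", "Moderate"),
  sumcat_adj1_rec  = c("Low", "High"),
  sumcat_adj2_rec  = c("Low", "High"),
  sumcat_adj3_rec  = c("Moderate", "Very High")
)

# Reshape to one row per unit and variant, then attach to the polygons
strata_long <- strata_toy %>%
  pivot_longer(starts_with("sumcat_"), names_to = "variant", values_to = "stratum") %>%
  mutate(stratum = factor(stratum, levels = strata_levels))

maps_long <- adm3_toy %>%
  left_join(strata_long, by = "adm3")

# One panel per incidence variant, sharing a single legend
p <- ggplot(maps_long) +
  geom_sf(aes(fill = stratum), color = "gray80") +
  scale_fill_manual(
    name = "Strata",
    values = c("Very low" = "#c6dbef", "Low" = "#6baed6", "Moderate" = "#fdd0a2",
               "High" = "#e6550d", "Very High" = "#de2d26"),
    drop = FALSE
  ) +
  facet_wrap(~ variant) +
  theme_minimal()

p
```

A shared legend makes the four variants directly comparable, at the cost of smaller individual maps.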


Step 5: Mapping second risk stratification

The next code block plots the final risk strata on a map at the appropriate administrative level. In this example, we plot at admin 3 level.

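# First join the final stratification results to the adm3 shapefile.
# The sf object goes on the left so that the result stays an sf object and every polygon is kept

strat2_maps <- adm3_sf %>%
  left_join(inc_prev_mort, by = c("adm1", "adm2", "adm3"))

# Map for the final stratification (prevalence + adjusted incidence 3 + under-5 mortality)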

ggplot(strat2_maps) +
  geom_sf(aes(fill = sumcat2_rec), color = "gray80", size = 0.2) +  # inner borders
  geom_sf(data = adm1_sf, fill = NA, color = "black", size = 0.3) + # adm1 borders
  scale_fill_manual(
    name = "Strata",
    values = c(
      "Very low" = "#c6dbef",
      "Low" = "#6baed6",
      "Moderate" = "#fdd0a2",
      "High" = "#e6550d",
      "Very High" = "#de2d26"
    ),
    drop = FALSE
  ) +
  theme_minimal() +
  labs(
    title = "Strata - Prev+Adjinc3+U5MR",
    subtitle = "Country",
    fill = "Strata"
  ) +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    legend.title = element_text(size = 10, face = "bold"),
    legend.text = element_text(size = 9)
  )

Summary

TBD

Full code

#===============================================================================
# End of Script
#===============================================================================
 

©2025 Applied Health Analytics for Delivery and Innovation. All rights reserved