Combined risk categorization

Overview

Given the uncertainties in the three common metrics of transmission and disease burden (incidence, prevalence and all-cause mortality), and the different dimensions of malaria transmission that they represent, countries may choose to combine prevalence, incidence and/or mortality categories to develop a composite risk map. Several approaches can be used to build a composite metric. A simple two-stage approach is presented here; it should be adapted to the country context.

Objectives

Categorize all incidence estimates using agreed country cut-offs, adapted to the range of values in the data.
Categorize prevalence estimates using agreed country cut-offs, adapted to the range of values in the data.
Assign numerical scores to the categories.
Combine the incidence and prevalence scores to generate an epidemiological risk stratification.
Optionally, add all-cause under-5 mortality scores to generate a final risk stratification.

Step-by-Step Instructions

To skip the step-by-step explanation, jump to the full code at the end of this page.

Step 1: Create categories for prevalence data

The categories used here follow WHO normative guidance, but countries can revise them based on their context and data.

R
Python

#| eval: false
#| message: false
#| warning: false
#| code-fold: false
#| code-summary: Show the code

# Categorize prevalence estimates. This assumes prevalence estimates for multiple
# years (one column per year), just as for incidence.
library(dplyr)

vars_prev <- c("prev_y1", "prev_y2", "prev_y3") # adjust to match the column names in the prevalence dataset

# Define breaks and labels for prevalence estimates (%).
# cut() uses right-closed intervals by default, e.g. (1, 5] is labelled "1-5".
cut_offs <- c(-Inf, 1, 5, 10, 20, 35, 50, Inf)
cut_lab <- c("<1", "1-5", "5-10", "10-20", "20-35", "35-50", ">50")

# Create a categorical (factor) column <var>_cat for each prevalence variable
prev_df <- prev_df %>%
  mutate(across(all_of(vars_prev),
                ~ cut(.x, breaks = cut_offs, labels = cut_lab),
                .names = "{.col}_cat"))
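For readers working in Python, the same right-closed binning can be sketched with the standard library's `bisect` module. This is a minimal sketch, not the page's reference implementation: `PREV_BREAKS`, `PREV_LABELS` and `categorize()` are hypothetical names, and prevalence is assumed to be in percent. It mirrors R's `cut()` with its default `right = TRUE`, so a value equal to a break point falls in the lower category.

```python
from bisect import bisect_left

# Interior break points and labels, mirroring cut_offs/cut_lab in the R code
# (the -Inf and Inf ends are implicit).
PREV_BREAKS = [1, 5, 10, 20, 35, 50]
PREV_LABELS = ["<1", "1-5", "5-10", "10-20", "20-35", "35-50", ">50"]


def categorize(value, breaks=PREV_BREAKS, labels=PREV_LABELS):
    """Return the category label for a prevalence value (in percent).

    Intervals are closed on the right, like R's cut() default: exactly 5
    falls in "1-5", not "5-10". Missing values (None) stay missing.
    """
    if value is None:
        return None
    return labels[bisect_left(breaks, value)]


# Hypothetical prevalence estimates (%) for five operational units.
prev_y3 = [0.4, 1.0, 5.0, 12.3, 61.0]
print([categorize(v) for v in prev_y3])  # ['<1', '<1', '1-5', '10-20', '>50']
```

`bisect_left` returns the number of interior breaks strictly below the value, which is exactly the index of the right-closed interval that contains it.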

Step 1.1: Join prevalence data to incidence data

Next, we join the prevalence data to the annual incidence data at the operational-unit level.

R
Python

#| eval: false
#| message: false
#| warning: false
#| code-fold: false
#| code-summary: Show the code


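# Prevalence is stored wide (one column per year), so join on the admin units only;
# every incidence year of a unit receives that unit's prevalence columns.
inc_prev <- ann_inc_data %>%
  left_join(prev_df, by = c("adm1", "adm2", "adm3"))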
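To make the join logic concrete, here is a hedged Python sketch of a left join on the administrative keys, using plain lists of dictionaries. The table contents and the `left_join()` helper are hypothetical. It assumes the prevalence table is wide (one column per year, as in Step 1) with one row per unit, so it is matched on `adm1`, `adm2` and `adm3` only and its columns are repeated on every incidence year of that unit; units without prevalence data keep their incidence rows with `None` in the prevalence columns.

```python
KEYS = ("adm1", "adm2", "adm3")


def left_join(left_rows, right_rows, keys=KEYS):
    """Left join two lists of dicts on `keys` (hypothetical helper).

    Every left row is kept once. Right-hand columns are added when the keys
    match and set to None otherwise; left-hand values win on name clashes.
    Duplicate keys on the right raise an error, because they would silently
    duplicate left rows.
    """
    index = {}
    for row in right_rows:
        key = tuple(row[k] for k in keys)
        if key in index:
            raise ValueError(f"duplicate key in right table: {key}")
        index[key] = row
    right_cols = sorted({c for row in right_rows for c in row if c not in keys})

    joined = []
    for row in left_rows:
        match = index.get(tuple(row[k] for k in keys), {})
        extra = {c: match.get(c) for c in right_cols}
        joined.append({**extra, **row})
    return joined


# Hypothetical data: incidence is long (one row per unit-year),
# prevalence is wide (one column per year).
ann_inc_data = [
    {"adm1": "A", "adm2": "A1", "adm3": "A1a", "year": 2022, "crudeinc": 310.0},
    {"adm1": "A", "adm2": "A1", "adm3": "A1a", "year": 2023, "crudeinc": 280.0},
    {"adm1": "B", "adm2": "B1", "adm3": "B1a", "year": 2023, "crudeinc": 45.0},
]
prev_df = [
    {"adm1": "A", "adm2": "A1", "adm3": "A1a",
     "prev_y1": 14.0, "prev_y2": 12.5, "prev_y3": 11.0},
]

inc_prev = left_join(ann_inc_data, prev_df)
for row in inc_prev:
    print(row["adm3"], row["year"], row["prev_y3"])
```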

Step 2: Create numerical values for prevalence and incidence categories

Scores are assigned in ascending order to the prevalence and incidence categories, based on the number of strata used for each metric. For example, with the prevalence cut-offs above, scores of 1 to 7 correspond to a prevalence of <1, 1-5, 5-10, 10-20, 20-35, 35-50 and >50%; for incidence, scores of 1 to 7 correspond to <1, 1-50, 50-100, 100-250, 250-500, 500-750 and >750 cases per 1000 people at risk per year.

R
Python

#| eval: false
#| message: false
#| warning: false
#| code-fold: false
#| code-summary: Show the code


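# Convert each *_cat factor to its level position (1 = lowest category).
# This assumes every *_cat column is a factor with levels in ascending order,
# as produced by cut(). Re-wrapping in factor() would drop empty levels and,
# for character columns, sort the labels alphabetically.
inc_prev <- inc_prev %>%
  mutate(across(ends_with("_cat"), ~ as.integer(.x),
                .names = "{.col}_num"))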
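The scoring rule is simply "position in the ordered category list". A minimal Python sketch follows, with hypothetical names; the incidence labels are assumed from the categories listed above. It shows that the order must come from the category definition rather than from sorting the label text, because an alphabetical sort places "10-20" before "5-10".

```python
# Ordered category labels: the position in each list is the score.
PREV_LABELS = ["<1", "1-5", "5-10", "10-20", "20-35", "35-50", ">50"]
# Assumed incidence labels (per 1000 people at risk per year); adapt to the
# labels actually used when incidence was categorized.
INC_LABELS = ["<1", "1-50", "50-100", "100-250", "250-500", "500-750", ">750"]


def score(label, ordered_labels):
    """Return the 1-based position of `label` in `ordered_labels` (None if missing)."""
    if label is None:
        return None
    return ordered_labels.index(label) + 1


print(score("10-20", PREV_LABELS))   # 4
print(score("250-500", INC_LABELS))  # 5
# Sorting the label text does not reproduce the risk order:
print(sorted(PREV_LABELS) == PREV_LABELS)  # False
```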

Step 3: Sum the incidence and prevalence scores to generate the first risk stratification

The scores are then summed per operational unit, and the sum is reclassified (here into five classes using illustrative thresholds) to obtain areas of “Very low”, “Low”, “Moderate”, “High” and “Very high” morbidity, as measured by incidence and prevalence.

The SNT team should agree on the most appropriate year to use for the stratification exercise.

R
Python

#| eval: false
#| message: false
#| warning: false
#| code-fold: false
#| code-summary: Show the code


# First, keep only the year agreed by the SNT team (usually the most recent).
# Use a name that differs from the column: inside filter(), `year == year`
# compares the column with itself and keeps every row.
strat_year <- max(inc_prev$year, na.rm = TRUE) # replace with the agreed year if different

inc_prev <- inc_prev %>%
  arrange(adm1, adm2, adm3, year) %>%
  dplyr::filter(year == strat_year)

# First risk stratification: add each incidence score to the prevalence score
# (assuming prev_y3 is the most recent prevalence estimate)

inc_prev <- inc_prev %>%
  mutate(sumcat_crude = crudeinc_cat_num + prev_y3_cat_num,
         sumcat_adj1  = adjinc1_cat_num + prev_y3_cat_num,
         sumcat_adj2  = adjinc2_cat_num + prev_y3_cat_num,
         sumcat_adj3  = adjinc3_cat_num + prev_y3_cat_num)

# Next, recode the sums into five strata. These thresholds are illustrative: adapt
# them to the observed range of the sumcat variables (any sum not listed becomes NA).

inc_prev <- inc_prev %>%
  mutate( # generating recode values for crude estimates
    sumcat_crude_rec = case_when(
      sumcat_crude %in% 6:7 ~ 1,
      sumcat_crude == 8 ~ 2,
      sumcat_crude == 9 ~ 3,
      sumcat_crude == 10 ~ 4,
      sumcat_crude %in% 11:13 ~ 5,
      TRUE ~ NA_real_
    ),
    # generating recode values for adjusted 1 estimates
    sumcat_adj1_rec = case_when(
      sumcat_adj1 %in% 6:7 ~ 1,
      sumcat_adj1 == 8 ~ 2,
      sumcat_adj1 == 9 ~ 3,
      sumcat_adj1 == 10 ~ 4,
      sumcat_adj1 %in% 11:13 ~ 5,
      TRUE ~ NA_real_
    ),
    # generating recode values for adjusted 2 estimates
    sumcat_adj2_rec = case_when(
      sumcat_adj2 %in% 6:7 ~ 1,
      sumcat_adj2 == 8 ~ 2,
      sumcat_adj2 == 9 ~ 3,
      sumcat_adj2 == 10 ~ 4,
      sumcat_adj2 %in% 11:13 ~ 5,
      TRUE ~ NA_real_
    ),

    # generating recode values for adjusted 3 estimates
    sumcat_adj3_rec = case_when(
      sumcat_adj3 %in% 6:7 ~ 1,
      sumcat_adj3 == 8 ~ 2,
      sumcat_adj3 == 9 ~ 3,
      sumcat_adj3 == 10 ~ 4,
      sumcat_adj3 %in% 11:13 ~ 5,
      TRUE ~ NA_real_
    )
  )

# Define labels for the five strata (applied to the recoded values below)
labels <- c(
  "1" = "Very low",
  "2" = "Low",
  "3" = "Moderate",
  "4" = "High",
  "5" = "Very High"
)

# apply to the recoded values
inc_prev <- inc_prev %>%
  mutate(
    sumcat_crude_rec = factor(sumcat_crude_rec, levels = 1:5, labels = labels),
    sumcat_adj1_rec  = factor(sumcat_adj1_rec, levels = 1:5, labels = labels),
    sumcat_adj2_rec  = factor(sumcat_adj2_rec, levels = 1:5, labels = labels),
    sumcat_adj3_rec  = factor(sumcat_adj3_rec, levels = 1:5, labels = labels)
  )
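As a worked example of the illustrative thresholds above: a unit whose adjusted incidence falls in 250-500 per 1000 (score 5) and whose prevalence falls in 10-20% (score 4) has a summed score of 9 and is classed “Moderate”. The Python sketch below, with hypothetical names, reproduces that lookup; sums not covered by the thresholds return `None`, mirroring the `NA` produced by the fallback branch of the R code.

```python
# Illustrative thresholds mirroring the R code: summed score -> stratum code.
THRESHOLDS = {6: 1, 7: 1, 8: 2, 9: 3, 10: 4, 11: 5, 12: 5, 13: 5}
STRATA = {1: "Very low", 2: "Low", 3: "Moderate", 4: "High", 5: "Very High"}


def stratify(inc_score, prev_score):
    """Return (summed score, stratum label) for one operational unit.

    Sums missing from THRESHOLDS (for example 2-5 or 14 when each metric has
    seven categories) give a None label, like the NA from the R fallback.
    """
    if inc_score is None or prev_score is None:
        return None, None
    total = inc_score + prev_score
    return total, STRATA.get(THRESHOLDS.get(total))


print(stratify(5, 4))  # (9, 'Moderate')
print(stratify(1, 1))  # (2, None)
```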

Step 3.1: Create scores for mortality data and join them to the incidence-prevalence data

At this stage, we categorize all-cause mortality in children under 5 and assign scores in ascending order: 1, 2, 3 or 4 for mortality of <1, 1-6, 6-9.5 and >9.5 deaths per 1000 live births. These mortality categories are the ones used in the Africa regional map for malaria vaccine allocation, but they can be changed depending on context.

R
Python

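# Score under-5 mortality (u5mr) into four ascending categories.
# Use range comparisons: `u5mr %in% 1:6` only matches the integers 1 to 6, so a
# value such as 3.4 would fall through to the top category.
# Boundary handling (1 and 6 start the higher class, 9.5 stays in class 3) is an
# assumption; adjust it if the country defines the classes differently.
u5mr_data <- u5mr_data %>%
  mutate(u5mr_cat = case_when(
    u5mr < 1    ~ 1,
    u5mr < 6    ~ 2,
    u5mr <= 9.5 ~ 3,
    u5mr > 9.5  ~ 4,
    TRUE        ~ NA_real_ # missing mortality stays missing
  ))

# Join the mortality scores to the incidence-prevalence data

inc_prev_mort <- inc_prev %>%
  left_join(u5mr_data, by = c("adm1", "adm2", "adm3"))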
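The mortality classes are continuous ranges, so they are best expressed as ordered comparisons. A minimal Python sketch follows; `u5mr_score()` is a hypothetical name, and treating 1 and 6 as the start of the higher class while keeping 9.5 in class 3 is an assumption. The last line shows why integer-sequence membership is not a range test: 3.4 is not a member of 1 to 6, the same behaviour as R's `3.4 %in% 1:6`.

```python
def u5mr_score(u5mr):
    """Score under-5 mortality: 1 (<1), 2 (1-6), 3 (6-9.5), 4 (>9.5).

    Assumed boundaries: 1 and 6 start the higher class, 9.5 stays in class 3.
    Missing values return None instead of being pushed into the top class.
    """
    if u5mr is None:
        return None
    if u5mr < 1:
        return 1
    if u5mr < 6:
        return 2
    if u5mr <= 9.5:
        return 3
    return 4


print([u5mr_score(v) for v in (0.5, 3.4, 6.0, 9.5, 12.0, None)])  # [1, 2, 3, 3, 4, None]
# Membership in an integer sequence is not a range test:
print(3.4 in range(1, 7))  # False
```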

Step 3.2: Rescore the first strata and add the mortality scores to generate the final risk stratification

Once the first set of strata based on prevalence and incidence scores has been obtained, the strata are given new scores from 1 (low) to 4 (high) by merging “Very low” and “Low”. The mortality categories are already scored. The mortality score is then added to the rescored prevalence-incidence stratum, and the sum is reclassified (here into five classes using illustrative thresholds) to obtain areas of “Very low”, “Low”, “Moderate”, “High” and “Very high” morbidity and mortality.

Here we assume the SNT team has agreed to use the stratification based on adjusted incidence 3 (sumcat_adj3_rec) as the final input for the stratification.

R
Python

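# Rescore the first strata (sumcat_adj3_rec, a factor) by merging "Very low" and "Low".
# as.integer() returns the level position (1 = Very low ... 5 = Very High);
# comparing the factor itself with 1:2 would never match its text labels.

inc_prev_mort <- inc_prev_mort %>%
  mutate(sumcat_adj3_rec_num = case_when(
    as.integer(sumcat_adj3_rec) %in% 1:2 ~ 1,
    as.integer(sumcat_adj3_rec) == 3     ~ 2,
    as.integer(sumcat_adj3_rec) == 4     ~ 3,
    as.integer(sumcat_adj3_rec) == 5     ~ 4,
    TRUE                                 ~ NA_real_
  ))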


# Final risk stratification: add the rescored incidence-prevalence stratum (1-4)
# to the mortality score (1-4), giving sums from 2 to 8

inc_prev_mort <- inc_prev_mort %>%
  mutate(sumcat2 = sumcat_adj3_rec_num + u5mr_cat)

# Next, recode sumcat2 into strata. The thresholds are illustrative; the top class
# uses 8, the largest possible sum.

inc_prev_mort <- inc_prev_mort %>%
  mutate(
    sumcat2_rec = case_when(
      sumcat2 %in% 2:3 ~ 1,
      sumcat2 %in% 4:5 ~ 2,
      sumcat2 == 6 ~ 3,
      sumcat2 == 7 ~ 4,
      sumcat2 == 8 ~ 5,
      TRUE ~ NA_real_
    ))

# Define labels for the five strata (applied to the recoded values below)
labels <- c(
  "1" = "Very low",
  "2" = "Low",
  "3" = "Moderate",
  "4" = "High",
  "5" = "Very High"
)

# apply to the recoded values
inc_prev_mort <- inc_prev_mort %>%
  mutate(
    sumcat2_rec = factor(sumcat2_rec, levels = 1:5, labels = labels)
  )
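The second stage can be sketched end to end in Python: merge the first-stage strata into four rescored values, add the mortality score (so sums run from 2 to 8), and look up the final stratum. The dictionaries follow the illustrative thresholds above, with the top class assumed to be a sum of 8, the largest possible value; all names are hypothetical.

```python
# Stage-1 stratum -> rescored value, merging "Very low" and "Low" into 1.
RESCORE = {"Very low": 1, "Low": 1, "Moderate": 2, "High": 3, "Very High": 4}
# Illustrative thresholds for the final sum (2-8); the top class is assumed to be 8.
FINAL_THRESHOLDS = {2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 4, 8: 5}
STRATA = {1: "Very low", 2: "Low", 3: "Moderate", 4: "High", 5: "Very High"}


def final_stratum(stage1_label, mortality_score):
    """Combine a first-stage stratum label with a mortality score (1-4)."""
    if stage1_label is None or mortality_score is None:
        return None
    total = RESCORE[stage1_label] + mortality_score
    return STRATA[FINAL_THRESHOLDS[total]]


print(final_stratum("Moderate", 4))   # Moderate (2 + 4 = 6)
print(final_stratum("Very High", 4))  # Very High (4 + 4 = 8)
```

For example, a unit classed “Moderate” in the first stage (rescored 2) with under-5 mortality in the top class (score 4) sums to 6 and remains “Moderate” overall.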

Step 4: Mapping the first risk stratification

The next code blocks plot each risk stratification on a map at the appropriate administrative level; here we plot at admin level 3.

This first set of code plots each map separately.

R
Python

#| eval: false
#| message: false
#| warning: false
#| code-fold: false
#| code-summary: Show the code

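library(sf)
library(ggplot2)

# First, join the stratification results to the adm3 shapefile.
# Keep the sf object on the left so the result remains an sf object for geom_sf().

strat1_maps <- adm3_sf %>%
  left_join(inc_prev, by = c("adm1", "adm2", "adm3"))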

# Plot for each of the sum cat variables
# Map for Crude incidence stratification

ggplot(strat1_maps) +
  geom_sf(aes(fill = sumcat_crude_rec), color = "gray80", size = 0.2) +  # inner borders
  geom_sf(data = adm1_sf, fill = NA, color = "black", size = 0.3) + # adm1 borders
  scale_fill_manual(
    name = "Strata",
    values = c(
      "Very low" = "#c6dbef",
      "Low" = "#6baed6",
      "Moderate" = "#fdd0a2",
      "High" = "#e6550d",
      "Very High" = "#de2d26"
    ),
    drop = FALSE
  ) +
  theme_minimal() +
  labs(
    title = "Strata - Prev+Crudeinc",
    subtitle = "Country",
    fill = "Strata"
  ) +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    legend.title = element_text(size = 10, face = "bold"),
    legend.text = element_text(size = 9)
  )

# Map for Adjusted incidence1 stratification

ggplot(strat1_maps) +
  geom_sf(aes(fill = sumcat_adj1_rec), color = "gray80", size = 0.2) +  # inner borders
  geom_sf(data = adm1_sf, fill = NA, color = "black", size = 0.3) + # adm1 borders
  scale_fill_manual(
    name = "Strata",
    values = c(
      "Very low" = "#c6dbef",
      "Low" = "#6baed6",
      "Moderate" = "#fdd0a2",
      "High" = "#e6550d",
      "Very High" = "#de2d26"
    ),
    drop = FALSE
  ) +
  theme_minimal() +
  labs(
    title = "Strata - Prev+Adjinc1",
    subtitle = "Country",
    fill = "Strata"
  ) +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    legend.title = element_text(size = 10, face = "bold"),
    legend.text = element_text(size = 9)
  )


# Map for Adjusted incidence 2 stratification

ggplot(strat1_maps) +
  geom_sf(aes(fill = sumcat_adj2_rec), color = "gray80", size = 0.2) +  # inner borders
  geom_sf(data = adm1_sf, fill = NA, color = "black", size = 0.3) + # adm1 borders
  scale_fill_manual(
    name = "Strata",
    values = c(
      "Very low" = "#c6dbef",
      "Low" = "#6baed6",
      "Moderate" = "#fdd0a2",
      "High" = "#e6550d",
      "Very High" = "#de2d26"
    ),
    drop = FALSE
  ) +
  theme_minimal() +
  labs(
    title = "Strata - Prev+Adjinc2",
    subtitle = "Country",
    fill = "Strata"
  ) +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    legend.title = element_text(size = 10, face = "bold"),
    legend.text = element_text(size = 9)
  )


# Map for Adjusted incidence 3 stratification

ggplot(strat1_maps) +
  geom_sf(aes(fill = sumcat_adj3_rec), color = "gray80", size = 0.2) +  # inner borders
  geom_sf(data = adm1_sf, fill = NA, color = "black", size = 0.3) + # adm1 borders
  scale_fill_manual(
    name = "Strata",
    values = c(
      "Very low" = "#c6dbef",
      "Low" = "#6baed6",
      "Moderate" = "#fdd0a2",
      "High" = "#e6550d",
      "Very High" = "#de2d26"
    ),
    drop = FALSE
  ) +
  theme_minimal() +
  labs(
    title = "Strata - Prev+Adjinc3",
    subtitle = "Country",
    fill = "Strata"
  ) +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    legend.title = element_text(size = 10, face = "bold"),
    legend.text = element_text(size = 9)
  )

Step 4.1: Mapping first risk stratification

An alternative is to show the four maps together on a facet grid.

Python

Step 5: Mapping second risk stratification

The next code block plots the final risk strata on a map at the appropriate administrative level; here we plot at admin level 3.

R
Python

#| eval: false
#| message: false
#| warning: false
#| code-fold: false
#| code-summary: Show the code

# First, join the final dataset to the adm3 shapefile.
# Keep the sf object on the left so the result remains an sf object for geom_sf().

strat2_maps <- adm3_sf %>%
  left_join(inc_prev_mort, by = c("adm1", "adm2", "adm3"))

# Map for the final stratification (prevalence + adjusted incidence 3 + U5MR)

ggplot(strat2_maps) +
  geom_sf(aes(fill = sumcat2_rec), color = "gray80", size = 0.2) +  # inner borders
  geom_sf(data = adm1_sf, fill = NA, color = "black", size = 0.3) + # adm1 borders
  scale_fill_manual(
    name = "Strata",
    values = c(
      "Very low" = "#c6dbef",
      "Low" = "#6baed6",
      "Moderate" = "#fdd0a2",
      "High" = "#e6550d",
      "Very High" = "#de2d26"
    ),
    drop = FALSE
  ) +
  theme_minimal() +
  labs(
    title = "Strata - Prev+Adjinc3+U5MR",
    subtitle = "Country",
    fill = "Strata"
  ) +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    legend.title = element_text(size = 10, face = "bold"),
    legend.text = element_text(size = 9)
  )

Stata

:::

Summary

TBD

Show full code

#===============================================================================
# End of Script
#===============================================================================