#| eval: false
#| message: false
#| warning: false
#| code-fold: false
#| code-summary: Show the code
# Categorize prevalence estimates. This assumes prevalence estimates are available for multiple years, as for incidence
vars_prev <- c("prev_y1", "prev_y2", "prev_y3") # adjust to match the column names in the prevalence dataset
# Define breaks and labels for the prevalence categories
cut_offs <- c(-Inf, 1, 5, 10, 20, 35, 50, Inf)
cut_lab <- c("<1", "1-5", "5-10", "10-20", "20-35", "35-50", ">50")
# Create a categorical (factor) column for each prevalence variable, suffixed "_cat"
prev_df <- prev_df %>%
  mutate(across(all_of(vars_prev),
                ~ cut(.x, breaks = cut_offs, labels = cut_lab),
                .names = "{.col}_cat"))
Combined risk categorization
Overview
Given the uncertainties of the three common metrics of transmission and disease burden (incidence, prevalence and all-cause mortality), and the different dimensions of malaria transmission that they represent, countries may choose to combine prevalence, incidence and/or mortality categories to develop a composite risk map. Several approaches can be used to develop a composite metric. A simple two-stage approach is presented here, but it should be adapted to the country context.
- Categorize all incidence estimates based on agreed country cut-offs. The cut-offs should be adapted to the range of values in the data.
- Categorize prevalence estimates based on agreed country cut-offs. The cut-offs should be adapted to the range of values in the data.
- Assign numerical values to the various categories.
- Combine the numerical values of incidence and prevalence to generate an epidemiological risk stratification.
Step-by-Step Instructions
To skip the step-by-step explanation, jump to the full code at the end of this page.
Step 1: Create categories for prevalence data
The categories used here follow WHO normative guidance, but countries can revise them based on their context and data.
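How values that fall exactly on a cut-off are classified depends on how the intervals are closed. The sketch below applies base R `cut()` to a few made-up prevalence values with the breaks and labels used on this page. By default `cut()` builds right-closed intervals, so a value of exactly 5 lands in "1-5"; `right = FALSE` gives left-closed intervals instead. Which convention to use is a choice for the country team, not something this page prescribes.

```r
# Illustrative prevalence values (%); made-up numbers, not survey results
prev_example <- c(0.4, 1, 5, 12.5, 50, 63)

cut_offs <- c(-Inf, 1, 5, 10, 20, 35, 50, Inf)
cut_lab  <- c("<1", "1-5", "5-10", "10-20", "20-35", "35-50", ">50")

# Default: right-closed intervals, e.g. (1, 5]
right_closed <- cut(prev_example, breaks = cut_offs, labels = cut_lab)

# Alternative: left-closed intervals, e.g. [1, 5)
left_closed <- cut(prev_example, breaks = cut_offs, labels = cut_lab, right = FALSE)

# Compare the two conventions side by side
data.frame(prev = prev_example, right_closed, left_closed)
```

Only the values sitting exactly on a cut-off (1, 5 and 50 here) change category between the two conventions.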
Step 1.1: Join prevalence data to incidence data
Next, we combine the incidence and prevalence datasets.
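A minimal sketch of this join is shown below, assuming both datasets share the `adm1`, `adm2` and `adm3` identifiers used elsewhere on this page. The toy data frames `inc_df` and `prev_df` and the column `crudeinc` are illustrative assumptions; replace them with the actual objects and column names.

```r
library(dplyr)

# Hypothetical toy datasets; only adm1-adm3 and year follow the naming used on this page
inc_df <- tibble(
  adm1 = c("A", "A"), adm2 = c("A1", "A2"), adm3 = c("A1a", "A2a"),
  year = c(2023, 2023),
  crudeinc = c(120, 480)
)
prev_df <- tibble(
  adm1 = c("A", "A"), adm2 = c("A1", "A2"), adm3 = c("A1a", "A2a"),
  prev_y3 = c(4.2, 27.5)
)

# Keep every incidence row and attach the prevalence columns for the same unit
inc_prev <- inc_df %>%
  left_join(prev_df, by = c("adm1", "adm2", "adm3"))

# List incidence units with no prevalence match (ideally zero rows)
unmatched <- anti_join(inc_df, prev_df, by = c("adm1", "adm2", "adm3"))
unmatched
```

Checking the unmatched units before moving on helps catch spelling differences in administrative names between the two sources.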
Step 2: Create numerical values for prevalence and incidence categories
Scores are assigned in ascending order to the prevalence and incidence categories, based on the number of strata used per metric. For example, with the prevalence cut-offs defined above, scores of 1 to 7 correspond to a prevalence of <1%, 1-5%, 5-10%, 10-20%, 20-35%, 35-50% and >50% respectively; for incidence, scores of 1 to 7 correspond to <1, 1-50, 50-100, 100-250, 250-500, 500-750 and >750 cases per 1000 people at risk per year.
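One way to obtain these scores, sketched here under the assumption that the categories are stored as ordered factor levels (as produced by `cut()`), is to convert each factor to its integer level position. The toy data and the `_cat_num` suffix are illustrative; the suffix matches the column names used later on this page.

```r
library(dplyr)

cut_lab <- c("<1", "1-5", "5-10", "10-20", "20-35", "35-50", ">50")

# Toy data: one categorical prevalence column, as produced by cut()
toy <- tibble(
  adm3 = c("A1a", "A2a", "B1a"),
  prev_y3_cat = factor(c("1-5", "20-35", ">50"), levels = cut_lab)
)

# as.integer() on a factor returns the position of each level, i.e. scores 1 to 7
toy <- toy %>%
  mutate(across(ends_with("_cat"), as.integer, .names = "{.col}_num"))
toy
```

This only gives ascending scores if the factor levels are listed from lowest to highest category, so the order of the labels matters.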
Step 3: Sum the scores for incidence and prevalence to generate the first risk stratification
The scores are then summed per operational unit, and the sum is reclassified into five classes (for example, by quintiles) to obtain areas of “Lowest”, “Low”, “Moderate”, “High” and “Very high” morbidity as per incidence and prevalence.
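As an alternative to fixed thresholds, the cut points can be derived from the data. The sketch below is one possible approach, not the method prescribed here: it splits made-up summed scores into five quantile-based classes with base R `quantile()` and `cut()`.

```r
# Made-up summed scores for ten operational units
sumcat <- c(3, 4, 6, 7, 8, 9, 10, 11, 12, 14)

# Quintile cut points of the observed scores; unique() guards against tied break values
breaks <- unique(quantile(sumcat, probs = seq(0, 1, by = 0.2), na.rm = TRUE))

# Five ordered classes (assumes the cut points are all distinct, as in this toy data;
# with heavily tied scores fewer breaks remain and the labels must be shortened)
strata <- cut(sumcat, breaks = breaks, include.lowest = TRUE,
              labels = c("Lowest", "Low", "Moderate", "High", "Very high"))
table(strata)
```

Because summed scores are integers with many ties, quantile breaks can collapse in real data, which is one reason to review the distribution and set thresholds manually.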
The SNT team should agree on the most appropriate year to use for the stratification exercise.
#| eval: false
#| message: false
#| warning: false
#| code-fold: false
#| code-summary: Show the code
# First, filter for the agreed year
target_year <- "appropriate year" # placeholder: replace with the agreed year, usually the most recent
inc_prev <- inc_prev %>%
  arrange(adm1, adm2, adm3, year) %>%
  dplyr::filter(year == target_year) # a variable also named `year` would be compared with itself and keep every row
# First risk stratification for the selected year: sum the incidence and prevalence scores
# Each incidence estimate is combined with the prevalence score (assuming prev_y3 is the most recent prevalence estimate)
inc_prev <- inc_prev %>%
mutate(sumcat_crude = crudeinc_cat_num + prev_y3_cat_num,
sumcat_adj1 = adjinc1_cat_num + prev_y3_cat_num,
sumcat_adj2 = adjinc2_cat_num + prev_y3_cat_num,
sumcat_adj3 = adjinc3_cat_num + prev_y3_cat_num)
# Next, recode the sumcat variables into meaningful thresholds based on their range of values. The figures provided here are for illustrative purposes
inc_prev <- inc_prev %>%
mutate( # generating recode values for crude estimates
sumcat_crude_rec = case_when( # case_when() is needed for logical conditions; case_match() matches values only
sumcat_crude %in% 6:7 ~ 1,
sumcat_crude == 8 ~ 2,
sumcat_crude == 9 ~ 3,
sumcat_crude == 10 ~ 4,
sumcat_crude %in% 11:13 ~ 5,
TRUE ~ NA_real_
),
# generating recode values for adjusted 1 estimates
sumcat_adj1_rec = case_when(
sumcat_adj1 %in% 6:7 ~ 1,
sumcat_adj1 == 8 ~ 2,
sumcat_adj1 == 9 ~ 3,
sumcat_adj1 == 10 ~ 4,
sumcat_adj1 %in% 11:13 ~ 5,
TRUE ~ NA_real_
),
# generating recode values for adjusted 2 estimates
sumcat_adj2_rec = case_when(
sumcat_adj2 %in% 6:7 ~ 1,
sumcat_adj2 == 8 ~ 2,
sumcat_adj2 == 9 ~ 3,
sumcat_adj2 == 10 ~ 4,
sumcat_adj2 %in% 11:13 ~ 5,
TRUE ~ NA_real_
),
# generating recode values for adjusted 3 estimates
sumcat_adj3_rec = case_when(
sumcat_adj3 %in% 6:7 ~ 1,
sumcat_adj3 == 8 ~ 2,
sumcat_adj3 == 9 ~ 3,
sumcat_adj3 == 10 ~ 4,
sumcat_adj3 %in% 11:13 ~ 5,
TRUE ~ NA_real_
)
)
# Define labels, then apply them to the recoded values
labels <- c(
"1" = "Very low",
"2" = "Low",
"3" = "Moderate",
"4" = "High",
"5" = "Very High"
)
# apply to the recoded values
inc_prev <- inc_prev %>%
mutate(
sumcat_crude_rec = factor(sumcat_crude_rec, levels = 1:5, labels = labels),
sumcat_adj1_rec = factor(sumcat_adj1_rec, levels = 1:5, labels = labels),
sumcat_adj2_rec = factor(sumcat_adj2_rec, levels = 1:5, labels = labels),
sumcat_adj3_rec = factor(sumcat_adj3_rec, levels = 1:5, labels = labels)
)
Step 3.1: Create scores for mortality data and join them to the incidence-prevalence data
At this stage, we create categories for all-cause mortality in children under 5 and assign scores of 1, 2, 3 or 4, in ascending order, for mortality of <1, 1-6, 6-9.5 and >9.5 deaths per 1000 live births. These mortality categories are the ones used in the Africa regional map for malaria vaccine allocation, but they can be changed depending on context.
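Mortality rates are rarely whole numbers, so the range checks need inequalities rather than matching against integer sequences. The sketch below uses made-up values to show the difference; how values exactly on a boundary (1, 6 or 9.5) are treated is an assumption made for this example.

```r
# Made-up under-five mortality values (deaths per 1000 live births)
u5mr <- c(0.5, 3.4, 6, 8.2, 9.5, 12)

# %in% only matches the exact integers 1, 2, ..., 6, so 3.4 is not matched
matched_by_in <- u5mr %in% 1:6
matched_by_in

# Inequalities cover the whole range of each category
u5mr_cat <- ifelse(u5mr < 1, 1,
            ifelse(u5mr <= 6, 2,
            ifelse(u5mr <= 9.5, 3, 4)))
u5mr_cat
```

Note also that `6:9.5` expands to the integers 6, 7, 8 and 9, so it would not match 9.5 or any other decimal value.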
# Create categories for mortality data and join them to the strata data
u5mr_data <- u5mr_data %>%
  mutate(u5mr_cat = case_when(
    u5mr < 1 ~ 1,
    u5mr <= 6 ~ 2,   # inequalities, because %in% 1:6 only matches whole numbers
    u5mr <= 9.5 ~ 3,
    u5mr > 9.5 ~ 4   # missing u5mr values stay NA instead of being scored 4
  ))
# Join the mortality scores to the inc_prev data, keeping every inc_prev row
inc_prev_mort <- inc_prev %>%
  left_join(u5mr_data, by = c("adm1", "adm2", "adm3"))
Step 3.2: Rescore the first strata and add the mortality scores to generate the final risk stratification
Once the first set of strata based on prevalence and incidence scores has been obtained, new scores from 1 (low) to 4 (high) are assigned to them, combining “lowest” and “low”. The mortality categories are already scored at this stage. The mortality score is then added to the rescored prevalence and incidence strata obtained in stage 1, and the sum of the scores is reclassified in quartiles to obtain areas of “Low”, “Moderate”, “High” and “Very high” morbidity and mortality.
Here we assume the SNT team has agreed to use the adjustment3 stratified map as the final data for the stratification.
# Rescore the first strata by combining the two lowest categories
# sumcat_adj3_rec is a labelled factor at this point, so compare its integer codes (1 to 5)
inc_prev_mort <- inc_prev_mort %>%
  mutate(sumcat_adj3_rec_num = case_when(
    as.integer(sumcat_adj3_rec) %in% 1:2 ~ 1,
    as.integer(sumcat_adj3_rec) == 3 ~ 2,
    as.integer(sumcat_adj3_rec) == 4 ~ 3,
    as.integer(sumcat_adj3_rec) == 5 ~ 4,
    TRUE ~ NA_real_ # unclassified units stay missing
  ))
# Final risk stratification: add the mortality score to the rescored incidence-prevalence strata
inc_prev_mort <- inc_prev_mort %>%
  mutate(sumcat2 = sumcat_adj3_rec_num + u5mr_cat)
# Next, recode sumcat2 into meaningful thresholds based on its range of values. The figures provided here are for illustrative purposes
inc_prev_mort <- inc_prev_mort %>%
  mutate(
    sumcat2_rec = case_when(
      sumcat2 %in% 2:3 ~ 1,
      sumcat2 %in% 4:5 ~ 2,
      sumcat2 == 6 ~ 3,
      sumcat2 == 7 ~ 4,
      sumcat2 == 8 ~ 5, # 8 is the largest possible sum (4 + 4)
      TRUE ~ NA_real_
    ))
# Define labels, then apply them to the recoded values
labels <- c(
"1" = "Very low",
"2" = "Low",
"3" = "Moderate",
"4" = "High",
"5" = "Very High"
)
# apply to the recoded values
inc_prev_mort <- inc_prev_mort %>%
mutate(
sumcat2_rec = factor(sumcat2_rec, levels = 1:5, labels = labels)
)
Step 4: Mapping first and second risk stratification
The next code chunks plot each of the risk strata on a map at the appropriate administrative level. In this code, we plot at the admin 3 level.
The first set of code plots each map separately.
#| eval: false
#| message: false
#| warning: false
#| code-fold: false
#| code-summary: Show the code
# First, join the stratification dataset to the adm3 shapefile
# The sf object goes first so that the result keeps its sf class and geometry
strat1_maps <- adm3_sf %>%
  left_join(inc_prev, by = c("adm1", "adm2", "adm3"))
# Plot each of the sumcat variables
# Map for crude incidence stratification
ggplot(strat1_maps) +
geom_sf(aes(fill = sumcat_crude_rec), color = "gray80", size = 0.2) + # inner borders
geom_sf(data = adm1_sf, fill = NA, color = "black", size = 0.3) + # adm1 borders
scale_fill_manual(
name = "Strata",
values = c(
"Very low" = "#c6dbef",
"Low" = "#6baed6",
"Moderate" = "#fdd0a2",
"High" = "#e6550d",
"Very High" = "#de2d26"
),
drop = FALSE
) +
theme_minimal() +
labs(
title = "Strata - Prev+Crudeinc",
subtitle = "Country",
fill = "Strata"
) +
theme(
plot.title = element_text(size = 14, face = "bold"),
plot.subtitle = element_text(size = 12),
legend.title = element_text(size = 10, face = "bold"),
legend.text = element_text(size = 9)
)
# Map for Adjusted incidence1 stratification
ggplot(strat1_maps) +
geom_sf(aes(fill = sumcat_adj1_rec), color = "gray80", size = 0.2) + # inner borders
geom_sf(data = adm1_sf, fill = NA, color = "black", size = 0.3) + # adm1 borders
scale_fill_manual(
name = "Strata",
values = c(
"Very low" = "#c6dbef",
"Low" = "#6baed6",
"Moderate" = "#fdd0a2",
"High" = "#e6550d",
"Very High" = "#de2d26"
),
drop = FALSE
) +
theme_minimal() +
labs(
title = "Strata - Prev+Adjinc1",
subtitle = "Country",
fill = "Strata"
) +
theme(
plot.title = element_text(size = 14, face = "bold"),
plot.subtitle = element_text(size = 12),
legend.title = element_text(size = 10, face = "bold"),
legend.text = element_text(size = 9)
)
# Map for Adjusted incidence 2 stratification
ggplot(strat1_maps) +
geom_sf(aes(fill = sumcat_adj2_rec), color = "gray80", size = 0.2) + # inner borders
geom_sf(data = adm1_sf, fill = NA, color = "black", size = 0.3) + # adm1 borders
scale_fill_manual(
name = "Strata",
values = c(
"Very low" = "#c6dbef",
"Low" = "#6baed6",
"Moderate" = "#fdd0a2",
"High" = "#e6550d",
"Very High" = "#de2d26"
),
drop = FALSE
) +
theme_minimal() +
labs(
title = "Strata - Prev+Adjinc2",
subtitle = "Country",
fill = "Strata"
) +
theme(
plot.title = element_text(size = 14, face = "bold"),
plot.subtitle = element_text(size = 12),
legend.title = element_text(size = 10, face = "bold"),
legend.text = element_text(size = 9)
)
# Map for Adjusted incidence 3 stratification
ggplot(strat1_maps) +
geom_sf(aes(fill = sumcat_adj3_rec), color = "gray80", size = 0.2) + # inner borders
geom_sf(data = adm1_sf, fill = NA, color = "black", size = 0.3) + # adm1 borders
scale_fill_manual(
name = "Strata",
values = c(
"Very low" = "#c6dbef",
"Low" = "#6baed6",
"Moderate" = "#fdd0a2",
"High" = "#e6550d",
"Very High" = "#de2d26"
),
drop = FALSE
) +
theme_minimal() +
labs(
title = "Strata - Prev+Adjinc3",
subtitle = "Country",
fill = "Strata"
) +
theme(
plot.title = element_text(size = 14, face = "bold"),
plot.subtitle = element_text(size = 12),
legend.title = element_text(size = 10, face = "bold"),
legend.text = element_text(size = 9)
)
Step 4.1: Mapping first risk stratification
An alternative is to show the maps together on a facet grid.
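A minimal sketch of the facet approach follows, assuming the `sf`, `tidyr`, `dplyr` and `ggplot2` packages are available. The two toy squares, the object names `toy_sf`, `toy_strata` and `maps_long`, and the strata values are made up for illustration; in practice the adm3 shapefile and the `sumcat_*_rec` columns would take their place.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(sf)

# Two toy unit squares standing in for adm3 polygons (hypothetical geometry)
square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}
toy_sf <- st_sf(adm3 = c("A1a", "A2a"), geometry = st_sfc(square(0), square(1)))

# Toy stratification results with the same factor levels as on this page
lvl <- c("Very low", "Low", "Moderate", "High", "Very High")
toy_strata <- tibble(
  adm3 = c("A1a", "A2a"),
  sumcat_crude_rec = factor(c("Low", "High"), levels = lvl),
  sumcat_adj1_rec  = factor(c("Moderate", "Very High"), levels = lvl)
)

# Reshape to long format: one row per unit and stratification variable
strata_long <- toy_strata %>%
  pivot_longer(cols = starts_with("sumcat_"), names_to = "metric", values_to = "strata")

# Join to the geometry (sf object first, so the result stays an sf object)
maps_long <- toy_sf %>%
  left_join(strata_long, by = "adm3")

# One panel per stratification variable, sharing a single legend
p <- ggplot(maps_long) +
  geom_sf(aes(fill = strata), color = "gray80") +
  scale_fill_manual(
    name = "Strata",
    values = c("Very low" = "#c6dbef", "Low" = "#6baed6", "Moderate" = "#fdd0a2",
               "High" = "#e6550d", "Very High" = "#de2d26"),
    drop = FALSE
  ) +
  facet_wrap(~ metric) +
  theme_minimal()
p
```

Reshaping the attribute table before the join keeps the plotting code to a single `ggplot()` call, at the cost of repeating each polygon once per panel.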
Step 5: Mapping second risk stratification
The next code chunk plots the final risk strata on a map at the appropriate administrative level. In this code, we plot at the admin 3 level.
#| eval: false
#| message: false
#| warning: false
#| code-fold: false
#| code-summary: Show the code
# First, join the final dataset to the adm3 shapefile
# The sf object goes first so that the result keeps its sf class and geometry
strat2_maps <- adm3_sf %>%
  left_join(inc_prev_mort, by = c("adm1", "adm2", "adm3"))
# Map for the final stratification (incidence, prevalence and under-five mortality)
ggplot(strat2_maps) +
geom_sf(aes(fill = sumcat2_rec), color = "gray80", size = 0.2) + # inner borders
geom_sf(data = adm1_sf, fill = NA, color = "black", size = 0.3) + # adm1 borders
scale_fill_manual(
name = "Strata",
values = c(
"Very low" = "#c6dbef",
"Low" = "#6baed6",
"Moderate" = "#fdd0a2",
"High" = "#e6550d",
"Very High" = "#de2d26"
),
drop = FALSE
) +
theme_minimal() +
labs(
title = "Strata - Prev+Adjinc3+U5MR",
subtitle = "Country",
fill = "Strata"
) +
theme(
plot.title = element_text(size = 14, face = "bold"),
plot.subtitle = element_text(size = 12),
legend.title = element_text(size = 10, face = "bold"),
legend.text = element_text(size = 9)
)
Stata
:::
Summary
TBD