Combined risk categorization

Overview

Given the uncertainties in the three common metrics of transmission and disease burden (incidence, prevalence and all-cause mortality), and the different dimensions of malaria transmission that they represent, countries may choose to combine prevalence, incidence and/or mortality categories into a composite risk map. Several approaches can be used to develop a composite metric. A simple two-stage approach is presented here, but it should be adapted to the country context.

Objectives
  1. Categorize incidence estimates based on agreed country cut-offs. The cut-offs should be adapted to the range of values in the data.
  2. Categorize prevalence estimates based on agreed country cut-offs. The cut-offs should be adapted to the range of values in the data.
  3. Assign numerical values to the various categories.
  4. Combine the numerical values of incidence and prevalence to generate an epidemiological risk stratification.

Step-by-Step Instructions

To skip the step-by-step explanation, jump to the full code at the end of this page.

Step 1: Create categories for prevalence data

The categories used here follow WHO normative guidance, but countries can revise them based on their context and data.

# Categorize prevalence estimates. This assumes prevalence estimates are available for multiple years, as for incidence.
library(dplyr)

vars_prev <- c("prev_y1", "prev_y2", "prev_y3") # adjust names to match the columns in the prevalence dataset

# Define breaks and labels for prevalence estimates (%)
cut_offs <- c(-Inf, 1, 5, 10, 20, 35, 50, Inf)
cut_lab <- c("<1", "1-5", "5-10", "10-20", "20-35", "35-50", ">50")

# Create a categorical (factor) column "<name>_cat" for each prevalence variable.
# right = FALSE makes the intervals left-closed, so a value equal to a cut-off
# falls in the higher class (exactly 1 goes to "1-5", consistent with the "<1" label).
prev_df <- prev_df %>%
  mutate(across(all_of(vars_prev),
                ~ cut(., breaks = cut_offs, labels = cut_lab, right = FALSE),
                .names = "{.col}_cat"))
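The same categorization can be sketched in Python using only the standard library. This is a minimal illustration, not part of the original workflow: the district names and values are hypothetical, and the sketch assumes each class includes its lower bound (which may differ from the R call, depending on its `right` argument).

```python
from bisect import bisect_right

# Cut-offs (%) separating the seven prevalence classes, as in the R code above.
PREV_BREAKS = [1, 5, 10, 20, 35, 50]
PREV_LABELS = ["<1", "1-5", "5-10", "10-20", "20-35", "35-50", ">50"]


def categorize(value, breaks=PREV_BREAKS, labels=PREV_LABELS):
    """Return the class label for one estimate; missing values (None) stay None."""
    if value is None:
        return None
    # bisect_right counts the cut-offs that are <= value, so classes are
    # left-closed: exactly 1 falls in "1-5" and exactly 50 falls in ">50".
    return labels[bisect_right(breaks, value)]


# Hypothetical prevalence estimates (%) for four districts.
example_prev = {"District A": 0.4, "District B": 1.0, "District C": 27.3, "District D": None}
example_cat = {name: categorize(value) for name, value in example_prev.items()}
print(example_cat)
```

Whichever boundary convention is chosen, it should be applied consistently across years so that categories remain comparable.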

Step 1.1: Join prevalence data to incidence data

Next, we join the prevalence data to the incidence data.

# Keep every row of the incidence data and add the matching prevalence columns.
# Adjust the keys to the columns shared by both datasets.
inc_prev <- ann_inc_data %>%
  left_join(prev_df, by = c("adm1", "adm2", "adm3", "year"))
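To make the behaviour of a left join concrete, here is a small Python sketch using lists of dictionaries. The helper `left_join` and the example rows are hypothetical and written for illustration only; the sketch assumes the join keys are unique in the right-hand (prevalence) table.

```python
def left_join(left_rows, right_rows, keys):
    """Left join two lists of dicts on `keys` (right-hand keys assumed unique)."""
    lookup = {tuple(row[k] for k in keys): row for row in right_rows}
    extra_cols = sorted({col for row in right_rows for col in row} - set(keys))
    joined = []
    for row in left_rows:
        match = lookup.get(tuple(row[k] for k in keys), {})
        merged = dict(row)
        for col in extra_cols:
            merged[col] = match.get(col)  # None when no right-hand row matches
        joined.append(merged)
    return joined


KEYS = ("adm1", "adm2", "adm3", "year")

# Hypothetical rows: two incidence records, only one with a prevalence match.
incidence = [
    {"adm1": "R1", "adm2": "D1", "adm3": "S1", "year": 2022, "adjinc3_cat": "100-250"},
    {"adm1": "R1", "adm2": "D1", "adm3": "S2", "year": 2022, "adjinc3_cat": "250-500"},
]
prevalence = [
    {"adm1": "R1", "adm2": "D1", "adm3": "S1", "year": 2022, "prev_y3_cat": "10-20"},
]

inc_prev = left_join(incidence, prevalence, KEYS)
for row in inc_prev:
    print(row)
```

Every incidence row is kept; units without a prevalence match get a missing prevalence category, which is worth checking before scoring.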

Step 2: Create numerical values for prevalence and incidence categories

Scores are assigned in ascending order to the prevalence and incidence categories, based on the number of strata used per metric. For example, with the cut-offs defined in Step 1, prevalence categories of <1%, 1-5%, 5-10%, 10-20%, 20-35%, 35-50% and >50% receive scores of 1 to 7; incidence categories of <1, 1-50, 50-100, 100-250, 250-500, 500-750 and >750 cases per 1000 people at risk per year also receive scores of 1 to 7.

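# Convert each category to its numeric score (1 = lowest class).
# This assumes the *_cat columns are factors whose levels are already in ascending
# order (as produced by cut()). as.integer() then returns the level position, so the
# scores stay stable even when a class has no observations.
inc_prev <- inc_prev %>%
      mutate(across(ends_with("_cat"), ~ as.integer(.),
                    .names = "{.col}_num"))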
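The scoring rule can be illustrated with a short Python sketch. The helper `make_scorer` is hypothetical; the point it shows is that a score depends on the declared order of the categories, not on which categories happen to appear in the data.

```python
# Ordered category labels, taken from the cut-offs described in the text.
PREV_LABELS = ["<1", "1-5", "5-10", "10-20", "20-35", "35-50", ">50"]
INC_LABELS = ["<1", "1-50", "50-100", "100-250", "250-500", "500-750", ">750"]


def make_scorer(ordered_labels):
    """Build a function that maps a category label to its rank (1 = lowest class)."""
    scores = {label: rank for rank, label in enumerate(ordered_labels, start=1)}
    # Unknown or missing labels return None instead of raising an error.
    return lambda label: scores.get(label)


score_prev = make_scorer(PREV_LABELS)
score_inc = make_scorer(INC_LABELS)

print(score_prev("1-5"), score_inc(">750"))
```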

Step 3: We sum the scores for incidence and prevalence to generate first risk stratification

The scores are then summed per operational unit, and the sum is reclassified into five classes of morbidity based on incidence and prevalence: “Very low”, “Low”, “Moderate”, “High” and “Very high”.

The SNT team should agree on the most appropriate year to use for the stratification exercise.

# First, filter for the agreed year (usually the most recent one)

target_year <- "appropriate year" # placeholder: replace with the agreed year

# The variable must not be called `year`: inside filter(), `year == year` would
# compare the column with itself and keep every row.
inc_prev <- inc_prev %>%
      arrange(adm1, adm2, adm3, year) %>%
  dplyr::filter(year == target_year)

# First risk stratification: sum the incidence and prevalence scores for the selected year.
# Each incidence estimate is combined with the prevalence score (assuming prev_y3 is the most recent prevalence estimate).

inc_prev <- inc_prev %>%
  mutate(sumcat_crude = crudeinc_cat_num + prev_y3_cat_num,
         sumcat_adj1 = adjinc1_cat_num + prev_y3_cat_num,
         sumcat_adj2 = adjinc2_cat_num + prev_y3_cat_num,
         sumcat_adj3 = adjinc3_cat_num + prev_y3_cat_num)

# Next we will recode the values of the sumcat variables into meaningful thresholds based on the range of values from the sumcat variables. The figures provided here are for illustrative purposes

inc_prev <- inc_prev %>%
  mutate( # generating recode values for crude estimates
    sumcat_crude_rec = case_when(
      sumcat_crude %in% 6:7 ~ 1,
      sumcat_crude == 8 ~ 2,
      sumcat_crude == 9 ~ 3,
      sumcat_crude == 10 ~ 4,
      sumcat_crude %in% 11:13 ~ 5,
      TRUE ~ NA_real_
    ),
    # generating recode values for adjusted 1 estimates
    sumcat_adj1_rec = case_when(
      sumcat_adj1 %in% 6:7 ~ 1,
      sumcat_adj1 == 8 ~ 2,
      sumcat_adj1 == 9 ~ 3,
      sumcat_adj1 == 10 ~ 4,
      sumcat_adj1 %in% 11:13 ~ 5,
      TRUE ~ NA_real_
    ),
    # generating recode values for adjusted 2 estimates
    sumcat_adj2_rec = case_when(
      sumcat_adj2 %in% 6:7 ~ 1,
      sumcat_adj2 == 8 ~ 2,
      sumcat_adj2 == 9 ~ 3,
      sumcat_adj2 == 10 ~ 4,
      sumcat_adj2 %in% 11:13 ~ 5,
      TRUE ~ NA_real_
    ),

    # generating recode values for adjusted 3 estimates
    sumcat_adj3_rec = case_when(
      sumcat_adj3 %in% 6:7 ~ 1,
      sumcat_adj3 == 8 ~ 2,
      sumcat_adj3 == 9 ~ 3,
      sumcat_adj3 == 10 ~ 4,
      sumcat_adj3 %in% 11:13 ~ 5,
      TRUE ~ NA_real_
    )
  )

# define labels and apply it to the recoded values
# define labels
labels <- c(
  "1" = "Very low",
  "2" = "Low",
  "3" = "Moderate",
  "4" = "High",
  "5" = "Very High"
)

# apply to the recoded values
inc_prev <- inc_prev %>%
  mutate(
    sumcat_crude_rec = factor(sumcat_crude_rec, levels = 1:5, labels = labels),
    sumcat_adj1_rec  = factor(sumcat_adj1_rec, levels = 1:5, labels = labels),
    sumcat_adj2_rec  = factor(sumcat_adj2_rec, levels = 1:5, labels = labels),
    sumcat_adj3_rec  = factor(sumcat_adj3_rec, levels = 1:5, labels = labels)
  )
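The sum-and-recode logic of this step can be sketched in Python as follows. The thresholds are the illustrative ones from the R code above and should be adapted to the observed range of sums; the function name `first_stratum` is hypothetical.

```python
STRATA_LABELS = {1: "Very low", 2: "Low", 3: "Moderate", 4: "High", 5: "Very High"}


def first_stratum(inc_score, prev_score):
    """Sum two category scores and recode the sum into a labelled stratum."""
    if inc_score is None or prev_score is None:
        return None  # a missing input gives a missing stratum
    total = inc_score + prev_score
    if total in (6, 7):
        code = 1
    elif total == 8:
        code = 2
    elif total == 9:
        code = 3
    elif total == 10:
        code = 4
    elif 11 <= total <= 13:
        code = 5
    else:
        # Sums outside the illustrative thresholds are left unclassified,
        # which signals that the thresholds need to be adapted to the data.
        return None
    return STRATA_LABELS[code]


print(first_stratum(5, 4))  # incidence score 5 + prevalence score 4 = 9
print(first_stratum(1, 1))  # sum of 2 is not covered by the thresholds
```

Counting how many units come back unclassified is a quick way to check whether the thresholds cover the full range of sums in the data.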

Step 3.1: Create scores for mortality data and join to incidence-prevalence data

At this stage, we create categories for all-cause mortality in children under 5 and score them in ascending order: 1, 2, 3 or 4 for mortality of <1, 1-6, 6-9.5 and >9.5 deaths per 1000 live births, respectively. These mortality categories are those used in the Africa regional map for malaria vaccine allocation, but they can be changed depending on context.

# Create categories for mortality data and join them to the strata data.
# Comparisons are used instead of %in% so that non-integer rates (e.g. 7.2) are classified.

u5mr_data <- u5mr_data %>%
  mutate(u5mr_cat = case_when(
    is.na(u5mr) ~ NA_real_, # keep missing values missing
    u5mr < 1 ~ 1,
    u5mr < 6 ~ 2,
    u5mr <= 9.5 ~ 3,
    TRUE ~ 4
  ))

# Join the mortality data to the inc_prev data, keeping every row of inc_prev

inc_prev_mort <- inc_prev %>%
  left_join(u5mr_data, by = c("adm1", "adm2", "adm3"))
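A minimal Python sketch of the mortality scoring is given below. The text does not state which class the boundary values 1, 6 and 9.5 belong to, so the choices made here (1 and 6 go to the higher class, 9.5 stays in class 3 because class 4 is defined as >9.5) are assumptions; the function name `score_u5mr` is hypothetical.

```python
def score_u5mr(u5mr):
    """Score an under-5 mortality rate from 1 (lowest) to 4 (highest)."""
    if u5mr is None:
        return None  # missing rates are not scored
    if u5mr < 1:
        return 1
    if u5mr < 6:
        return 2
    if u5mr <= 9.5:
        return 3
    return 4  # strictly above 9.5


# Hypothetical rates, including non-integer and boundary values.
for rate in (0.5, 1, 5.9, 6, 7.2, 9.5, 9.6):
    print(rate, score_u5mr(rate))
```

Using comparisons rather than membership in an integer sequence ensures that continuous rates such as 7.2 are classified.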

Step 3.2: Rescore the first strata and add the mortality scores to generate the final risk stratification

Once the first set of strata based on prevalence and incidence scores has been obtained, new scores from 1 (low) to 4 (high) are assigned to them by merging the “Very low” and “Low” strata into a single score of 1. The mortality categories are already scored at this stage. The mortality score is then added to the rescored prevalence and incidence strata obtained in the first stage, and the sum is reclassified to obtain the final strata of combined morbidity and mortality. The example code uses five classes, from “Very low” to “Very High”.

Here we assume the SNT team has agreed to use the stratification based on adjusted incidence 3 as the final data for the stratification.

# Rescore the first strata by combining the "Very low" and "Low" categories.
# sumcat_adj3_rec is a factor at this point, so it is first converted back to its level position (1 to 5).

inc_prev_mort <- inc_prev_mort %>%
  mutate(
    adj3_level = as.integer(sumcat_adj3_rec),
    sumcat_adj3_rec_num = case_when(
      adj3_level %in% 1:2 ~ 1,
      adj3_level == 3 ~ 2,
      adj3_level == 4 ~ 3,
      adj3_level == 5 ~ 4,
      TRUE ~ NA_real_ # keep missing strata missing
    )
  )


# Final risk stratification: add the mortality score to the rescored incidence-prevalence strata

inc_prev_mort <- inc_prev_mort %>%
  mutate(
         sumcat2 = sumcat_adj3_rec_num + u5mr_cat)

# Next, recode sumcat2 into meaningful thresholds based on its range of values (here 2 to 8). The figures provided here are for illustrative purposes

inc_prev_mort <- inc_prev_mort %>%
  mutate(
    sumcat2_rec = case_when(
      sumcat2 %in% 2:3 ~ 1,
      sumcat2 %in% 4:5 ~ 2,
      sumcat2 == 6 ~ 3,
      sumcat2 == 7 ~ 4,
      sumcat2 == 8 ~ 5, # 8 is the maximum possible sum (4 + 4)
      TRUE ~ NA_real_
    ))

# define labels and apply it to the recoded values
# define labels
labels <- c(
  "1" = "Very low",
  "2" = "Low",
  "3" = "Moderate",
  "4" = "High",
  "5" = "Very High"
)

# apply to the recoded values
inc_prev_mort <- inc_prev_mort %>%
  mutate(
    sumcat2_rec = factor(sumcat2_rec, levels = 1:5, labels = labels)
  )
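The second stage can be sketched in Python as below. The function names are hypothetical, and the thresholds follow the illustrative ones in the R code, with one assumption stated plainly: the top class is assigned to a sum of 8, since 8 is the largest sum two scores of 1 to 4 can produce.

```python
FINAL_LABELS = {1: "Very low", 2: "Low", 3: "Moderate", 4: "High", 5: "Very High"}

# First-stage strata (1 to 5) rescored to 1 to 4 by merging "Very low" and "Low".
RESCORE = {1: 1, 2: 1, 3: 2, 4: 3, 5: 4}


def final_stratum(first_stratum_code, u5mr_score):
    """Combine a first-stage stratum code (1-5) with a mortality score (1-4)."""
    rescored = RESCORE.get(first_stratum_code)
    if rescored is None or u5mr_score is None:
        return None  # a missing input gives a missing stratum
    total = rescored + u5mr_score  # ranges from 2 to 8
    if total <= 3:
        code = 1
    elif total <= 5:
        code = 2
    elif total == 6:
        code = 3
    elif total == 7:
        code = 4
    else:
        code = 5  # assumption: the maximum sum of 8 maps to the top class
    return FINAL_LABELS[code]


print(final_stratum(5, 4))  # rescored 4 + mortality 4 = 8
print(final_stratum(3, 3))  # rescored 2 + mortality 3 = 5
```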

Step 4: Mapping first and second risk stratification

The next blocks of code plot each of the risk strata on a map at the appropriate administrative level. Here we plot them at the admin 3 level.

This first set of code plots each map separately.

# First join the final dataset to the adm3 shapefile.
# The sf object goes on the left so that the result keeps its geometry and sf class.
library(sf)
library(ggplot2)

strat1_maps <- adm3_sf %>%
  left_join(inc_prev, by = c("adm1", "adm2", "adm3"))

# Plot for each of the sum cat variables
# Map for Crude incidence stratification

ggplot(strat1_maps) +
  geom_sf(aes(fill = sumcat_crude_rec), color = "gray80", size = 0.2) +  # inner borders
  geom_sf(data = adm1_sf, fill = NA, color = "black", size = 0.3) + # adm1 borders
  scale_fill_manual(
    name = "Strata",
    values = c(
      "Very low" = "#c6dbef",
      "Low" = "#6baed6",
      "Moderate" = "#fdd0a2",
      "High" = "#e6550d",
      "Very High" = "#de2d26"
    ),
    drop = FALSE
  ) +
  theme_minimal() +
  labs(
    title = "Strata - Prev+Crudeinc",
    subtitle = "Country",
    fill = "Strata"
  ) +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    legend.title = element_text(size = 10, face = "bold"),
    legend.text = element_text(size = 9)
  )

# Map for Adjusted incidence1 stratification

ggplot(strat1_maps) +
  geom_sf(aes(fill = sumcat_adj1_rec), color = "gray80", size = 0.2) +  # inner borders
  geom_sf(data = adm1_sf, fill = NA, color = "black", size = 0.3) + # adm1 borders
  scale_fill_manual(
    name = "Strata",
    values = c(
      "Very low" = "#c6dbef",
      "Low" = "#6baed6",
      "Moderate" = "#fdd0a2",
      "High" = "#e6550d",
      "Very High" = "#de2d26"
    ),
    drop = FALSE
  ) +
  theme_minimal() +
  labs(
    title = "Strata - Prev+Adjinc1",
    subtitle = "Country",
    fill = "Strata"
  ) +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    legend.title = element_text(size = 10, face = "bold"),
    legend.text = element_text(size = 9)
  )


# Map for Adjusted incidence 2 stratification

ggplot(strat1_maps) +
  geom_sf(aes(fill = sumcat_adj2_rec), color = "gray80", size = 0.2) +  # inner borders
  geom_sf(data = adm1_sf, fill = NA, color = "black", size = 0.3) + # adm1 borders
  scale_fill_manual(
    name = "Strata",
    values = c(
      "Very low" = "#c6dbef",
      "Low" = "#6baed6",
      "Moderate" = "#fdd0a2",
      "High" = "#e6550d",
      "Very High" = "#de2d26"
    ),
    drop = FALSE
  ) +
  theme_minimal() +
  labs(
    title = "Strata - Prev+Adjinc2",
    subtitle = "Country",
    fill = "Strata"
  ) +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    legend.title = element_text(size = 10, face = "bold"),
    legend.text = element_text(size = 9)
  )


# Map for Adjusted incidence 3 stratification

ggplot(strat1_maps) +
  geom_sf(aes(fill = sumcat_adj3_rec), color = "gray80", size = 0.2) +  # inner borders
  geom_sf(data = adm1_sf, fill = NA, color = "black", size = 0.3) + # adm1 borders
  scale_fill_manual(
    name = "Strata",
    values = c(
      "Very low" = "#c6dbef",
      "Low" = "#6baed6",
      "Moderate" = "#fdd0a2",
      "High" = "#e6550d",
      "Very High" = "#de2d26"
    ),
    drop = FALSE
  ) +
  theme_minimal() +
  labs(
    title = "Strata - Prev+Adjinc3",
    subtitle = "Country",
    fill = "Strata"
  ) +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    legend.title = element_text(size = 10, face = "bold"),
    legend.text = element_text(size = 9)
  )

Step 4.1: Mapping first risk stratification

Alternatively, the maps can be shown together on a facet grid.

Step 5: Mapping second risk stratification

The next block of code plots the final risk strata on a map at the appropriate administrative level. Here we plot them at the admin 3 level.

# First join the final dataset to the adm3 shapefile (sf object on the left to keep the geometry)

strat2_maps <- adm3_sf %>%
  left_join(inc_prev_mort, by = c("adm1", "adm2", "adm3"))

# Map of the final stratification (prevalence + adjusted incidence 3 + under-5 mortality)

ggplot(strat2_maps) +
  geom_sf(aes(fill = sumcat2_rec), color = "gray80", size = 0.2) +  # inner borders
  geom_sf(data = adm1_sf, fill = NA, color = "black", size = 0.3) + # adm1 borders
  scale_fill_manual(
    name = "Strata",
    values = c(
      "Very low" = "#c6dbef",
      "Low" = "#6baed6",
      "Moderate" = "#fdd0a2",
      "High" = "#e6550d",
      "Very High" = "#de2d26"
    ),
    drop = FALSE
  ) +
  theme_minimal() +
  labs(
    title = "Strata - Prev+Adjinc3+U5MR",
    subtitle = "Country",
    fill = "Strata"
  ) +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    legend.title = element_text(size = 10, face = "bold"),
    legend.text = element_text(size = 9)
  )

Stata

Summary

TBD

Full code

#===============================================================================
# End of Script
#===============================================================================
 

©2026 Applied Health Analytics for Delivery and Innovation. All rights reserved