Incidence stratification

Overview

Stratification is the classification of geographical areas or localities according to epidemiological, ecological, social and economic determinants in order to guide malaria interventions. It can include risk stratification (i.e., classifying areas or localities by the factors that determine receptivity and vulnerability to malaria transmission) and/or intervention stratification based on eligibility and other criteria (e.g., endemicity). It is important to distinguish between simply mapping the spatial distribution of an indicator and stratifying it to answer a specific programmatic question. Stratification transforms an indicator into meaningful categories that match the decision-making needs of the malaria response in each setting, and these categories must be strategically relevant for subnational planning. For example, converting a malaria risk map into categories based on WHO's transmission continuum to guide intervention strategies is malaria risk stratification.

To provide the information needed for programmatic use, the stratification analysis should be done at the operational subnational unit or at lower levels. In high-transmission settings, the national malaria programme (NMP) usually stratifies subnational areas such as districts, health zones, provinces or regions. As countries progress towards elimination, finer-scale mapping is required and stratification should be more specific, ideally at the level of localities or health facility catchment areas, usually using absolute case numbers.

Objectives
  • TBD

Step-by-Step Instructions

To skip the step-by-step explanation, jump to the full code at the end of this page.

Step 1: Stratify crude incidence

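For stratification, it is recommended that the calculated incidence be aggregated by year at the operational administrative level. Each unit-year is then assigned to an incidence category (cases per 1,000 population per year). The cut-offs used below are examples; replace them with the cut-offs agreed for your setting.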
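If incidence is held monthly, the aggregation to one value per unit and year can be sketched in base R as below. This is a minimal sketch under stated assumptions, not a prescribed method: the input `monthly`, its columns `conf` (confirmed cases) and `pop` (annual population repeated on each month), and the helper name `annual_incidence` are all hypothetical. Missing months are skipped (`na.rm = TRUE`), which treats them as zero cases, so decide explicitly how missing reports should be handled.

```r
# Minimal sketch; column names are assumptions. `monthly` has one row per adm3 unit
# and month with adm1, adm2, adm3, year, conf (confirmed cases) and pop (annual
# population repeated on each month of that year).
annual_incidence <- function(monthly) {
  keys <- c("adm1", "adm2", "adm3", "year")
  # Total cases per unit and year; na.rm = TRUE skips months with missing counts
  cases <- aggregate(monthly["conf"], by = monthly[keys], FUN = sum, na.rm = TRUE)
  # One population value per unit and year (the mean of the repeated monthly values)
  pop <- aggregate(monthly["pop"], by = monthly[keys], FUN = mean, na.rm = TRUE)
  out <- merge(cases, pop, by = keys)
  # Crude annual incidence per 1,000 population
  out$crudeinc <- out$conf / out$pop * 1000
  out
}
```

With these column names, `ann_inc_data <- annual_incidence(monthly)` gives one row per adm3 unit and year with a `crudeinc` column ready to be categorized.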

library(dplyr)

# Sort by admin unit and year, then assign each unit-year to a crude incidence
# category (cases per 1,000); rows with missing incidence are left as NA
ann_inc_data <- ann_inc_data %>%
  arrange(adm1, adm2, adm3, year) %>%
  mutate(crudeinc_cat = case_when(crudeinc < 5 ~ "<5",
                                  crudeinc <= 50 ~ "5-50",
                                  crudeinc <= 100 ~ "50-100",
                                  crudeinc <= 250 ~ "100-250",
                                  crudeinc <= 450 ~ "250-450",
                                  crudeinc <= 750 ~ "450-750",
                                  crudeinc > 750 ~ ">750"))

Step 1.1: Stratify adjusted incidence 1

Categorize adjusted incidence 1 (incidence adjusted for incomplete testing) using the same agreed cut-offs.

# Assign each unit-year to an adjusted incidence 1 category (cases per 1,000);
# rows with missing incidence are left as NA
ann_inc_data <- ann_inc_data %>%
  mutate(adjinc1_cat = case_when(adjinc1 < 5 ~ "<5",
                                 adjinc1 <= 50 ~ "5-50",
                                 adjinc1 <= 100 ~ "50-100",
                                 adjinc1 <= 250 ~ "100-250",
                                 adjinc1 <= 450 ~ "250-450",
                                 adjinc1 <= 750 ~ "450-750",
                                 adjinc1 > 750 ~ ">750"))

Step 1.2: Stratify adjusted incidence 2

Categorize adjusted incidence 2 (incidence adjusted for incomplete reporting) using the same agreed cut-offs.

# Assign each unit-year to an adjusted incidence 2 category (cases per 1,000);
# rows with missing incidence are left as NA
ann_inc_data <- ann_inc_data %>%
  mutate(adjinc2_cat = case_when(adjinc2 < 5 ~ "<5",
                                 adjinc2 <= 50 ~ "5-50",
                                 adjinc2 <= 100 ~ "50-100",
                                 adjinc2 <= 250 ~ "100-250",
                                 adjinc2 <= 450 ~ "250-450",
                                 adjinc2 <= 750 ~ "450-750",
                                 adjinc2 > 750 ~ ">750"))

Step 1.3: Stratify adjusted incidence 3

Categorize adjusted incidence 3 (incidence adjusted for care seeking) using the same agreed cut-offs.

# Assign each unit-year to an adjusted incidence 3 category (cases per 1,000);
# rows with missing incidence are left as NA
ann_inc_data <- ann_inc_data %>%
  mutate(adjinc3_cat = case_when(adjinc3 < 5 ~ "<5",
                                 adjinc3 <= 50 ~ "5-50",
                                 adjinc3 <= 100 ~ "50-100",
                                 adjinc3 <= 250 ~ "100-250",
                                 adjinc3 <= 450 ~ "250-450",
                                 adjinc3 <= 750 ~ "450-750",
                                 adjinc3 > 750 ~ ">750"))
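Steps 1 to 1.3 repeat the same cut-offs four times. As a sketch, the rules can be written once in base R and applied to every estimate, which reduces the risk of the cut-offs drifting apart between steps. The helper name `categorize_incidence` is hypothetical. It reproduces the `case_when()` rules above: a value equal to an upper cut-off (50, 100, 250, 450, 750) stays in the lower category, 5 itself falls in "5-50", and missing values stay NA.

```r
# Hypothetical helper: categorize annual incidence (cases per 1,000) with the
# same cut-offs as the case_when() calls above. Missing values stay NA.
categorize_incidence <- function(x) {
  inc_levels <- c("<5", "5-50", "50-100", "100-250", "250-450", "450-750", ">750")
  # With left.open = TRUE, findInterval() returns how many of these cut-offs lie
  # strictly below x, so a value equal to a cut-off stays in the lower category
  n_below <- findInterval(x, c(50, 100, 250, 450, 750), left.open = TRUE)
  # Values under 5 form the first category; everything else starts at "5-50"
  idx <- ifelse(x < 5, 1L, n_below + 2L)
  # Return a factor so the categories keep their low-to-high order
  factor(inc_levels[idx], levels = inc_levels)
}
```

For example, `for (col in c("crudeinc", "adjinc1", "adjinc2", "adjinc3")) ann_inc_data[[paste0(col, "_cat")]] <- categorize_incidence(ann_inc_data[[col]])` creates all four category columns in one pass. Returning a factor with all seven levels also keeps the legend order fixed when the columns are mapped.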

Step 2: Mapping all incidence estimates

Once all incidence estimates have been calculated, the NMP should visually review district-level trends and maps of the crude and adjusted estimates. Discussions should then weigh the benefits and limitations of each adjustment until consensus is reached on the incidence metric to use for intervention targeting.

Countries are strongly encouraged to review the standard approach provided here and to adapt the equations and data sources to their context.
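To support the discussion on which incidence metric to use for targeting, it can help to count how many units change category after an adjustment. The base R sketch below is one way to do this; the function name `compare_categories` is hypothetical, and it assumes the category columns hold the labels used in Step 1.

```r
# Hypothetical helper: compare two categorizations of the same units (e.g. crude vs
# adjusted incidence for one year). Categories are ordered from lowest to highest.
compare_categories <- function(before, after,
                               levels = c("<5", "5-50", "50-100", "100-250",
                                          "250-450", "450-750", ">750")) {
  b <- factor(before, levels = levels)
  a <- factor(after, levels = levels)
  # Positive values mean the unit moved to a higher category after adjustment
  step <- as.integer(a) - as.integer(b)
  list(
    table = table(before = b, after = a),  # cross-tabulation; NA pairs are excluded
    moved_up = sum(step > 0, na.rm = TRUE),
    moved_down = sum(step < 0, na.rm = TRUE),
    unchanged = sum(step == 0, na.rm = TRUE)
  )
}
```

For a chosen year, passing that year's `crudeinc_cat` and `adjinc3_cat` columns summarizes how the final adjustment shifts units between strata.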

  • Step 2.1: Join incidence data with shapefile
  • Step 2.2: Set color scheme to be used for all plots
  • Step 2.3: Set year variables
  • Step 2.4: Mapping for crude incidence
  • Step 2.5: Mapping for adjusted incidence 1
  • Step 2.6: Mapping for adjusted incidence 2
  • Step 2.7: Mapping for adjusted incidence 3

Join incidence dataset with shapefile

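library(sf)       # sf objects and their dplyr methods (e.g. left_join for sf)
library(ggplot2)  # used for the maps below
inc_maps <- shp_adm3 %>%  # assuming this is the shapefile at adm3 in sf format
  left_join(ann_inc_data, by = c("adm1", "adm2", "adm3")) # attach annual incidence to each polygon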
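A `left_join()` on admin names gives no warning when spellings differ between the shapefile and the incidence data: shapefile units without matching data are dropped once the maps are filtered by year, and data rows without a matching polygon are lost. A minimal base R check, assuming both inputs carry `adm1`, `adm2` and `adm3` columns (the helper name `unmatched_units` is hypothetical):

```r
# Hypothetical helper: list admin units that appear in one table but not the other.
# Pass plain data frames; for an sf object use sf::st_drop_geometry() first.
unmatched_units <- function(shp_df, inc_df, by = c("adm1", "adm2", "adm3")) {
  # Build one text key per row from the join columns
  make_key <- function(df) unique(do.call(paste, c(as.list(df[by]), sep = " | ")))
  shp_keys <- make_key(shp_df)
  inc_keys <- make_key(inc_df)
  list(
    data_only = setdiff(inc_keys, shp_keys),      # rows lost by the left join
    shapefile_only = setdiff(shp_keys, inc_keys)  # polygons with no incidence data
  )
}
```

Running `unmatched_units(sf::st_drop_geometry(shp_adm3), ann_inc_data)` should return two empty vectors before mapping; any names it lists need to be harmonized first.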

We use the same color for each category in every map so that changes after each adjustment and across years can be compared directly when making decisions.

# One color per incidence category, from the lowest ("<5") to the highest (">750")
colors <- c("white", "lightblue", "lightsteelblue", "royalblue", "red", "darkred", "maroon")

Because the data cover several years, we store the list of years and loop over it to produce one map per year.

years <- sort(unique(inc_maps$year))  # sort() also drops NA years (units with no data)

Plot maps for crude incidence for each year and save

dir.create("gph", showWarnings = FALSE)  # create the output folder if needed

for (yr in years) {

  # Keep the admin units for the current year
  map_data <- inc_maps %>%
    filter(year == yr)

  crude <- ggplot(map_data) +
    geom_sf(aes(fill = crudeinc_cat), color = "gray") +
    geom_sf(data = adm1_coords, fill = NA, color = "black", linewidth = 0.3) + # adm1 boundaries (sf object)
    scale_fill_manual(values = colors,
                      name = "Cases per 1000",
                      # limits fixes the category order so colors and legend match in every year
                      limits = c("<5", "5-50", "50-100", "100-250", "250-450", "450-750", ">750"),
                      drop = FALSE) +
    labs(title = paste("Crude Incidence", yr),
         subtitle = "Country") + # replace with the country name
    theme_minimal() +
    theme(legend.position = "right",
          plot.title = element_text(size = 14),
          plot.subtitle = element_text(size = 12))

  # Print the plot
  print(crude)

  # Save the plot
  ggsave(filename = paste0("gph/crude_", yr, ".png"), plot = crude, width = 8, height = 6, dpi = 300)
}

Plot maps for adjusted incidence 1 for each year and save

for (yr in years) {

  # Keep the admin units for the current year
  map_data <- inc_maps %>%
    filter(year == yr)

  inc1 <- ggplot(map_data) +
    geom_sf(aes(fill = adjinc1_cat), color = "gray") +
    geom_sf(data = adm1_coords, fill = NA, color = "black", linewidth = 0.3) + # adm1 boundaries (sf object)
    scale_fill_manual(values = colors,
                      name = "Cases per 1000",
                      # limits fixes the category order so colors and legend match in every year
                      limits = c("<5", "5-50", "50-100", "100-250", "250-450", "450-750", ">750"),
                      drop = FALSE) +
    labs(title = paste("Adjusted Incidence1", yr),
         subtitle = "Country") + # replace with the country name
    theme_minimal() +
    theme(legend.position = "right",
          plot.title = element_text(size = 14),
          plot.subtitle = element_text(size = 12))

  # Print the plot
  print(inc1)

  # Save the plot
  ggsave(filename = paste0("gph/inc1_", yr, ".png"), plot = inc1, width = 8, height = 6, dpi = 300)
}

Plot maps for adjusted incidence 2 for each year and save

for (yr in years) {

  # Keep the admin units for the current year
  map_data <- inc_maps %>%
    filter(year == yr)

  inc2 <- ggplot(map_data) +
    geom_sf(aes(fill = adjinc2_cat), color = "gray") +
    geom_sf(data = adm1_coords, fill = NA, color = "black", linewidth = 0.3) + # adm1 boundaries (sf object)
    scale_fill_manual(values = colors,
                      name = "Cases per 1000",
                      # limits fixes the category order so colors and legend match in every year
                      limits = c("<5", "5-50", "50-100", "100-250", "250-450", "450-750", ">750"),
                      drop = FALSE) +
    labs(title = paste("Adjusted Incidence2", yr),
         subtitle = "Country") + # replace with the country name
    theme_minimal() +
    theme(legend.position = "right",
          plot.title = element_text(size = 14),
          plot.subtitle = element_text(size = 12))

  # Print the plot
  print(inc2)

  # Save the plot
  ggsave(filename = paste0("gph/inc2_", yr, ".png"), plot = inc2, width = 8, height = 6, dpi = 300)
}

Plot maps for adjusted incidence 3 for each year and save

for (yr in years) {

  # Keep the admin units for the current year
  map_data <- inc_maps %>%
    filter(year == yr)

  inc3 <- ggplot(map_data) +
    geom_sf(aes(fill = adjinc3_cat), color = "gray") +
    geom_sf(data = adm1_coords, fill = NA, color = "black", linewidth = 0.3) + # adm1 boundaries (sf object)
    scale_fill_manual(values = colors,
                      name = "Cases per 1000",
                      # limits fixes the category order so colors and legend match in every year
                      limits = c("<5", "5-50", "50-100", "100-250", "250-450", "450-750", ">750"),
                      drop = FALSE) +
    labs(title = paste("Adjusted Incidence3", yr),
         subtitle = "Country") + # replace with the country name
    theme_minimal() +
    theme(legend.position = "right",
          plot.title = element_text(size = 14),
          plot.subtitle = element_text(size = 12))

  # Print the plot
  print(inc3)

  # Save the plot
  ggsave(filename = paste0("gph/inc3_", yr, ".png"), plot = inc3, width = 8, height = 6, dpi = 300)
}
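The four loops above differ only in the category column, the title and the file prefix. One way to avoid maintaining four copies is a single plotting function. The sketch below assumes ggplot2 3.4 or later (for `linewidth`) and sf objects as inputs; the names `plot_incidence_map`, `inc_levels` and `inc_colors` are hypothetical. Setting `limits` on the manual scale fixes the category order, so each color keeps the same meaning in every year and every adjustment.

```r
library(ggplot2)

# Incidence categories from lowest to highest, with one fixed color each
inc_levels <- c("<5", "5-50", "50-100", "100-250", "250-450", "450-750", ">750")
inc_colors <- c("white", "lightblue", "lightsteelblue", "royalblue",
                "red", "darkred", "maroon")

# Hypothetical helper: map one category column of an sf object (one year of data).
# adm1_layer, if given, is an sf object with adm1 boundaries drawn on top.
plot_incidence_map <- function(map_data, cat_col, title,
                               subtitle = "Country", adm1_layer = NULL) {
  p <- ggplot(map_data) +
    geom_sf(aes(fill = .data[[cat_col]]), color = "gray") +
    # limits fixes the category order, so colors match categories in every map
    scale_fill_manual(values = inc_colors, limits = inc_levels,
                      name = "Cases per 1000", drop = FALSE) +
    labs(title = title, subtitle = subtitle) +
    theme_minimal() +
    theme(legend.position = "right",
          plot.title = element_text(size = 14),
          plot.subtitle = element_text(size = 12))
  if (!is.null(adm1_layer)) {
    p <- p + geom_sf(data = adm1_layer, fill = NA, color = "black", linewidth = 0.3)
  }
  p
}

# Example use with inc_maps, years and adm1_coords from the steps above:
# maps <- data.frame(col    = c("crudeinc_cat", "adjinc1_cat", "adjinc2_cat", "adjinc3_cat"),
#                    label  = c("Crude Incidence", "Adjusted Incidence1",
#                               "Adjusted Incidence2", "Adjusted Incidence3"),
#                    prefix = c("crude", "inc1", "inc2", "inc3"))
# for (yr in years) {
#   map_data <- dplyr::filter(inc_maps, year == yr)
#   for (i in seq_len(nrow(maps))) {
#     p <- plot_incidence_map(map_data, maps$col[i], paste(maps$label[i], yr),
#                             adm1_layer = adm1_coords)
#     ggsave(paste0("gph/", maps$prefix[i], "_", yr, ".png"), plot = p,
#            width = 8, height = 6, dpi = 300)
#   }
# }
```

Keeping the cut-offs, colors and plot settings in one place means a change agreed with the NMP only has to be made once.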

Summary

Stratification is an iterative process that should be refined periodically as data are updated and as the impact of interventions is observed. This keeps stratification a dynamic tool for guiding operational decisions and resource allocation, and ultimately improves the effectiveness of malaria control programs at the subnational level.

Full code

#===============================================================================
# End of Script
#===============================================================================
 

©2026 Applied Health Analytics for Delivery and Innovation. All rights reserved