Dev Site — You are viewing the development build. Go to Main Site

  • English
  • Français
  1. 2. Assemblage et gestion des données
  2. 2.6 Enquêtes nationales auprès des ménages
  3. Wealth quintiles analysis
  • Bibliothèque de code pour l'adaptation infranationale
    Version française
  • 1. Pour commencer
    • 1.1 À propos et comment nous contacter
    • 1.2 Pour tous les utilisateurs
    • 1.3 Pour l’équipe SNT
    • 1.4 Pour les analystes
    • 1.5 Acronymes et bibliothèque de ressources
    • 1.6 Produire des résultats de haute qualité
  • 2. Assemblage et gestion des données
    • 2.1 Utilisation des shapefiles
      • Aperçu des données spatiales
      • Utilisation et visualisation de base des shapefiles
      • Gestion et personnalisation des shapefiles
      • Fusion des shapefiles avec des données tabulaires
    • 2.2 Formations sanitaires
      • Correspondance approximative des noms entre jeux de données
      • Coordonnées des établissements de santé et données ponctuelles
    • 2.3 Données de cas de routine (DHIS2)
      • Détermination du statut actif et inactif
      • Data extraction from DHIS2
      • Prétraitement des données DHIS2
      • Méthodes de détection des données manquantes
      • Outlier correction
      • Considérations contextuelles
      • Taux de notification des établissements de santé
      • Quality control/checks
      • Outlier detection methods
      • Imputation of missing data
      • Final database
    • 2.4 Données du stock
      • lmis
    • 2.5 Données démographiques
      • Données démographiques nationales
      • Raster de population WorldPop
    • 2.6 Enquêtes nationales auprès des ménages
      • DHS Data Overview and Preparation
      • All-Cause Child Mortality
      • Wealth quintiles analysis
      • Extraction of ITN ownership, access, and usage
      • Extracion of prevalence data
      • Calculation of treatment-seeking data
    • 2.7 Données entomologiques
      • Données entomologiques
    • 2.8 Données climatiques et environnementales
      • Extraction de données climatiques et environnementales à partir de données raster
    • 2.9 Données modélisées
      • Generating spatial modeled estimates
      • Travailler avec les estimations modélisées géospatiales
      • Modeled Estimates of Entomological Indicators
      • Mortality estimates from IHME
    • 2.10 Données financières
  • 3. Analyse de la situation
    • 3.1 Revue des interventions historiques
      • Prise en charge des cas
      • Interventions de routine
      • Les campagnes de masse de moustiquaires
      • Les campagnes de chimioprévention
      • Autres interventions lutte antivectorielle
    • 3.2 Analyse des tendances
    • 3.3 Analyse des facteurs de risque
    • 3.4 Évaluation de l’impact des interventions
    • 3.5 Analyse des coûts
  • 4. Stratification
    • 4.1 Stratification épidémiologique
      • Aperçu de l’incidence et incidence brute
      • Ajustement de l’incidence 1 : noncomplétude du dépistage
      • Ajustement de l’incidence 2 : noncomplétude du rapportage
      • Ajustement de l’incidence 2 : recherche des soins
      • Incidence stratification
      • Stratification par prévalence et mortalité
      • Risk categorization
      • Combined risk categorization
      • Risk categorization REMOVE?
    • 4.2 Accès aux soins
    • 4.3 Saisonnalité
      • Définir les zones saisonnières
      • Durées de saisonnalité
    • 4.4 Microstratification urbaine
  • 5. Ciblage et priorisation des interventions
    • 5.1 Ciblage des interventions
    • 5.2 Priorisation
    • 5.3 Optimisation dans la limite des ressources

On this page

  • Overview
  • Background
    • Core Indicators for Wealth Analysis
    • Gini Coefficient for Wealth Inequality
    • Wealth index limitations
  • Step-by-step
    • Step 1: Load and prepare DHS Data
    • Step 2: Calculate wealth quintile distributions
      • Step 2.1: Set survey design
      • Step 2.2: Calculate wealth quintiles
    • Step 3: Calculate Gini coefficients
      • Step 3.1: Gini coefficient calculation function
      • Step 3.2: Apply Gini coefficient calculation function
    • Step 4: Visualize wealth distribution
      • Step 4.1: Join wealth analysis with shapefile
      • Step 4.2: Visualize dominant wealth quintiles
      • Step 4.3: Visualize proportion of districts in poorest wealth quintile
      • Step 4.4: Visualize quintile proportions by district
      • Step 4.5: Visualize wealth inequality using Gini coefficient
    • Step 5: Save processed wealth analysis data
  • Summary
  • Full Code
  1. 2. Assemblage et gestion des données
  2. 2.6 Enquêtes nationales auprès des ménages
  3. Wealth quintiles analysis

Wealth quintiles analysis

Beginner

Overview

Wealth quintiles provide a relative measure of household economic status and are essential for analyzing equity in malaria intervention planning. In SNT, wealth analysis helps identify socioeconomic disparities, prioritize resources for disadvantaged populations, and tailor strategies to reach the poorest households effectively.

The DHS Program calculates wealth quintiles using a wealth index derived from principal component analysis of household assets and characteristics. This approach provides a consistent measure of relative economic status across countries where income data is often unreliable or unavailable.

This page focuses on extracting and analyzing wealth quintile distributions and Gini coefficients from DHS surveys for subnational malaria intervention tailoring.

NoteObjectives
  • Access and process DHS wealth quintile data at subnational levels
  • Calculate Gini coefficients for wealth inequality analysis following DHS methodology
  • Generate wealth distribution visualizations for SNT decision-making
  • Compile and save clean, subnational summaries of wealth indicators

Background

Core Indicators for Wealth Analysis

Wealth analysis in DHS relies on two key variables that are pre-calculated in the survey data:

  • Wealth quintiles (hv270): Categorical variable dividing households into five equal groups (poorest, poorer, middle, richer, richest) based on the wealth index
  • Wealth index factor score (hv271): Continuous variable representing the underlying wealth score used to create the quintiles

Gini Coefficient for Wealth Inequality

The Gini coefficient measures wealth inequality within populations, ranging from 0 (perfect equality) to 1 (maximum inequality). DHS calculates Gini coefficients using the Brown formula with 100 wealth groups, providing a standardized measure for comparing inequality across regions or over time.

Wealth index limitations

The DHS wealth index measures relative wealth within a country, not absolute poverty, and therefore may not capture all dimensions of socioeconomic status. Wealth quintiles are also calculated separately for each survey, limiting direct comparability of wealth distribution over time or across survey cycles.

Step-by-step

This page covers wealth quintile analysis using raw DHS survey data, to maintain flexibility for subnational analyses and Gini coefficient calculation.

To skip the step-by-step explanation, jump to the full code at the end of this page.

Step 1: Load and prepare DHS Data

  • R
  • Python
# install or load required packages
pacman::p_load(
  here,       # for handling relative file paths
  haven,      # for reading DHS .dta files and labelled data
  dplyr,      # for data wrangling
  stringr,    # for string operations
  ggplot2,    # for visualization
  sf,         # for working with shapefiles
  rio,        # for saving outputs
  survey,     # for complex survey analysis
  ineq,       # for inequality measures (Gini)
  ggforce,    # for scattered pie plots
  ggtext      # enhanced plot text
)

# import the HR (household) data
sle_dhs_hr <- readRDS(
  here::here("1.6_health_systems/1.6a_dhs/raw/SLHR7AFL.rds")
) |>
  dplyr::mutate(
    # clean admin names
    adm1 = haven::as_factor(hv024) |>
      toupper() |>
      stringr::str_remove(" PROVINCE"),
    adm2 = haven::as_factor(shdist) |>
      toupper(),
    # extract wealth variables
    wealth_quintile = hv270,
    wealth_score = hv271,
    sample_weight = hv005 / 1000000,
    de_jure_members = hv012
  )
NoteOutput

To adapt the code:

  • Line 23: Replace HR file path with your country’s DHS dataset
  • Lines 26-31: Update administrative variable mappings if needed
  • Lines 33-36: Ensure wealth variables match your DHS dataset structure

Step 2: Calculate wealth quintile distributions

Step 2.1: Set survey design

  • R
  • Python
# Set up survey design
des_wealth <- survey::svydesign(
  ids = ~hv021,
  strata = ~hv022,
  weights = ~sample_weight,
  data = sle_dhs_hr,
  nest = TRUE
)

Step 2.2: Calculate wealth quintiles

Calculate the distribution of households and population across wealth quintiles at the adm2 level.

  • R
  • Python
Show the code
# Calculate wealth quintile distribution by admin2 - FIXED VERSION
wealth_quintile_adm2 <- survey::svyby(
  ~factor(wealth_quintile),
  ~adm2,
  des_wealth,
  FUN = svymean,
  vartype = c("se", "ci"),
  keep.var = TRUE,
  na.rm = TRUE  # Handles missing values
)

# Robust pivoting - handle cases where some factor levels are missing
wealth_quintile_adm2 <- sle_dhs_hr |>
  dplyr::group_by(adm2) |>
  dplyr::summarise(
    total_hh = sum(sample_weight, na.rm = TRUE),
    poorest = sum(sample_weight * (wealth_quintile == 1), na.rm = TRUE) / total_hh,
    poorer = sum(sample_weight * (wealth_quintile == 2), na.rm = TRUE) / total_hh,
    middle = sum(sample_weight * (wealth_quintile == 3), na.rm = TRUE) / total_hh,
    richer = sum(sample_weight * (wealth_quintile == 4), na.rm = TRUE) / total_hh,
    richest = sum(sample_weight * (wealth_quintile == 5), na.rm = TRUE) / total_hh,
    .groups = "drop"
  ) |>
  tidyr::pivot_longer(
    cols = c(poorest, poorer, middle, richer, richest),
    names_to = "quintile_label",
    values_to = "proportion"
  ) |>
  dplyr::mutate(
    proportion_pct = round(proportion * 100, 1)
  )

Step 3: Calculate Gini coefficients

Step 3.1: Gini coefficient calculation function

Implement the DHS methodology for calculating Gini coefficients using the Brown Formula.

  • R
  • Python
Show the code
#' Calculate Gini coefficient following DHS methodology
calculate_dhs_gini <- function(wealth_scores, weights, population, n_groups = 100) {
  # Remove missing values
  complete_cases <- !is.na(wealth_scores) & !is.na(weights) & !is.na(population)
  wealth_scores <- wealth_scores[complete_cases]
  weights <- weights[complete_cases]
  population <- population[complete_cases]
  
  if (length(wealth_scores) == 0) return(NA_real_)
  
  # DHS methodology steps
  min_wealth <- min(wealth_scores)
  max_wealth <- max(wealth_scores)
  
  # Create wealth groups
  wealth_group <- cut(
    wealth_scores,
    breaks = seq(min_wealth, max_wealth, length.out = n_groups + 1),
    include.lowest = TRUE,
    labels = FALSE
  )
  
  # Tally population and wealth for each group
  group_data <- data.frame(
    group = wealth_group,
    weight = weights,
    population = population,
    wealth_score = wealth_scores
  )
  
  group_summary <- group_data |>
    dplyr::group_by(group) |>
    dplyr::summarise(
      group_pop = sum(weight * population, na.rm = TRUE),
      group_wealth = sum(weight * (wealth_score - min_wealth) * population, na.rm = TRUE),
      .groups = "drop"
    ) |>
    dplyr::arrange(group)
  
  # Calculate cumulative proportions
  group_summary <- group_summary |>
    dplyr::mutate(
      cum_pop = cumsum(group_pop),
      cum_wealth = cumsum(group_wealth)
    )
  
  total_pop <- sum(group_summary$group_pop)
  total_wealth <- sum(group_summary$group_wealth)
  
  group_summary <- group_summary |>
    dplyr::mutate(
      prop_pop = cum_pop / total_pop,
      prop_wealth = cum_wealth / total_wealth
    )
  
  # Apply Brown Formula for Gini coefficient
  prop_pop <- c(0, group_summary$prop_pop)
  prop_wealth <- c(0, group_summary$prop_wealth)
  
  n <- length(prop_pop)
  G <- 0
  for (k in 2:n) {
    G <- G + (prop_pop[k] - prop_pop[k-1]) * (prop_wealth[k] + prop_wealth[k-1])
  }
  
  return(1 - G)
}

Step 3.2: Apply Gini coefficient calculation function

  • R
  • Python
gini_by_adm2 <- sle_dhs_hr |>
  dplyr::group_by(adm2) |>
  dplyr::summarise(
    gini = calculate_dhs_gini(
      wealth_scores = wealth_score,
      weights = sample_weight, 
      population = de_jure_members
    ),
    sample_size = dplyr::n(),
    .groups = "drop"
  ) |>
  dplyr::mutate(
    gini = round(gini, 3),
    reliable_estimate = sample_size >= 25  # Reliability flag (?)
  )

Step 4: Visualize wealth distribution

Step 4.1: Join wealth analysis with shapefile

  • R
  • Python
Show the code
# Get the DHS adm2 shapefile
sle_dhs_shp2 <- sf::read_sf(
  here::here(
    "01_foundational/1a_administrative_boundaries", 
    "1ai_adm2",
    "sdr_subnational_boundaries_adm2.shp"
  )
) |>
  dplyr::select(
    adm1 = OTHREGNA,    # ADM1 for reference
    adm2 = DHSREGEN,    # ADM2 column
    adm2_code = REG_ID  # ADM2 code
  ) |>
  dplyr::mutate(
    adm1 = toupper(adm1),
    adm2 = toupper(adm2)  # ← Proper adm2 cleaning
  )

# Prepare wealth distribution data for mapping
wealth_distribution_final <- wealth_quintile_adm2 |>
  dplyr::left_join(sle_dhs_shp2, by = "adm2") |>
  sf::st_as_sf()

# Prepare Gini data for mapping  
gini_final <- gini_by_region |>
  dplyr::left_join(sle_dhs_shp2, by = "adm2") |>
  sf::st_as_sf()

Step 4.2: Visualize dominant wealth quintiles

  • R
  • Python
Show the code
dominant_quintile_adm2 <- wealth_quintile_adm2 |>  
  dplyr::group_by(adm2) |>                         
  dplyr::slice_max(proportion, n = 1) |>
  dplyr::select(adm2, dominant_quintile = quintile_label, dominant_prop = proportion) |>
  dplyr::ungroup() |>
  dplyr::mutate(
    dominant_quintile = factor(
      dominant_quintile,
      levels = c("poorest", "poorer", "middle", "richer", "richest"),
      labels = c("Poorest", "Poorer", "Middle", "Richer", "Richest")
    )
  )

# Join with ADM2 shapefile
dominant_quintile_adm2_map <- dominant_quintile_adm2 |>
  dplyr::left_join(sle_dhs_shp2, by = "adm2") |>  
  sf::st_as_sf()

# Define consistent color scheme
wealth_quintile_colors <- c(
  "Poorest" = "#E41A1C",  # Red
  "Poorer" = "#FF7F00",   # Orange
  "Middle" = "#FFD700",   # Yellow
  "Richer" = "#4DAF4A",   # Green
  "Richest" = "#984EA3"   # Purple
)

# Dominant wealth quintile by district with custom colors
plot_dominant_quintile_adm2 <- dominant_quintile_adm2_map |>
  ggplot2::ggplot() +
  ggplot2::geom_sf(ggplot2::aes(fill = dominant_quintile), color = "white", size = 0.1) +
  # Use custom colors instead of brewer palette
  ggplot2::scale_fill_manual(
    values = wealth_quintile_colors,
    name = "Dominant Wealth Quintile"
  ) +
  ggplot2::labs(
    title = "Dominant Wealth Quintile by District",
    subtitle = "Majority wealth category in each district"
  ) +
  ggplot2::theme_void() +
  ggplot2::theme(
    legend.position = "right",
    plot.title = ggplot2::element_text(face = "bold", hjust = 0.5),
    plot.subtitle = ggplot2::element_text(hjust = 0.5)
  )

# Add proportion labels to the map
plot_dominant_quintile_adm2 <- plot_dominant_quintile_adm2 +
  ggplot2::geom_sf_text(
    ggplot2::aes(label = scales::percent(dominant_prop, accuracy = 1)),
    size = 2.5,
    color = "black",
    fontface = "bold"
  )
NoteOutput

Step 4.3: Visualize proportion of districts in poorest wealth quintile

  • R
  • Python
Show the code
# Proportion in poorest quintile by district 
plot_poorest_adm2 <- wealth_distribution_final |>  
  dplyr::filter(quintile_label == "poorest") |>  
  ggplot2::ggplot() +
  ggplot2::geom_sf(ggplot2::aes(fill = proportion), color = "white", size = 0.1) +
ggplot2::scale_fill_gradientn(
    colors = c("#fee5d9", "#fcae91", "#fb6a4a", "#de2d26", "#a50f15"),
    name = "Proportion\nin Poorest Quintile",
    labels = scales::percent
    ) +
  ggplot2::labs(
    title = "Proportion in Poorest Wealth Quintile by District"
  ) +
  ggplot2::theme_void() +
  ggplot2::theme(
    legend.position = "right",
    plot.title = ggplot2::element_text(face = "bold", hjust = 0.5),
    plot.subtitle = ggplot2::element_text(hjust = 0.5)
  )
NoteOutput

Step 4.4: Visualize quintile proportions by district

  • R
  • Python
Show the code
# Calculate district centroids for pie chart placement
district_centroids <- sle_dhs_shp2 |>
  sf::st_centroid() |>
  sf::st_coordinates() |>
  as.data.frame() |>
  dplyr::bind_cols(adm2 = sle_dhs_shp2$adm2) |>
  dplyr::rename(longitude = X, latitude = Y)

# Prepare wealth data for pie charts - reaggregate at district level
wealth_pie_data <- sle_dhs_hr |>
  dplyr::group_by(adm2) |>
  dplyr::summarise(
    total_weight = sum(sample_weight, na.rm = TRUE),
    Q1 = sum(sample_weight * (wealth_quintile == 1), na.rm = TRUE) / total_weight,
    Q2 = sum(sample_weight * (wealth_quintile == 2), na.rm = TRUE) / total_weight,
    Q3 = sum(sample_weight * (wealth_quintile == 3), na.rm = TRUE) / total_weight,
    Q4 = sum(sample_weight * (wealth_quintile == 4), na.rm = TRUE) / total_weight,
    Q5 = sum(sample_weight * (wealth_quintile == 5), na.rm = TRUE) / total_weight,
    sample_size = dplyr::n(),
    .groups = "drop"
  ) |>
  # Join with centroids for positioning
  dplyr::left_join(district_centroids, by = "adm2")

# Prepare data for pie charts
wealth_pie_ggforce <- wealth_pie_data |>
  tidyr::pivot_longer(
    cols = c(Q1, Q2, Q3, Q4, Q5),
    names_to = "quintile",
    values_to = "proportion"
  ) |>
  dplyr::group_by(adm2) |>
  dplyr::mutate(
    # Calculate start and end angles for each segment
    end_angle = cumsum(proportion) * 2 * pi,
    start_angle = lag(end_angle, default = 0),
    # Convert quintile to factor with proper order
    quintile = factor(quintile,
                     levels = c("Q1", "Q2", "Q3", "Q4", "Q5"),
                     labels = c("Poorest", "Poorer", "Middle", "Richer", "Richest"))
  ) |>
  dplyr::ungroup()

# Create pie chart map
wealth_pie_map <- ggplot2::ggplot() +
  # Base map
  ggplot2::geom_sf(data = sle_dhs_shp2, fill = "white", color = "gray60", linewidth = 0.2) +
  # Pie charts
  ggforce::geom_arc_bar(
    data = wealth_pie_ggforce,
    ggplot2::aes(
      x0 = longitude, 
      y0 = latitude, 
      r0 = 0, 
      r = 0.08,  # Radius of pies
      start = start_angle, 
      end = end_angle, 
      fill = quintile
    ),
    color = "white",
    linewidth = 0.1
  ) +
  ggplot2::scale_fill_manual(
    values = wealth_quintile_colors,
    name = "Wealth Quintile"
  ) +
  ggplot2::labs(
    title = "Wealth Quintile Distribution by District",
    subtitle = "Proportion of households in each wealth quintile",
    caption = paste("District sample sizes range from", 
                   min(wealth_pie_data$sample_size), "to", 
                   max(wealth_pie_data$sample_size), "households")
  ) +
  ggplot2::theme_void() +
  ggplot2::theme(
    plot.title = ggplot2::element_text(face = "bold", hjust = 0.5, size = 16),
    plot.subtitle = ggplot2::element_text(hjust = 0.5, size = 11),
    legend.position = "right"
  )
NoteOutput

Step 4.5: Visualize wealth inequality using Gini coefficient

  • R
  • Python
Show the code
# Gini coefficient by district
plot_gini_adm2 <- gini_final |>  
  ggplot2::ggplot() +
  ggplot2::geom_sf(ggplot2::aes(fill = gini), color = "white", size = 0.1) +
  ggplot2::scale_fill_viridis_c(
    option = "viridis", 
    direction = -1,
    name = "Gini Coefficient"
  ) +
  ggplot2::labs(
    title = "Wealth Inequality by District",
    subtitle = "Gini coefficient (0 = perfect equality, 1 = maximum inequality)"
  ) +
  ggplot2::theme_void() +
  ggplot2::theme(
    legend.position = "right",
    plot.title = ggplot2::element_text(face = "bold", hjust = 0.5),
    plot.subtitle = ggplot2::element_text(hjust = 0.5)
  )

# Add sample size reliability indicators for Gini map
plot_gini_adm2 <- plot_gini_adm2 +
  ggplot2::geom_sf(
    data = gini_final |> dplyr::filter(!reliable_estimate), 
    fill = NA, 
    color = "red", 
    size = 0.3,
    linetype = "dashed"
  )
NoteOutput

Step 5: Save processed wealth analysis data

Save the wealth analysis outputs for integration into SNT workflows.

  • R
  • Python
# define save directory
save_path <- here::here("1.6_health_systems/1.6a_dhs")

# Save wealth quintile distributions
rio::export(
  wealth_quintile_adm1,
  here::here(save_path, "processed", "wealth_quintile_distributions.csv")
)

# Save Gini coefficients
rio::export(
  gini_by_region,
  here::here(save_path, "processed", "wealth_gini_coefficients.csv")
)

# Save combined wealth analysis dataset
wealth_analysis_final <- list(
  quintile_distributions = wealth_quintile_adm2,
  gini_coefficients = gini_by_region
)

saveRDS(
  wealth_analysis_final,
  here::here(save_path, "processed", "complete_wealth_analysis.rds")
)

Summary

Wealth quintile analysis provides critical insights for equity-focused malaria programming in SNT. By understanding the socioeconomic distribution of populations and how intervention coverage varies by wealth status, SNT teams can make evidence-based decisions to ensure malaria interventions reach the most vulnerable populations effectively.

The methodology presented here follows DHS guidelines and provides reproducible approaches for calculating wealth distributions and Gini coefficients at subnational levels.

Full Code

  • R
  • Python
 

©2026 Applied Health Analytics for Delivery and Innovation. All rights reserved