Skip to contents

When direct survey estimates run out of statistical power - typically below adm1 (see the caveat in DHS survey analysis) - model-based geostatistics (MBG) fills the gap. It models the indicator as a smooth spatial surface from geolocated survey clusters, predicts a continuous raster, and aggregates that raster to admin units using population weights.

There are two entry points:

  • fit_mbg_indicator() - fit one indicator from a prepared cluster table. Best when you want full control or are modeling a single custom indicator.
  • run_mbg_pipeline() - the full pipeline: discover surveys, prepare every indicator, fit, predict, and aggregate to adm2/adm1/adm0 in one call.

This workflow requires the optional spatial stack (INLA, mbg, sf, terra). See installation.

Inputs

MBG needs three things:

  1. A cluster table with one row per survey cluster and columns: cluster_id, indicator (event count / numerator), samplesize (denominator), x (longitude), y (latitude).
  2. A population raster for the modeled population (and optionally age-specific rasters for child / women-of-reproductive-age denominators).
  3. Admin boundaries as sf objects (adm0_sf, adm1_sf, adm2_sf).

Single indicator: fit_mbg_indicator()

The package’s prep_*_mbg() / calc_*_mbg() helpers build the cluster table for each indicator family, but it is just a data.table/data.frame - here is the shape, built directly from a DHS PR recode and the GE GPS recode (condensed from inst/scripts/mbg_pfpr2_10_dhs.R):

library(sntmethods)

# cluster-level positives / tested, joined to GPS coordinates
pfpr_dt <- data.table::data.table(
  cluster_id = pfpr_sf$cluster_id,
  indicator  = pfpr_sf$n_positive,  # numerator
  samplesize = pfpr_sf$n_tested,    # denominator
  x = pfpr_coords[, 1],             # longitude
  y = pfpr_coords[, 2]              # latitude
)

fit_mbg_indicator() is the all-in-one wrapper around mbg::MbgModelRunner. Give it the cluster table, a population raster, and any subset of admin shapefiles; it fits the model and returns a comprehensive list:

fit <- fit_mbg_indicator(
  cluster_data      = pfpr_dt,
  indicator_name    = "pfpr_mic_2_10",
  population_raster = pop_rast,
  adm1_sf           = adm1_sf,
  adm2_sf           = adm2_sf,
  primary_level     = "adm2",
  output_levels     = c("adm1", "adm2"),
  survey_year       = 2016,
  source_label      = "DHS",
  output_dir        = path_output,            # NULL = write nothing
  cache_dir         = fs::path(path_output, "cache")  # caches the INLA fit
)

The result contains:

Element What it is
$cluster_data the input cluster table
$cell_predictions mean / lower / upper SpatRasters plus draws
$admin long-format tibble of estimates per admin level
$id_raster, $aggregation_tables internals for re-aggregation
$model_runner the fitted mbg runner
$saved_files, $cache_files, $inputs provenance

Country/ISO codes are derived automatically from the median cluster coordinate. When cache_dir is set, the prediction matrix is cached so re-runs skip the expensive INLA fit.

Mapping the result

The mean surface and its uncertainty are plain SpatRasters, so any mapping tool works. The package also provides generate_indicator_map(), generate_all_maps(), and cluster diagnostics via plot_mbg_clusters():

cell_preds <- fit$cell_predictions
r_mean <- cell_preds$mean * 100                          # PfPR as %
r_unc  <- (cell_preds$upper - cell_preds$lower) * 100     # 95% CI width (pp)

tmap::tm_shape(r_mean) +
  tmap::tm_raster(palette = "YlOrRd", title = "PfPR2-10 (%)") +
  tmap::tm_shape(adm2_sf) + tmap::tm_borders(lwd = 0.3)

Full pipeline: run_mbg_pipeline()

For a whole country across many indicators, run_mbg_pipeline() orchestrates everything: it auto-discovers surveys from the parquet archive, prepares cluster data per indicator, fits the INLA/MBG models, generates prediction rasters, aggregates to adm2 (or adm3) via population-weighted zonal statistics, and rolls up to adm1 and adm0.

This is not a general-purpose MBG wrapper. run_mbg_pipeline() is built around AHADI’s hive-partitioned parquet archive (see ?dhs_read) and dhs_read()’s discovery semantics. Use it when you have - or are willing to build - that archive and want to fit many indicators across many surveys end-to-end.

For ad-hoc, single-survey, or single-indicator work, stay with fit_mbg_indicator() (the Single indicator section above) - it takes a cluster-level data frame you load yourself (e.g. with sntutils::read() + a prep_*_mbg() helper) and runs a single MBG fit, no archive required. See ?run_mbg_pipeline for the full When to use this / When not to use this checklist.

results <- run_mbg_pipeline(
  adm0_sf = adm0, adm1_sf = adm1, adm2_sf = adm2,
  pop_raster    = list("2016" = "path/to/bdi_pop_2016.tif"),
  pop_raster_u5 = list("2016" = "path/to/bdi_pop_u5_2016.tif"),
  path_dhs_parquet = "path/to/parquet",
  table_out_path        = "output/tables",
  raster_out_path       = "output/rasters",
  intermediate_out_path = "output/intermediate",
  survey_year = 2016,
  indicators  = c("pfpr", "itn", "csb", "act", "eff_cm")
)

# long-format tables at each admin level, matching the DHS output shape
results$final_dataset$adm0
results$final_dataset$adm1
results$final_dataset$adm2

Supported indicator families: pfpr, itn, irs, csb, act, antimalarial, fever, anc, iptp, epi, u5mr, anemia, smc, and eff_cm (effective case management).

Age-specific denominators

Supply age-specific population rasters for more accurate denominators: pop_raster_u5 (child indicators), pop_raster_wra (ANC/IPTp), pop_raster_1_2 (EPI 12-23 months), and age-band rasters for ITN stratification. U5MR is expressed per 1,000 live births; coverage indicators as percentages.

Outputs and saving

Pipeline and fit outputs are long-format tables with point, ci_l, ci_u, numerator, denominator, and indicator metadata - the same shape as the DHS survey outputs, so survey and modeled estimates stack cleanly. Helpers save_mbg_cluster_data(), save_mbg_rasters(), and save_indicator_map() persist intermediate and final artifacts.

See also