When direct survey estimates run out of statistical power - typically below adm1 (see the caveat in DHS survey analysis) - model-based geostatistics (MBG) fills the gap. It models the indicator as a smooth spatial surface from geolocated survey clusters, predicts a continuous raster, and aggregates that raster to admin units using population weights.
There are two entry points:
-
fit_mbg_indicator()- fit one indicator from a prepared cluster table. Best when you want full control or are modeling a single custom indicator. -
run_mbg_pipeline()- the full pipeline: discover surveys, prepare every indicator, fit, predict, and aggregate to adm2/adm1/adm0 in one call.
This workflow requires the optional spatial stack (INLA,
mbg, sf, terra). See installation.
Inputs
MBG needs three things:
-
A cluster table with one row per survey cluster and
columns:
cluster_id,indicator(event count / numerator),samplesize(denominator),x(longitude),y(latitude). - A population raster for the modeled population (and optionally age-specific rasters for child / women-of-reproductive-age denominators).
-
Admin boundaries as
sfobjects (adm0_sf,adm1_sf,adm2_sf).
Single indicator: fit_mbg_indicator()
The package’s prep_*_mbg() / calc_*_mbg()
helpers build the cluster table for each indicator family, but it is
just a data.table/data.frame - here is the
shape, built directly from a DHS PR recode and the
GE GPS recode (condensed from inst/scripts/mbg_pfpr2_10_dhs.R):
library(sntmethods)
# cluster-level positives / tested, joined to GPS coordinates
pfpr_dt <- data.table::data.table(
cluster_id = pfpr_sf$cluster_id,
indicator = pfpr_sf$n_positive, # numerator
samplesize = pfpr_sf$n_tested, # denominator
x = pfpr_coords[, 1], # longitude
y = pfpr_coords[, 2] # latitude
)fit_mbg_indicator() is the all-in-one wrapper around
mbg::MbgModelRunner. Give it the cluster table, a
population raster, and any subset of admin shapefiles; it fits the model
and returns a comprehensive list:
fit <- fit_mbg_indicator(
cluster_data = pfpr_dt,
indicator_name = "pfpr_mic_2_10",
population_raster = pop_rast,
adm1_sf = adm1_sf,
adm2_sf = adm2_sf,
primary_level = "adm2",
output_levels = c("adm1", "adm2"),
survey_year = 2016,
source_label = "DHS",
output_dir = path_output, # NULL = write nothing
cache_dir = fs::path(path_output, "cache") # caches the INLA fit
)The result contains:
| Element | What it is |
|---|---|
$cluster_data |
the input cluster table |
$cell_predictions |
mean / lower / upper
SpatRasters plus draws |
$admin |
long-format tibble of estimates per admin level |
$id_raster, $aggregation_tables
|
internals for re-aggregation |
$model_runner |
the fitted mbg runner |
$saved_files, $cache_files,
$inputs
|
provenance |
Country/ISO codes are derived automatically from the median cluster
coordinate. When cache_dir is set, the prediction matrix is
cached so re-runs skip the expensive INLA fit.
Mapping the result
The mean surface and its uncertainty are plain
SpatRasters, so any mapping tool works. The package also
provides generate_indicator_map(),
generate_all_maps(), and cluster diagnostics via
plot_mbg_clusters():
cell_preds <- fit$cell_predictions
r_mean <- cell_preds$mean * 100 # PfPR as %
r_unc <- (cell_preds$upper - cell_preds$lower) * 100 # 95% CI width (pp)
tmap::tm_shape(r_mean) +
tmap::tm_raster(palette = "YlOrRd", title = "PfPR2-10 (%)") +
tmap::tm_shape(adm2_sf) + tmap::tm_borders(lwd = 0.3)Full pipeline: run_mbg_pipeline()
For a whole country across many indicators,
run_mbg_pipeline() orchestrates everything: it
auto-discovers surveys from the parquet archive, prepares cluster data
per indicator, fits the INLA/MBG models, generates prediction rasters,
aggregates to adm2 (or adm3) via population-weighted zonal statistics,
and rolls up to adm1 and adm0.
This is not a general-purpose MBG wrapper.
run_mbg_pipeline() is built around AHADI’s hive-partitioned
parquet archive (see ?dhs_read)
and dhs_read()’s discovery semantics. Use it when you have
- or are willing to build - that archive and want to fit many indicators
across many surveys end-to-end.
For ad-hoc, single-survey, or single-indicator work, stay with
fit_mbg_indicator() (the Single indicator section
above) - it takes a cluster-level data frame you load yourself
(e.g. with sntutils::read() + a prep_*_mbg()
helper) and runs a single MBG fit, no archive required. See ?run_mbg_pipeline
for the full When to use this / When not to use this
checklist.
results <- run_mbg_pipeline(
adm0_sf = adm0, adm1_sf = adm1, adm2_sf = adm2,
pop_raster = list("2016" = "path/to/bdi_pop_2016.tif"),
pop_raster_u5 = list("2016" = "path/to/bdi_pop_u5_2016.tif"),
path_dhs_parquet = "path/to/parquet",
table_out_path = "output/tables",
raster_out_path = "output/rasters",
intermediate_out_path = "output/intermediate",
survey_year = 2016,
indicators = c("pfpr", "itn", "csb", "act", "eff_cm")
)
# long-format tables at each admin level, matching the DHS output shape
results$final_dataset$adm0
results$final_dataset$adm1
results$final_dataset$adm2Supported indicator families: pfpr, itn,
irs, csb, act,
antimalarial, fever, anc,
iptp, epi, u5mr,
anemia, smc, and eff_cm
(effective case management).
Age-specific denominators
Supply age-specific population rasters for more accurate
denominators: pop_raster_u5 (child indicators),
pop_raster_wra (ANC/IPTp), pop_raster_1_2 (EPI
12-23 months), and age-band rasters for ITN stratification. U5MR is
expressed per 1,000 live births; coverage indicators as percentages.
Outputs and saving
Pipeline and fit outputs are long-format tables with
point, ci_l, ci_u,
numerator, denominator, and indicator metadata
- the same shape as the DHS
survey outputs, so survey and modeled estimates stack cleanly.
Helpers save_mbg_cluster_data(),
save_mbg_rasters(), and save_indicator_map()
persist intermediate and final artifacts.
See also
- A complete single-indicator example:
inst/scripts/mbg_pfpr2_10_dhs.Randinst/scripts/mbg_itn_access_dhs.R. - The team reference
inst/docs/mbg_pipeline_guide.md. -
Reference
for
fit_mbg_indicator(),run_mbg_pipeline(), and theprep_*_mbg()family. ```
