Skip to contents

Generic, low-level wrapper around mbg::MbgModelRunner that takes a pre-built cluster-level table plus a population raster and any subset of admin-0/1/2/3 shapefiles, fits a single MBG model and returns a comprehensive list of in-memory artefacts (cluster table, mean/lower/upper rasters, admin-level long-format tibbles, the fitted MbgModelRunner object, the id raster, aggregation tables, the cell-prediction draws matrix and an inputs echo of every parameter the function received).

Usage

fit_mbg_indicator(
  cluster_data,
  indicator_name,
  population_raster,
  adm0_sf = NULL,
  adm1_sf = NULL,
  adm2_sf = NULL,
  adm3_sf = NULL,
  primary_level = NULL,
  output_levels = NULL,
  covariates = NULL,
  pixel_size = 0.04166667,
  n_samples = 250,
  seed = 1,
  cluster_cols = list(cluster_id = "cluster_id", x = "x", y = "y", indicator =
    "indicator", samplesize = "samplesize"),
  id_field = "shapeID",
  indicator_title = NULL,
  indicator_unit_scale = 100,
  survey_year = NULL,
  source_label = NULL,
  output_dir = NULL,
  cache_dir = NULL,
  use_cache = TRUE,
  overwrite = FALSE,
  return_draws = FALSE,
  verbose = TRUE,
  ...
)

Arguments

cluster_data

Any data-frame-like object (data.frame, tibble, data.table, sf with x/y columns). Coerced internally via data.table::as.data.table(). Must contain columns cluster_id, x, y, indicator, samplesize (or supply alternative names via cluster_cols).

indicator_name

Character. Short slug used in filenames and admin-tibble columns (e.g. "pfpr_mic_2_10").

population_raster

A terra::SpatRaster used as the template raster for prediction and as weights for population- weighted aggregation. Required.

adm0_sf, adm1_sf, adm2_sf, adm3_sf

Optional sf polygon layers for each admin level. At least one must be supplied. The default modelling level is the highest provided (i.e. adm3 if supplied, otherwise adm2, etc.).

primary_level

Character, one of "adm0", "adm1", "adm2", "adm3". The level the MBG model is fitted on and the level cell-predictions are aggregated to first. Defaults to the highest level for which a shapefile was supplied.

output_levels

Character vector of admin levels to summarise to. Defaults to all admin levels for which a shapefile was supplied.

covariates

Optional named list of terra::SpatRaster objects to use as covariates. If NULL, an intercept-only constant raster is built automatically (pure spatial smoothing).

pixel_size

Numeric. Informational pixel size in degrees, echoed in $inputs for downstream reporting. The actual prediction grid is inherited from population_raster; this argument does NOT resample the raster. Default 0.04166667 (\(\approx\) 5 km at the equator).

n_samples

Integer. Number of posterior draws drawn from the fitted model. Default 250.

seed

Integer. RNG seed for reproducibility. Default 1.

cluster_cols

Named list mapping the canonical cluster columns to user-supplied column names. Defaults to list(cluster_id = "cluster_id", x = "x", y = "y", indicator = "indicator", samplesize = "samplesize").

id_field

Character. Column name in the shapefiles holding the polygon id (default "shapeID"; falls back to a unique row index if not present).

indicator_title

Optional human-readable label for the indicator, used in column headers / messaging. Defaults to a title-cased version of indicator_name.

indicator_unit_scale

Numeric. Scaling factor applied when computing admin-level point estimates and CIs from the smoothed probability surface. Default 100 (i.e. percentages).

survey_year

Optional integer survey year. Default NULL.

source_label

Optional character data source label ("DHS", "MIS", "routine", ...). Generic replacement for the older survey_type argument. Default NULL.

output_dir

Optional path. If NULL (default), nothing is written to disk. If supplied, rasters/, cluster_data/ and final_data/ subfolders are created and pipeline-style files are written.

cache_dir

Optional path for cached id-raster, aggregation table(s) and cell-prediction matrix. If NULL and output_dir is supplied, defaults to file.path( output_dir, "cache"). If both are NULL, no caching is performed.

use_cache

Logical. If FALSE, caches are ignored on read (but still written if cache_dir is set). Default TRUE.

overwrite

Logical. If TRUE, ignore existing cell- prediction cache entries and always refit. Default FALSE.

return_draws

Logical. If TRUE, the full n_pixel x n_samples draws matrix is returned in $cell_predictions$draws. Default FALSE (memory- conservative).

verbose

Logical. If TRUE, print step-by-step CLI messages. Default TRUE.

...

Additional arguments forwarded to mbg::build_aggregation_table() and mbg::MbgModelRunner$new() (filtered by each function's formals so unrecognised names are silently ignored).

Value

A named list with elements:

cluster_data

The cleaned input cluster table (data.table).

cell_predictions

Named list with mean, lower, upper SpatRaster objects, and optionally draws (the full posterior matrix) when return_draws = TRUE.

admin

Named list of long-format tibbles, one per output_levels entry, with columns indicator, indicator_title, admin_level, admin_id, admin_name, mean, lower, upper, population, country_iso3, country_iso2, dhs_code, survey_year, source_label. country_iso3 / country_iso2 / dhs_code are derived by reverse-geocoding the median cluster coordinate against adm0_sf (or rnaturalearth as a fallback) and resolved via countrycode. They fall back to NA when the lookup fails.

model_runner

The fitted MbgModelRunner object, or NULL if loaded from cache.

id_raster

The pixel-id raster used by mbg::build_aggregation_table().

aggregation_tables

Named list of aggregation tables (one per output level).

saved_files

Named list of paths written to disk (empty when output_dir is NULL).

cache_files

Named list of paths to cache artefacts (empty when cache_dir is NULL).

inputs

Named list echoing every input parameter for reproducibility.

Details

This function is intentionally generic and is not tied to DHS. The country (country_iso3 / country_iso2 / dhs_code) is always derived automatically from the median cluster coordinate via adm0_sf (when supplied) or rnaturalearth::ne_countries() as a fallback, then resolved through countrycode. The optional survey_year and source_label arguments are pure annotations: when supplied they are echoed onto the admin tibble and used to build pipeline-style filenames; when NULL they are dropped from filenames and written as NA in the admin tibble.

If output_dir is NULL (the default) nothing is written to disk – everything is returned in memory in the result list. Set output_dir to a directory path to also write rasters, the cluster file, the long-format admin file (qs2 + xlsx) and a data dictionary, mirroring the conventions of run_mbg_pipeline().

If cache_dir is non-NULL (or auto-derived from output_dir), three artefacts are cached: the id raster (.tif), the per-level aggregation table(s) (.parquet), and the cell-prediction draws matrix (.qs2). On subsequent calls with the same key the cell-prediction matrix is reloaded and the (expensive) INLA model fit is skipped, allowing the user to re-summarise to additional admin levels without refitting.

Examples

if (FALSE) { # \dontrun{
# Minimal in-memory call (no disk writes)
fit <- fit_mbg_indicator(
  cluster_data      = pfpr_dt,
  indicator_name    = "pfpr_mic_2_10",
  population_raster = pop_rast,
  adm1_sf           = adm1, adm2_sf = adm2
)
fit$cell_predictions$mean
fit$admin$adm2

# Pipeline-style call with disk + cache
fit2 <- fit_mbg_indicator(
  cluster_data      = pfpr_dt,
  indicator_name    = "pfpr_mic_2_10",
  population_raster = pop_rast,
  adm1_sf = adm1, adm2_sf = adm2, adm3_sf = adm3,
  primary_level     = "adm3",
  output_levels     = c("adm1", "adm2", "adm3"),
  survey_year       = 2017,
  source_label      = "MIS",
  output_dir        = "outputs/mbg_fit"
)
# country_iso3 / country_iso2 / dhs_code are derived from cluster coords.
} # }