
Fit a Single MBG Indicator (Generic, Pluggable Admin Levels)
Source: R/mbg_fit_indicator.R
Generic, low-level wrapper around mbg::MbgModelRunner that takes
a pre-built cluster-level table plus a population raster and any
subset of admin-0/1/2/3 shapefiles, fits a single MBG model and
returns a comprehensive list of in-memory artefacts (cluster table,
mean/lower/upper rasters, admin-level long-format tibbles, the fitted
MbgModelRunner object, the id raster, aggregation tables, the
cell-prediction draws matrix and an inputs echo of every
parameter the function received).
Usage
fit_mbg_indicator(
  cluster_data,
  indicator_name,
  population_raster,
  adm0_sf = NULL,
  adm1_sf = NULL,
  adm2_sf = NULL,
  adm3_sf = NULL,
  primary_level = NULL,
  output_levels = NULL,
  covariates = NULL,
  pixel_size = 0.04166667,
  n_samples = 250,
  seed = 1,
  cluster_cols = list(cluster_id = "cluster_id", x = "x", y = "y",
    indicator = "indicator", samplesize = "samplesize"),
  id_field = "shapeID",
  indicator_title = NULL,
  indicator_unit_scale = 100,
  survey_year = NULL,
  source_label = NULL,
  output_dir = NULL,
  cache_dir = NULL,
  use_cache = TRUE,
  overwrite = FALSE,
  return_draws = FALSE,
  verbose = TRUE,
  ...
)
Arguments
- cluster_data
Any data-frame-like object (data.frame, tibble, data.table, or sf with x/y columns). Coerced internally via data.table::as.data.table(). Must contain the columns cluster_id, x, y, indicator and samplesize (or supply alternative names via cluster_cols).
- indicator_name
Character. Short slug used in filenames and admin-tibble columns (e.g. "pfpr_mic_2_10").
- population_raster
A terra::SpatRaster used as the template raster for prediction and as weights for population-weighted aggregation. Required.
- adm0_sf, adm1_sf, adm2_sf, adm3_sf
Optional sf polygon layers for each admin level. At least one must be supplied. The default modelling level is the finest (highest-numbered) level provided (i.e. adm3 if supplied, otherwise adm2, etc.).
- primary_level
Character, one of "adm0", "adm1", "adm2", "adm3". The level the MBG model is fitted on and the level cell predictions are aggregated to first. Defaults to the finest (highest-numbered) level for which a shapefile was supplied.
- output_levels
Character vector of admin levels to summarise to. Defaults to all admin levels for which a shapefile was supplied.
- covariates
Optional named list of terra::SpatRaster objects to use as covariates. If NULL, an intercept-only constant raster is built automatically (pure spatial smoothing).
- pixel_size
Numeric. Informational pixel size in degrees, echoed in $inputs for downstream reporting. The actual prediction grid is inherited from population_raster; this argument does NOT resample the raster. Default 0.04166667 (approximately 5 km at the equator).
- n_samples
Integer. Number of posterior draws sampled from the fitted model. Default 250.
- seed
Integer. RNG seed for reproducibility. Default 1.
- cluster_cols
Named list mapping the canonical cluster columns to user-supplied column names. Defaults to list(cluster_id = "cluster_id", x = "x", y = "y", indicator = "indicator", samplesize = "samplesize").
- id_field
Character. Column name in the shapefiles holding the polygon id (default "shapeID"; falls back to a unique row index if not present).
- indicator_title
Optional human-readable label for the indicator, used in column headers and messaging. Defaults to a title-cased version of indicator_name.
- indicator_unit_scale
Numeric. Scaling factor applied when computing admin-level point estimates and CIs from the smoothed probability surface. Default 100 (i.e. percentages).
- survey_year
Optional integer survey year. Default NULL.
- source_label
Optional character data source label ("DHS", "MIS", "routine", ...). Generic replacement for the older survey_type argument. Default NULL.
- output_dir
Optional path. If NULL (default), nothing is written to disk. If supplied, rasters/, cluster_data/ and final_data/ subfolders are created and pipeline-style files are written.
- cache_dir
Optional path for the cached id raster, aggregation table(s) and cell-prediction matrix. If NULL and output_dir is supplied, defaults to file.path(output_dir, "cache"). If both are NULL, no caching is performed.
- use_cache
Logical. If FALSE, caches are ignored on read (but still written if cache_dir is set). Default TRUE.
- overwrite
Logical. If TRUE, ignore existing cell-prediction cache entries and always refit. Default FALSE.
- return_draws
Logical. If TRUE, the full n_pixel x n_samples draws matrix is returned in $cell_predictions$draws. Default FALSE (memory-conservative).
- verbose
Logical. If TRUE, print step-by-step CLI messages. Default TRUE.
- ...
Additional arguments forwarded to mbg::build_aggregation_table() and mbg::MbgModelRunner$new() (filtered by each function's formals, so unrecognised names are silently ignored).
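The formals-based filtering described for the ... argument can be illustrated in base R. This is a minimal sketch of the general technique, not the package's actual code: filter_to_formals(), build_table() and fit_model(), together with their argument names, are hypothetical stand-ins for the internal helper and for the two mbg functions.

```r
# Hypothetical helper: keep only the arguments that `fun` declares.
filter_to_formals <- function(fun, args) {
  args[intersect(names(args), names(formals(fun)))]
}

# Toy stand-ins for the two downstream functions (names are assumptions).
build_table <- function(polygons, id_field = "shapeID") {
  list(id_field = id_field)
}
fit_model <- function(data, n_iter = 10, verbose = FALSE) {
  list(n_iter = n_iter, verbose = verbose)
}

# Arguments captured from `...`, including one that neither function accepts.
dots <- list(n_iter = 50, id_field = "GID_2", not_an_argument = 1)

table_args <- filter_to_formals(build_table, dots)
model_args <- filter_to_formals(fit_model, dots)

names(table_args)  # arguments build_table() declares
names(model_args)  # arguments fit_model() declares

# Each function is then called with its own filtered subset.
res <- do.call(fit_model, c(list(data = NULL), model_args))
```

One consequence of this design is that a misspelled argument name is dropped without any warning, so it is worth double-checking the names passed through ... against the documentation of the two mbg functions.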
Value
A named list with elements:
- cluster_data
The cleaned input cluster table (data.table).
- cell_predictions
Named list with mean, lower, upper SpatRaster objects, and optionally draws (the full posterior matrix) when return_draws = TRUE.
- admin
Named list of long-format tibbles, one per output_levels entry, with columns indicator, indicator_title, admin_level, admin_id, admin_name, mean, lower, upper, population, country_iso3, country_iso2, dhs_code, survey_year, source_label. country_iso3, country_iso2 and dhs_code are derived by reverse-geocoding the median cluster coordinate against adm0_sf (or rnaturalearth as a fallback) and resolved via countrycode. They fall back to NA when the lookup fails.
- model_runner
The fitted MbgModelRunner object, or NULL if loaded from cache.
- id_raster
The pixel-id raster used by mbg::build_aggregation_table().
- aggregation_tables
Named list of aggregation tables (one per output level).
- saved_files
Named list of paths written to disk (empty when output_dir is NULL).
- cache_files
Named list of paths to cache artefacts (empty when cache_dir is NULL).
- inputs
Named list echoing every input parameter for reproducibility.
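To make the link between the n_pixel x n_samples draws matrix and the admin-level mean, lower and upper columns concrete, the following base-R sketch aggregates simulated draws to two admin units with population weights. It rests on several assumptions that are not confirmed by this page: the draws are simulated, the pixel-to-admin lookup and populations are invented, and a 95% interval (2.5% and 97.5% quantiles) is assumed because the interval width used by the function is not stated here.

```r
set.seed(1)

# Simulated stand-in for fit$cell_predictions$draws:
# one row per pixel, one column per posterior draw.
n_pixel   <- 6
n_samples <- 250
draws <- matrix(rbeta(n_pixel * n_samples, 2, 5), nrow = n_pixel)

# Hypothetical pixel-to-admin lookup and pixel populations.
admin_id   <- c("A", "A", "A", "B", "B", "B")
population <- c(120, 80, 300, 50, 500, 10)

# Population-weighted mean within each admin unit, computed per draw.
# `draws * population` scales row i by population[i].
admin_pop   <- as.vector(rowsum(population, admin_id))
admin_draws <- rowsum(draws * population, admin_id) / admin_pop

# Summarise across draws and rescale (100 gives percentages).
unit_scale <- 100
admin_summary <- data.frame(
  admin_id   = rownames(admin_draws),
  mean       = unit_scale * rowMeans(admin_draws),
  lower      = unit_scale * apply(admin_draws, 1, quantile, probs = 0.025, names = FALSE),
  upper      = unit_scale * apply(admin_draws, 1, quantile, probs = 0.975, names = FALSE),
  population = admin_pop
)
admin_summary
```

Aggregating each draw first and summarising afterwards keeps the interval consistent with the uncertainty of the admin-level estimate, rather than averaging pixel-level interval bounds.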
Details
This function is intentionally generic and is not tied to DHS. The
country (country_iso3 / country_iso2 / dhs_code)
is always derived automatically from the median cluster coordinate
via adm0_sf (when supplied) or rnaturalearth::ne_countries()
as a fallback, then resolved through countrycode. The optional
survey_year and source_label arguments are pure
annotations: when supplied they are echoed onto the admin tibble and
used to build pipeline-style filenames; when NULL they are
dropped from filenames and written as NA in the admin tibble.
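The "dropped from filenames, NA in the tibble" behaviour of the two annotation arguments can be sketched as follows. The filename pattern and both helper names (build_file_stem(), annotation_or_na()) are hypothetical; the actual naming scheme is not documented on this page.

```r
# Hypothetical filename builder: the real naming pattern is an assumption.
build_file_stem <- function(indicator_name, survey_year = NULL, source_label = NULL) {
  # c() silently drops NULL elements, so absent annotations vanish from the name.
  parts <- c(indicator_name, source_label, survey_year)
  paste(parts, collapse = "_")
}

# Value written to the admin tibble: NA of the right type when the argument is NULL.
annotation_or_na <- function(x, na_value) {
  if (is.null(x)) na_value else x
}

stem_full <- build_file_stem("pfpr_mic_2_10", survey_year = 2017, source_label = "MIS")
stem_bare <- build_file_stem("pfpr_mic_2_10")

year_col   <- annotation_or_na(NULL, NA_integer_)
source_col <- annotation_or_na("MIS", NA_character_)
```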
If output_dir is NULL (the default), nothing is written to
disk; everything is returned in memory in the result list. Set
output_dir to a directory path to also write rasters, the
cluster file, the long-format admin file (qs2 + xlsx) and a data
dictionary, mirroring the conventions of run_mbg_pipeline().
If cache_dir is non-NULL (or auto-derived from
output_dir), three artefacts are cached: the id raster
(.tif), the per-level aggregation table(s) (.parquet),
and the cell-prediction draws matrix (.qs2). On subsequent
calls with the same key the cell-prediction matrix is reloaded and
the (expensive) INLA model fit is skipped, allowing the user to
re-summarise to additional admin levels without refitting.
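The interaction of cache_dir, use_cache and overwrite described above can be sketched with a small cache-aware wrapper. This is an illustration under stated assumptions, not the function's implementation: cached_fit() and its key argument are hypothetical, saveRDS()/readRDS() stand in for the qs2 format, and how the real cache key is built is not documented on this page.

```r
# Hypothetical cache-aware wrapper around an expensive fitting step.
cached_fit <- function(key, fit_fun, cache_dir = NULL,
                       use_cache = TRUE, overwrite = FALSE) {
  # No cache directory: always fit, never write.
  if (is.null(cache_dir)) return(fit_fun())

  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(cache_dir, paste0(key, ".rds"))

  # Read from cache only when allowed and not forced to refit.
  if (use_cache && !overwrite && file.exists(path)) {
    return(readRDS(path))
  }

  result <- fit_fun()    # the expensive step
  saveRDS(result, path)  # still written when use_cache = FALSE
  result
}

# Toy "model fit" that counts how often it actually runs.
n_fits <- 0
expensive_fit <- function() {
  n_fits <<- n_fits + 1
  matrix(runif(6), nrow = 3)
}

cache_dir <- tempfile("mbg_cache_demo")  # fresh directory for the demo
first  <- cached_fit("pfpr_adm2", expensive_fit, cache_dir)
second <- cached_fit("pfpr_adm2", expensive_fit, cache_dir)                    # reloaded
third  <- cached_fit("pfpr_adm2", expensive_fit, cache_dir, overwrite = TRUE)  # refitted
n_fits
```

In this sketch the second call returns the stored matrix without running the fit, which is the behaviour that lets additional admin levels be summarised cheaply.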
Examples
if (FALSE) { # \dontrun{
# Minimal in-memory call (no disk writes)
fit <- fit_mbg_indicator(
  cluster_data = pfpr_dt,
  indicator_name = "pfpr_mic_2_10",
  population_raster = pop_rast,
  adm1_sf = adm1, adm2_sf = adm2
)
fit$cell_predictions$mean
fit$admin$adm2

# Pipeline-style call with disk + cache
fit2 <- fit_mbg_indicator(
  cluster_data = pfpr_dt,
  indicator_name = "pfpr_mic_2_10",
  population_raster = pop_rast,
  adm1_sf = adm1, adm2_sf = adm2, adm3_sf = adm3,
  primary_level = "adm3",
  output_levels = c("adm1", "adm2", "adm3"),
  survey_year = 2017,
  source_label = "MIS",
  output_dir = "outputs/mbg_fit"
)
# country_iso3 / country_iso2 / dhs_code are derived from cluster coords.
} # }
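The examples above use a cluster table that already has the canonical column names. When a survey extract uses other names, the cluster_cols mapping amounts to a select-and-rename step, sketched here in base R. The data, the column names and the helper standardise_cluster_cols() are all hypothetical, and the indicator column is assumed to hold a count of positives; check what the function expects before relying on that.

```r
# Toy survey extract with non-canonical column names (all values invented).
survey <- data.frame(
  psu      = 1:4,
  lon      = c(36.8, 36.9, 37.1, 37.3),
  lat      = c(-1.3, -1.2, -1.0, -0.9),
  n_pos    = c(12, 5, 20, 3),
  n_tested = c(40, 35, 50, 30)
)

# Mapping in the same shape as the cluster_cols argument:
# canonical name on the left, user-supplied column on the right.
cluster_cols <- list(cluster_id = "psu", x = "lon", y = "lat",
                     indicator = "n_pos", samplesize = "n_tested")

# Hypothetical helper: select the mapped columns and give them canonical names.
standardise_cluster_cols <- function(data, cluster_cols) {
  user_names <- unlist(cluster_cols)
  missing <- setdiff(user_names, names(data))
  if (length(missing) > 0) {
    stop("Missing columns: ", paste(missing, collapse = ", "))
  }
  out <- data[, user_names, drop = FALSE]
  names(out) <- names(cluster_cols)
  out
}

clusters <- standardise_cluster_cols(survey, cluster_cols)
names(clusters)
```

Passing cluster_cols directly to fit_mbg_indicator() should make a manual rename like this unnecessary; the sketch only shows what the mapping expresses.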