Orchestrates the full MBG processing pipeline for DHS indicators across
one or more surveys in AHADI's parquet archive. This is a multi-survey,
multi-indicator production workflow - it is not a general-purpose
wrapper around fit_mbg_indicator().
Usage
run_mbg_pipeline(
country_iso3,
adm0_sf,
adm1_sf,
adm2_sf,
adm3_sf = NULL,
pop_raster,
path_dhs_parquet,
table_out_path,
raster_out_path,
intermediate_out_path,
pop_raster_u5 = NULL,
pop_raster_1_2 = NULL,
pop_raster_5_10 = NULL,
pop_raster_10_20 = NULL,
pop_raster_20plus = NULL,
pop_raster_wra = NULL,
country_iso2 = NULL,
survey_year = NULL,
min_year = "2005",
survey_type = NULL,
indicators = c("pfpr", "itn", "irs", "anc", "csb", "anemia", "iptp", "epi", "fever",
"antimalarial"),
aggregation_level = c("adm2", "adm3"),
run_mbg = TRUE,
save_rasters = TRUE,
save_cluster_points = TRUE,
cluster_points_out_path = NULL,
cache = TRUE,
csb_priority_method = c("all", "first", "public", "private"),
custom_csb_indicator = NULL,
verbose = TRUE,
debug = FALSE
)Arguments
- country_iso3
Three-letter ISO country code (e.g., "bdi").
- adm0_sf
sf object with country boundary.
- adm1_sf
sf object with ADM1 boundaries.
- adm2_sf
sf object with ADM2 boundaries.
- adm3_sf
sf object with ADM3 boundaries (optional). Required when
aggregation_level = "adm3". Should contain columns for adm3 names and parent adm2/adm1/adm0 linkages.- pop_raster
Total population raster(s). Can be:
Named list with years as names and file paths as values:
list("2019" = "path/to/pop_2019.tif", "2020" = "path/to/pop_2020.tif")Single file path (used for all years)
Already-loaded SpatRaster object (used for all years)
- path_dhs_parquet
Path to DHS parquet archive.
- table_out_path
Output directory for tables (CSV, XLSX).
- raster_out_path
Output directory for prediction rasters.
- intermediate_out_path
Output directory for cached intermediate outputs (aggregation tables, ID rasters).
- pop_raster_u5
Under-5 population raster(s) (optional). Same format as
pop_raster. Used for indicators targeting children 0-59 months (PfPR, fever, CSB, ACT, antimalarial, anemia, U5MR, SMC, ITN use U5). If NULL, falls back topop_raster.- pop_raster_1_2
Population raster for children 12-23 months (optional). Same format as
pop_raster. Used for EPI (immunization) indicators. Falls back topop_raster_u5, thenpop_rasterif NULL.- pop_raster_5_10
Population raster for ages 5-10 (optional). Same format as
pop_raster. Used foruse_itn_5_10. Falls back topop_rasterif NULL.- pop_raster_10_20
Population raster for ages 10-20 (optional). Same format as
pop_raster. Used foruse_itn_10_20. Falls back topop_rasterif NULL.- pop_raster_20plus
Population raster for ages 20+ (optional). Same format as
pop_raster. Used foruse_itn_20plus. Falls back topop_rasterif NULL.- pop_raster_wra
Women of reproductive age (15-49) population raster (optional). Same format as
pop_raster. Used for ANC, IPTp, and ITN use among pregnant women. Falls back topop_rasterif NULL.- country_iso2
Two-letter DHS country code (e.g., "BU"). If NULL (default), derived automatically from
country_iso3using thecountrycodepackage.- survey_year
Survey year(s) to process. Can be:
NULL: Process ALL available surveys with GPS data
Single integer: Process only that year (e.g., 2016)
Integer vector: Process specific years (e.g., c(2012, 2016))
- min_year
Minimum survey year to include. Surveys before this year will be excluded. Useful for filtering out older surveys that may lack key indicators. Default: NULL (no minimum).
- survey_type
Survey type(s) to process. Can be:
NULL (default): Auto-detect all available survey types (DHS, MIS, etc.)
Single string: Process only that type (e.g., "DHS")
Character vector: Process specific types (e.g., c("DHS", "MIS"))
- indicators
Character vector of indicator categories to process:
"pfpr": Parasite prevalence
"itn": ITN ownership/access/use
"irs": IRS coverage
"anc": ANC attendance
"csb": Care-seeking behavior
"act": ACT treatment (case management)
"anemia": Anemia prevalence
"iptp": IPTp doses
"epi": EPI vaccination
"u5mr": Under-5 mortality
"smc": SMC receipt
"fever": Fever prevalence (U5)
"antimalarial": Any antimalarial treatment (febrile U5)
"wealth": Wealth quintile distribution (proportion in poorest/richest)
"eff_cm": Effective coverage of case management (derived; auto-adds "csb" and "act" as dependencies)
- aggregation_level
Primary aggregation level for MBG outputs. One of:
"adm2": Aggregate to ADM2 level (default)
"adm3": Aggregate to ADM3 level (requires
adm3_sf)
- run_mbg
Logical. If TRUE, runs MBG models. Default: TRUE.
- save_rasters
Logical. If TRUE, saves output rasters. Default: TRUE.
- save_cluster_points
Logical. If TRUE (default), saves the coordinate-level (cluster-point) datasets produced by
calc_*_mbg()to disk in addition to the aggregated admin-level outputs. Set to FALSE to skip saving cluster-point files.- cluster_points_out_path
Output directory for cluster-point (coordinate-level) datasets. If NULL (default), falls back to
table_out_path. Use this to keep the per-cluster point files in a separate location from the combined admin-level outputs.- cache
Logical. If TRUE (default), reuses cached intermediate outputs (aggregation tables, ID rasters) when available. Set to FALSE to force regeneration of all intermediate outputs.
- csb_priority_method
Character, one of "all" (default), "first", "public", or "private". Controls how overlapping care-seeking records are resolved in the CSB (and wealth-stratified CSB) indicators so each individual is assigned to at most one sector:
"all": Keep WHO methodology; overlaps allowed (csb_public and csb_private can both be 1 for the same child).
"first": Take the first recurring h32 source visited per child (alphabetical h32 order: h32a, h32b, ..., h32x). Mutually exclusive.
"public": Public-sector priority - if any public/CHW care, classify as public; otherwise private if any private; otherwise none.
"private": Private-sector priority - if any private care, classify as private; otherwise public if any public; otherwise none.
With non-"all" values, csb_public + csb_private + csb_none sums to 100%.
- custom_csb_indicator
Optional named list defining a user-specified, mutually-exclusive care-seeking partition fitted in addition to the built-in CSB family. When supplied, three derived indicators are produced:
<name>_dhis(sought care at any user-listed DHIS source),<name>_nondhis(sought care at any user-listed non-DHIS source and not at any DHIS source), and<name>_untreat(did not seek care at any user-listed source). The list must have four character fields:name: A short alphanumeric prefix used to build the three derived indicator codes (e.g."foo"->foo_dhis,foo_nondhis,foo_untreat).dhis_locs: Character vector of h32 entries that map to the DHIS bucket. Each entry may be either an h32 variable name (e.g."h32a") or a haven label string (case-insensitive). The two styles can be mixed in the same vector; variable-name matches take precedence over label matches when an h32 column is referenced both ways.nondhis_locs: Character vector for the non-DHIS bucket (same dual-style semantics asdhis_locs).untreat_locs: Character vector for the untreated bucket (same dual-style semantics asdhis_locs).
When
custom_csb_indicatoris supplied, the partition is activated automatically: thenameis added toindicatorsunless it (or one of the three derived codes) is already listed. The custom triple is always mutually exclusive at the child level (priority dhis > nondhis > untreat); admin-level estimates are rescaled to sum to 100% per admin unit. Default: NULL (disabled).- verbose
Logical. If TRUE, prints detailed progress. Default: TRUE.
- debug
Logical. If TRUE, prints additional diagnostic messages for troubleshooting. Default: FALSE.
Value
A list containing:
final_dataset: Named list with
adm0,adm1, andadm2(oradm3) tibbles in long format. Each tibble has standard columns: survey_id, iso3, iso2, survey_type, survey_year, adm0, adm1, adm2, type, geo_source, point, ci_l, ci_u, numerator, denominator, survey_numerator, survey_denominator, n_survey_clusters, indicator, indicator_code, numerator_description, denominator_description.mbg_estimates: MBG predictions at the specified aggregation level (if MBG was run)
cluster_data: Raw cluster-level data
raster_paths: Paths to saved rasters
survey_metadata: Survey collection dates and metadata
Details
See indicator-specific methodology files at https://github.com/ahadi-analytics/sntmethods/tree/master/inst/methods
Pipeline steps:
Discovers available surveys
Loads survey data
Prepares cluster-level data for each indicator
Runs MBG models (if enabled)
Generates outputs (rasters, CSVs, maps)
When to use this
Use run_mbg_pipeline() when:
You have a hive-partitioned DHS parquet archive laid out as documented in
dhs_read()- the pipeline callsdhs_read()internally to discover and load surveys.You want to fit many indicators across many surveys end-to-end and have them returned as long-format tables stacked at
adm0/adm1/adm2, with prediction rasters cached to disk.You have set up the optional INLA +
mbgstack (see the Get started vignette).
When not to use this
Do not use run_mbg_pipeline() for ad-hoc, single-survey, or
single-indicator work. The pipeline assumes the parquet archive layout
from dhs_read() and is not designed to be pointed at one DHS file. For
those cases:
One indicator, one survey, your own data. Use
fit_mbg_indicator()directly. It takes a cluster-level data.table (produced bysntutils::read()+ aprep_*_mbg()helper, or built by hand) and runs a single INLA / MBG fit. No archive required.kr <- sntutils::read("TGKR81FL.DTA") ge <- sntutils::read("TGGE8AFL.dta") cl <- calc_pfpr_mbg(dhs_pr = kr, gps_data = ge)$pfpr_rdt_u5 fit <- fit_mbg_indicator(input_data = cl, ...)Survey-weighted DHS indicators without spatial modelling. Use the
calc_*_dhs()family - they accept any data frame and need no parquet archive and no INLA stack.You only need one country, one survey, but want the pipeline shape. Build a minimal one-survey parquet leaf following the directory layout in
dhs_read()and then callrun_mbg_pipeline()against it; this is usually only worth the setup if you plan to add more surveys later.
Examples
if (FALSE) { # \dontrun{
results <- run_mbg_pipeline(
country_iso3 = "bdi",
adm0_sf = adm0,
adm1_sf = adm1,
adm2_sf = adm2,
pop_raster = list("2016" = "/path/to/bdi_ppp_2016.tif"),
path_dhs_parquet = "path/to/parquet",
table_out_path = "path/to/output/tables",
raster_out_path = "path/to/output/rasters",
intermediate_out_path = "path/to/output/intermediate",
survey_year = 2016,
indicators = c("pfpr", "itn", "csb"),
cache = TRUE
)
} # }
