
sntutils is an R package developed by AHADI to support the Subnational Tailoring (SNT) of malaria interventions. It bundles the small, repeated operations that every SNT support analysis needs: reading DHIS2 exports, harmonising admin names across shapefile vintages, validating facility coordinates, calculating reporting rates, extracting climate and population rasters to admin units, and rendering plots and maps in the country team's language.

Full documentation, with worked examples for every workflow, lives at https://ahadi-analytics.github.io/sntutils/

The pkgdown site is the canonical reference. This README is a short orientation and an index of the package’s surface area.

Install

# install pak if needed
install.packages("pak")

# install sntutils from GitHub
pak::pkg_install("ahadi-analytics/sntutils")

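System dependencies: sntutils uses sf and terra, which require the GDAL, GEOS and PROJ system libraries. On macOS, install them with brew install gdal proj geos; on Ubuntu, install libgdal-dev, libgeos-dev and libproj-dev, using the ubuntugis PPA if your release ships outdated versions.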

Quick start

library(sntutils)

# 1. read a DHIS2-style export (any common format works)
sl_dhis2 <- read(
  system.file("extdata", "sl_exmaple_dhis2.rds", package = "sntutils")
)

# 2. clean it
sl_dhis2 <- sl_dhis2 |>
  standardize_names() |>
  autoparse_dates(date_cols = "date") |>
  dplyr::rename(year_mon = date) |>
  dplyr::mutate(
    hf_uid = vdigest(paste0(adm1, adm2, hf), algo = "xxhash32")
  )

# 3. calculate district-month reporting rates
calculate_reporting_metrics(
  data             = sl_dhis2,
  vars_of_interest = c("conf", "pres"),
  x_var            = "year_mon",
  y_var            = "adm2",
  hf_col           = "hf_uid",
  key_indicators   = c("allout", "test", "treat", "conf", "pres")
)

The Get started article walks through the same example end-to-end with plotted output.

What’s inside

The package exports about 100 functions, grouped below by workflow stage. Each stage heading marked → links to the pkgdown article that covers that group in depth.

Read & clean →

Read and write any common SNT format, parse messy dates, infer column types, standardise admin and facility names, build data dictionaries.

| Function | What it does |
| --- | --- |
| read(), write() | Read / write CSV, Excel, Stata, SPSS, RDS, GeoJSON, shapefile, … |
| read_snt_data(), write_snt_data() | Atomic, hashed reads / writes with sidecar metadata |
| autoparse_dates(), available_date_formats | Detect and standardise mixed date formats |
| auto_parse_types(), detect_factors() | Infer numeric / integer / factor / date types |
| standardize_names(), clean_filenames() | Tidy admin / facility names and file names |
| prep_geonames() | Interactive admin-name harmonisation with caching |
| build_dictionary(), snt_data_dict(), check_snt_var() | Variable dictionaries |
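Parsing mixed date formats usually means trying several candidate formats for each value. The base-R sketch below only illustrates that idea: parse_mixed_dates() and its formats list are hypothetical and are not how autoparse_dates() is implemented. Ambiguous inputs such as 01/02/2024 resolve to whichever matching format is listed first, which is why a production parser needs more care.

```r
# Hypothetical helper for illustration only; not sntutils::autoparse_dates().
# Tries each candidate format in turn and keeps the first successful parse
# for every element that is still unparsed.
parse_mixed_dates <- function(x,
                              formats = c("%Y-%m-%d", "%d/%m/%Y", "%Y.%m.%d")) {
  x <- trimws(as.character(x))
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    todo <- is.na(out) & !is.na(x)
    if (!any(todo)) break
    # as.Date() returns NA (not an error) when a string does not fit `fmt`
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

parse_mixed_dates(c("2024-01-15", "15/02/2024", "2024.03.01", "not a date"))
```

Values that fit none of the formats stay NA, so they can be listed and fixed by hand.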

Spatial →

Validate admin geometries and facility coordinates, crosswalk between shapefile vintages, fuzzy-match facilities, render maps.

| Function | What it does |
| --- | --- |
| download_shapefile() | Pull WHO geohub boundaries by ISO3 and admin level |
| validate_process_spatial() | Validate / repair admin shapefiles |
| validate_process_coordinates() | Validate facility lat / lon, drop low-precision rows |
| crosswalk_shapefiles_sf() | Area-weighted overlap between two shapefile vintages |
| fuzzy_match_facilities(), calculate_match_stats() | Match DHIS2 facilities to the master list |
| dhis2_map() | Rename DHIS2 columns via a dictionary |
| plot_admin_map_distinct(), facetted_map_bins(), facetted_map_gradient() | Categorical and continuous admin maps |
| get_palette(), list_palettes() | AHADI-branded plot palettes |
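At its core, fuzzy facility matching is a string distance plus a cut-off. As a rough base-R illustration (not the algorithm inside fuzzy_match_facilities()), the hypothetical match_facility_names() below normalises case and whitespace, computes edit distances with utils::adist(), and accepts the closest master-list name only when it is within an arbitrarily chosen max_dist edits.

```r
# Hypothetical illustration only; not sntutils::fuzzy_match_facilities().
match_facility_names <- function(dhis2_names, master_names, max_dist = 3) {
  # normalise case and collapse repeated whitespace before comparing
  norm <- function(x) gsub("\\s+", " ", trimws(toupper(x)))
  # edit-distance matrix: rows = DHIS2 names, columns = master-list names
  d <- utils::adist(norm(dhis2_names), norm(master_names))
  best <- apply(d, 1, which.min)
  best_dist <- d[cbind(seq_along(best), best)]
  data.frame(
    dhis2 = dhis2_names,
    # names further than max_dist edits away are left unmatched (NA)
    master = ifelse(best_dist <= max_dist, master_names[best], NA_character_),
    distance = best_dist
  )
}

match_facility_names(
  c("Kenema Govt Hospital", "Bo  CHC", "Unknown Post"),
  c("KENEMA GOVT HOSPITAL", "BO CHC", "MAKENI PHU")
)
```

Unmatched rows come back as NA, which makes them easy to route to manual review.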

Reporting rates →

Measure how completely facilities are reporting, by time and admin unit.

| Function | What it does |
| --- | --- |
| calculate_reporting_metrics() | Reporting / missing rate; three scenarios (facility, two-dim, time-only) |
| calculate_reporting_metrics_dates() | Reporting rate from open / close dates |
| reporting_rate_plot(), reporting_rate_map() | Plot and map reporting completeness |
| classify_facility_activity(), get_active_facilities(), facility_reporting_plot() | Per-facility activity status and timelines |
| compare_methods_plot() | Compare two reporting-rule choices side by side |
| validate_routine_hf_data() | Structural checks before any of the above |
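In its simplest form, a reporting rate is the number of facilities that submitted a report divided by the number expected to report, per admin unit and month. The base-R sketch below shows that arithmetic under two assumptions: a facility counts as reporting when conf is non-missing, and you supply the expected counts. reporting_rate() is hypothetical and does not reproduce the scenarios or reporting rules of calculate_reporting_metrics().

```r
# Hypothetical illustration only; not sntutils::calculate_reporting_metrics().
# df:       one row per facility-month, columns adm2, year_mon, hf_uid, conf
# expected: one row per adm2-month, columns adm2, year_mon, n_expected
reporting_rate <- function(df, expected) {
  # assumed rule: a facility "reports" if its conf value is present
  reported <- unique(df[!is.na(df$conf), c("adm2", "year_mon", "hf_uid")])
  n_rep <- aggregate(
    list(n_reported = reported$hf_uid),
    by = list(adm2 = reported$adm2, year_mon = reported$year_mon),
    FUN = length
  )
  out <- merge(expected, n_rep, by = c("adm2", "year_mon"), all.x = TRUE)
  # units with no reports at all get zero rather than NA
  out$n_reported[is.na(out$n_reported)] <- 0L
  out$reporting_rate <- out$n_reported / out$n_expected
  out
}

df <- data.frame(
  adm2 = c("Bo", "Bo", "Bo", "Kenema"), year_mon = "2024-01",
  hf_uid = c("a", "b", "c", "d"), conf = c(10, NA, 5, 3)
)
expected <- data.frame(
  adm2 = c("Bo", "Kenema", "Kailahun"), year_mon = "2024-01",
  n_expected = c(4, 2, 3)
)
reporting_rate(df, expected)
```

Keeping units with zero reports in the output (all.x = TRUE) matters: dropping them would hide the districts with the worst completeness.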

Data quality →

Cascade consistency checks, outlier detection (three methods: mean, median and IQR rules), correction and imputation.

| Function | What it does |
| --- | --- |
| consistency_check(), consistency_map() | Flag cascade violations (e.g. tests < confirmed cases) |
| detect_outliers(), outlier_plot() | Detect outliers with mean / median / IQR rules |
| correct_outliers(), impute_outlier_ma(), impute_higher_admin() | Replace flagged values |
| fallback_diff(), fallback_row_sum(), safe_sum() | Defensive numerical helpers |
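The three outlier rules differ mainly in how robust their centre and spread are. The base-R sketch below uses common textbook cut-offs, which are assumptions and not necessarily the defaults of detect_outliers(): mean ± 3 SD, median ± 3 scaled MADs, and Tukey fences at 1.5 × IQR beyond the quartiles. flag_outliers() is a hypothetical helper.

```r
# Hypothetical illustration only; not sntutils::detect_outliers().
flag_outliers <- function(x, method = c("mean", "median", "iqr")) {
  method <- match.arg(method)
  if (method == "mean") {
    # mean +/- 3 standard deviations
    centre <- mean(x, na.rm = TRUE)
    half_width <- 3 * sd(x, na.rm = TRUE)
    lo <- centre - half_width
    hi <- centre + half_width
  } else if (method == "median") {
    # median +/- 3 scaled median absolute deviations (Hampel-style)
    centre <- median(x, na.rm = TRUE)
    half_width <- 3 * mad(x, na.rm = TRUE)
    lo <- centre - half_width
    hi <- centre + half_width
  } else {
    # Tukey fences: 1.5 * IQR below Q1 and above Q3
    q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
    iqr <- q[2] - q[1]
    lo <- q[1] - 1.5 * iqr
    hi <- q[2] + 1.5 * iqr
  }
  # missing values are never flagged
  !is.na(x) & (x < lo | x > hi)
}

conf <- c(10, 12, 11, 13, 12, 11, 10, 12, 95, NA)
sapply(c("mean", "median", "iqr"), function(m) sum(flag_outliers(conf, m)))
# returns c(mean = 0, median = 1, iqr = 1)
```

In this example the mean rule misses the value 95 because that single extreme value inflates the standard deviation it is judged against; the median and IQR rules are not affected this way.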

Climate →

CHIRPS, ERA5, MODIS, NASA POWER download wrappers.

| Function | What it does |
| --- | --- |
| download_chirps(), check_chirps_available(), chirps_options() | CHIRPS monthly rainfall |
| download_era5(), check_era5_available(), era5_options(), read_era5(), get_era5_metadata(), print_era5_metadata(), migrate_era5_filenames() | ERA5 reanalysis |
| download_modis(), modis_options() | MODIS land-surface variables |
| download_process_nasapower() | NASA POWER agro-climate at points |

WorldPop →

WorldPop downloads (totals, age bands, urbanicity, global mosaic), extrapolation to unobserved years, and reshaping into the canonical SNT long format.

| Function | What it does |
| --- | --- |
| download_worldpop() | Total population, legacy + R2025A, 1 km / 100 m, count / density |
| download_worldpop_age_band() | Population for a specified age range and sex |
| download_worldpop_urbanicity() | Urban / peri-urban / rural classification |
| get_worldpop_paths() | Resolve where downloaded files live |
| extrapolate_pop() | Fill years between or beyond observed years |
| snt_process_population() | Reshape into canonical SNT long format |
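Filling years beyond the observed WorldPop years requires a growth assumption. A minimal base-R sketch, assuming a constant annual growth rate (a straight line on the log scale), is shown below. extrapolate_growth() is hypothetical, and extrapolate_pop() may well use a different method.

```r
# Hypothetical illustration only; not sntutils::extrapolate_pop().
# Fits log(pop) = a + b * year, i.e. a constant annual growth rate exp(b) - 1,
# then predicts for the requested years (inside or outside the observed range).
extrapolate_growth <- function(years, pop, target_years) {
  fit <- lm(log(pop) ~ years)
  exp(unname(predict(fit, newdata = data.frame(years = target_years))))
}

obs_years <- c(2020, 2021, 2022)
obs_pop <- 1000 * 1.03^(0:2)   # exactly 3% growth per year
extrapolate_growth(obs_years, obs_pop, c(2023, 2025))
```

A constant-rate model is easy to explain to country teams, but it extrapolates any recent trend indefinitely, so projections far from the observed years deserve caution.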

Rasters →

Batch raster processors that turn any raster archive (climate, population, MAP, IHME) into admin-keyed tibbles.

| Function | What it does |
| --- | --- |
| process_raster_collection(), process_raster_with_boundaries(), process_rasters_by_year() | Batch zonal stats; time-varying boundaries |
| process_weighted_raster_collection(), process_weighted_raster_stacks(), normalize_raster_by_polygon() | Population-weighted extraction |
| process_ihme_u5m_raster() | IHME under-5 mortality rasters to admin tibble |
| tidy_malaria_raster_names(), detect_time_pattern(), extract_time_components(), clean_filenames() | Raster-naming utilities |
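Population-weighted extraction replaces a plain mean over raster cells with a mean weighted by each cell's population, so sparsely populated cells count for less. Once cells have been assigned to admin units (the spatial part that sf and terra handle), the arithmetic reduces to the sketch below. pop_weighted_mean() is a hypothetical base-R helper, not the implementation behind process_weighted_raster_collection().

```r
# Hypothetical illustration only; not sntutils' weighted raster processors.
# value: raster cell values (e.g. prevalence); pop: population in the same cells;
# unit:  admin unit each cell falls in.
pop_weighted_mean <- function(value, pop, unit) {
  ok <- !is.na(value) & !is.na(pop)
  # rowsum() sums by group and returns groups in sorted order
  num <- rowsum(value[ok] * pop[ok], unit[ok])
  den <- rowsum(pop[ok], unit[ok])
  stats::setNames(as.vector(num / den), rownames(num))
}

pop_weighted_mean(
  value = c(0.2, 0.4, 0.1, 0.3),
  pop   = c(100, 300, 50, 50),
  unit  = c("A", "A", "B", "B")
)
```

Unit A's weighted mean (0.35) sits closer to its densely populated cell than the unweighted mean (0.3) would.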

DHS →

Discover, download and query DHS / MIS indicators via the API; open DHS parquet microdata via DuckDB.

| Function | What it does |
| --- | --- |
| check_dhs_indicators() | DHS API indicator catalogue |
| download_dhs_indicators() | National / subnational indicator values |
| get_dhs_data() | Register DHS parquet datasets as DuckDB views |

Project utilities →

Folder scaffolding, paths, translation, hashing, image compression, small numeric helpers.

| Function | What it does |
| --- | --- |
| setup_project_paths(), ahadi_path() | Resolve standardised project paths |
| create_data_structure(), initialize_project_structure() | Build the AHADI folder skeleton |
| clear_snt_cache() | Reset in-memory caches |
| translate_text(), translate_text_vec(), translate_yearmon(), french_malaria_acronyms() | Cached translation and locale-aware formatting |
| compress_png() | PNG compression for reports |
| vdigest() | Vectorised hashing for stable IDs |
| big_mark(), sum2(), mean2(), median2() | NA-safe numeric helpers |
| get_model(), generate_ir_plot(), run_resistance_trend() | IR / resistance-trend plotting |
| auto_bin(), prepare_plot_data(), get_pathway_vars() | Internal plot building blocks |
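A common reason for NA-safe helpers is that base R's sum(c(NA, NA), na.rm = TRUE) returns 0, silently turning "no data" into "zero cases". The sketch below shows one way to keep all-missing inputs as NA. sum_na_aware() and row_sum_na_aware() are hypothetical names, and the exact semantics of sum2() and fallback_row_sum() may differ.

```r
# Hypothetical helpers for illustration; not sntutils::sum2() or
# sntutils::fallback_row_sum().
sum_na_aware <- function(x) {
  # all-missing input stays missing instead of becoming 0
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

row_sum_na_aware <- function(df) {
  m <- as.matrix(df)
  s <- rowSums(m, na.rm = TRUE)
  # rows with no observed values at all are set back to NA
  s[rowSums(!is.na(m)) == 0] <- NA
  s
}

sum(c(NA, NA), na.rm = TRUE)   # 0
sum_na_aware(c(NA, NA))        # NA
row_sum_na_aware(data.frame(conf_u5 = c(3, NA), conf_ov5 = c(2, NA)))
```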

EMOD simulation inputs

Used by AHADI’s malaria modelling team to feed EMOD-style simulations from SNT data.

| Function | What it does |
| --- | --- |
| build_emod_demog(), build_emod_demog_from_wpp() | Build demographic JSON |
| write_emod_demog_by_adm2() | Per-adm2 demography files |
| read_emod_weather(), write_emod_weather(), write_emod_weather_by_adm2() | Weather input files |

Where to start

  1. Read Get started for a 5-minute tour and a tiny end-to-end example.
  2. Pick the workflow stage you’re working on from the Articles menu on the site.
  3. Use the Reference menu when you need the full argument list for a single function.

Contributing

Issues and pull requests welcome: https://github.com/ahadi-analytics/sntutils/issues.

Each function should ship with roxygen docs (including a runnable @examples block) and a test in tests/testthat/.

License

CC BY 4.0.