Every SNT analysis ends up turning a folder of .tifs
into a tidy tibble keyed by admin and time. sntutils
exposes a small set of batch raster processors that
handle the boring parts: detecting dates from filenames, aligning CRS,
cropping to the shapefile extent, computing zonal stats, and applying
population weights when needed.
The same processors work for any raster source - the downloads articles (Climate, WorldPop) deposit files on disk, then these functions reduce them to admin-level tibbles.
For the methodology and conceptual background behind the steps in this article, please check the SNT Code Library:
- Extracting climate from rasters - the analytical step-by-step.
- Population rasters - WorldPop methodology.
- MAP outputs - Malaria Atlas Project rasters and how to use them in SNT.
- IHME mortality rasters - U5M raster methodology.
- Modeled data overview - when to lean on modeled vs measured surfaces.
Basic zonal stats
The default for CHIRPS / ERA5-monthly / MODIS-monthly. Dates are detected automatically from filenames; the function aligns CRS between raster and shapefile, crops the raster to the shapefile extent before extraction, and returns a tibble of admin x time x stat.
library(sntutils)
adm3_shp <- sf::st_read(system.file(
"extdata", "sle_adm3_example.geojson", package = "sntutils"
))
raster_dir <- system.file("extdata", "chirps_test_rasters",
package = "sntutils")
rainfall_df <- process_raster_collection(
directory = raster_dir,
shapefile = adm3_shp,
id_cols = c("adm0", "adm1", "adm2", "adm3"),
aggregations = c("mean"),
pattern = "\\.tif$"
)
rainfall_df |> utils::head()
#> file_name adm1 adm2 adm3 year month mean
#> 1 africa_monthly_chirps-v2.0.2020.01.tif EASTERN KAILAHUN DEA 2020 1 12.871692
#> 2 africa_monthly_chirps-v2.0.2020.01.tif EASTERN KAILAHUN JAHN 2020 1 9.820749
#> 3 africa_monthly_chirps-v2.0.2020.01.tif EASTERN KAILAHUN JAWIE 2020 1 12.042542
#> 4 africa_monthly_chirps-v2.0.2020.01.tif EASTERN KAILAHUN KISSI KAMA 2020 1 8.951293Pass aggregations = c("mean", "sum", "median", "max") to
get several stats in one pass.
Single raster, multiple boundaries
Use this when the same raster needs to be extracted to several shapefiles (adm1, adm2, adm3 simultaneously), or with different ID column sets.
process_raster_with_boundaries(
raster_file = "01_data/1.5_environment/1.5a_climate/raw/chirps/chirps-v2.0.2020.01.tif",
boundaries = list(
adm1 = sle_adm1_clean,
adm2 = sle_adm2_clean,
adm3 = sle_adm3_clean
),
id_cols = c("adm0_name", "adm1_name", "adm2_name", "adm3_name"),
aggregations = c("mean", "sum")
)Time-varying boundaries
When the boundaries themselves change over time (post-redistricting in Sierra Leone 2017, Togo 2019, recent DRC changes), this wrapper aligns each raster’s year to the appropriate shapefile vintage.
process_rasters_by_year(
directory = "01_data/1.5_environment/1.5a_climate/raw/chirps",
shapefiles = list(
`<=2017` = sle_adm2_pre_2017,
`>=2018` = sle_adm2_post_2017
),
id_cols = c("adm0_name", "adm1_name", "adm2_name"),
aggregations = "mean"
)Each raster is matched to the shapefile vintage whose year range it
falls into, then extracted normally. The output tibble carries an
explicit boundary_vintage column so downstream analyses
know which shapefile a row came from.
Population-weighted extraction
For variables that need population weighting (e.g. mean temperature weighted by population so we report an “experienced” value rather than an areal average), point the weighted extractors at a population raster of the same extent. Weights are applied per pixel during extraction.
Population-weighted collection
process_weighted_raster_collection(
directory = "01_data/1.5_environment/1.5a_climate/raw/chirps",
shapefile = sle_adm2_clean,
weight_raster = "01_data/1.1_foundational/1.1c_population/1.1cii_worldpop_rasters/sle_ppp_2020_1km.tif",
id_cols = c("adm0_name", "adm1_name", "adm2_name"),
aggregations = c("mean", "sum")
)Population-weighted stacks
The same idea but for raster stacks (multiple bands per file) - useful when WorldPop publishes a multi-year stack or an ERA5 NetCDF has many variables in one file.
Normalising weights within a polygon
The building block under the weighted extractors. Normalises a raster’s values so they sum to 1 within each polygon - which is what lets the weighted aggregations keep totals consistent across resolutions.
norm_pop <- normalize_raster_by_polygon(
raster = terra::rast("sle_ppp_2020_1km.tif"),
shp = sle_adm2_clean,
id_col = "adm2_name"
)IHME under-5 mortality
process_ihme_u5m_raster() is a domain-specific helper
that turns IHME’s under-5 mortality rasters into admin-level tibbles in
one call, applying the same crop / extract pattern as the generic
processors:
u5m_adm2 <- process_ihme_u5m_raster(
directory = "01_data/1.2_epidemiology/1.2c_mortality_estimates/raw/ihme_u5m",
shapefile = sle_adm2_clean,
id_cols = c("adm0_name", "adm1_name", "adm2_name")
)Filename utilities
The batch processors detect dates from filenames using name patterns. These utilities are exposed for ad-hoc inspection and for renaming external archives to a format the processors understand:
| Function | What it does |
|---|---|
detect_time_pattern() |
Inspect a list of filenames and report the date pattern detected
(YYYY, YYYY-MM, YYYYMM, …). |
extract_time_components() |
Pull year / month / day from a single filename given the pattern. |
clean_filenames() |
Strip spaces, accents and punctuation from a directory of filenames so they’re processor-friendly. |
tidy_malaria_raster_names() |
Batch-rename a directory of MAP / IHME rasters to the SNT convention. |
# what date pattern do these CHIRPS files use?
detect_time_pattern(list.files("01_data/1.5_environment/1.5a_climate/raw/chirps"))
#> "YYYY.MM"
# tidy up MAP downloads so process_raster_collection() can detect dates
tidy_malaria_raster_names("01_data/1.5_environment/1.5b_accessibility/raw/map")An extraction pipeline, end to end
# 1. download climate (see the Climate article)
download_chirps(
dataset = "africa_monthly",
start = "2020-01", end = "2024-12",
out_dir = "01_data/1.5_environment/1.5a_climate/raw/chirps"
)
# 2. download population (see the WorldPop article)
download_worldpop(
country_codes = "SLE",
years = 2020:2024,
resolution = "1km",
dest_dir = "01_data/1.1_foundational/1.1c_population/1.1cii_worldpop_rasters"
)
# 3. unweighted population sum by adm2-year - sum the WorldPop pixels
pop_adm2 <- process_raster_collection(
directory = "01_data/1.1_foundational/1.1c_population/1.1cii_worldpop_rasters",
shapefile = sle_adm2_clean,
id_cols = c("adm0_name", "adm1_name", "adm2_name"),
aggregations = "sum",
pattern = "_ppp_\\d{4}_1km.*\\.tif$"
)
# 4. population-weighted mean rainfall by adm2-month
rain_adm2 <- process_weighted_raster_collection(
directory = "01_data/1.5_environment/1.5a_climate/raw/chirps",
shapefile = sle_adm2_clean,
weight_raster = "01_data/1.1_foundational/1.1c_population/1.1cii_worldpop_rasters/sle_ppp_2020_1km.tif",
id_cols = c("adm0_name", "adm1_name", "adm2_name"),
aggregations = "mean"
)
# 5. join - one tibble keyed by admin and year
analysis_ready <- rain_adm2 |>
dplyr::left_join(pop_adm2, by = c("adm0_name", "adm1_name", "adm2_name", "year"))Two archives (climate + population), one tibble keyed by admin and date - the canonical input for stratification and modelling.
Want to map the extracted tibble as a small-area surface? See the Climate article’s mapping section for a worked example using
facetted_map_gradient()and theget_palette()colour helpers. The same recipe works for any raster-extracted tibble.
