WorldPop population raster
Overview
While national population data is the primary source for SNT, high-resolution modeled datasets like WorldPop can provide additional value—especially when working at fine spatial scales (e.g., 100m or 1km grids). These gridded estimates are useful for specific tasks such as estimating access to care or mapping population distribution within large or heterogeneous admin units.
WorldPop may be useful when a) we need pixel-level estimates to model access or calculate catchment populations; b) we’re estimating target groups like under-fives or women of reproductive age; c) admin population data does not align well with our spatial boundaries; or d) recent national projections are unavailable, and the SNT team has cleared the use of modeled alternatives.
- Understand how raster-based population datasets work
- Download and import WorldPop .tif files directly
- Use shapefiles to extract population totals at the admin unit level
- Align CRS between raster and vector layers before extraction
- Apply annual growth rates to project population beyond 2020
- Reshape population data to long format for easier integration
- Validate extracted results visually using population maps
- Explore other use cases of raster population data for SNT analysis
- Normalize raster values within each unit to support spatial weighting
- Save processed and generated outputs for reuse
When to Use Population Rasters
High-resolution population rasters are not just stand-alone counts—they can be integrated directly into modeling, weighting, and allocation tasks. Estimates from population rasters like WorldPop can support more than just subnational population counts:
Proportional Allocation: When official data exists at higher levels (e.g.,
adm2) but lower-level detail (e.g.,adm3) is missing or outdated, WorldPop can provide population shares to distribute known totals spatially.Population Weighting: In some analyses, indicators such as PfPR2-10, ITN access, or care-seeking may be modeled at fine spatial scales (e.g., 1 km raster). To summarize these to the district level, population-weighted means are preferable to simple averages—ensuring areas with more people contribute more to the aggregated value. WorldPop can be used to supply the population weights for this purpose.
Catchment & Access Modeling: Combine population and travel-time rasters to estimate the % of people within or beyond specific access thresholds (e.g., 5 km from health facilities), and then apply these shares to census totals.
These use cases treat WorldPop as a tool for distribution and spatial inference, not as a substitute for census counts.
WorldPop and other modeled datasets can offer high-resolution population estimates, especially useful for analyses like access-to-care or catchment modeling. However, these are not official sources and must not be used unless approved by the SNT team.
- The SNT team will advise if and when modeled data can be used in your setting.
- Never substitute these datasets for census-based figures unless explicitly directed.
- If you’re unsure whether use is appropriate, always consult the SNT team first.
This ensures population estimates remain credible, aligned with national policy, and comparable across geographies.
In this section, we’ll walk through how to source WorldPop data, extract values using shapefiles, and get everything cleaned and aggregated for your SNT analysis.
How Rasters Work
Raster datasets work differently from spreadsheets. Instead of rows and columns of text or numbers, a raster is like a matrix of pixels. Each pixel has a value—just like each cell in Excel—but instead of being part of a table, it’s part of a spatial grid. That grid is tied to real-world geographic coordinates. As an example, below is a population density raster map of Rwanda using 2020 UN-adjusted estimates.
Figure: WorldPop-estimated population density in Rwanda (2020). Source: WorldPop, UN-adjusted 1km raster.
For WorldPop, each pixel represents the estimated number of people living in a small area—usually either 100m × 100m or 1km × 1km. This lets us estimate population density at very fine resolution, and then aggregate upward into admin units.
WorldPop rasters are typically stored as .tif files (GeoTIFF format). Each file usually represents one country, one year, and one population group (e.g., total population, under-fives, WRA, etc.)
The units for each pixel are people per pixel (ppp), not people per square kilometer. If you’re using a 100m raster, a value of 25 means 25 people in that 100m × 100m square.
These values are modeled—using machine learning and spatial covariates (like land use, road networks, and night-time lights)—and then constrained using census data where available. So, while they give good spatial resolution, they’re still estimates, and may not match official figures exactly.
Here’s what to expect when you open a raster:
| Layer Name | What It Represents |
|---|---|
sle_ppp_2020_UNadj.tif |
Sierra Leone, total population, 2020 |
nga_f_0_4_2021_UNadj.tif |
Nigeria, females aged 0–4, 2021 |
eth_m_15_49_2019_UNadj.tif |
Ethiopia, males aged 15–49, 2019 |
Why does this matter for SNT?
Because raster data doesn’t come organized into districts or provinces—we have to extract values using shapefiles. That’s where the spatial analysis comes in. We overlay our admin boundaries (e.g., adm3 shapefile), and sum the values of all the pixels that fall within each unit. This gives us a population estimate by district, chiefdom, or catchment.
Step-by-Step
To skip the step-by-step explanation, jump to the full code at the end of this page.
Step 1: Import Raster
In this section, we will work with WorldPop raster datasets. We first need to download the latest population raster for our country of interest. We can explore available rasters by year and country at the WorldPop Data Hub. We use the terra package to directly import the raster into R from the WorldPop server.
# Install or load required packages
pacman::p_load(
httr, # API access (e.g., MAP, WorldPop)
sf, # vector data (shapefiles)
scales, # label formatting
patchwork, # combine ggplots
terra, # raster handling
exactextractr # zonal stats from rasters
)
# Define direct link to 2020 Sierra Leone population raster
sle_pop_path <- paste0(
"https://data.worldpop.org/GIS/Population_Density/",
"Global_2000_2020_1km_UNadj/2020/SLE/sle_pd_2020_1km_UNadj.tif"
)
# Import raster
sle_2020_rast <- terra::rast(sle_pop_path)To adapt the code:
- Line 10-11: Replace the URL with the link to your country’s raster file or specify folder path if you already have the raster locally.
Once updated, run the code to read the raster file locally.
Step 2: Extract Population Values by Admin Unit
Once we have our population raster and admin shapefile, the next step is to extract population totals for each admin unit.
It’s best to use a shapefile that matches the level of analysis—for example, if your SNT analysis is at adm3, then extract population at adm3. Avoid using higher-level shapefiles (e.g., adm1) and trying to disaggregate downward. This often introduces errors or unrealistic assumptions.
While extracting from a lower level (e.g., adm3) can offer flexibility to aggregate upward, this is only useful if the lower-level shapefile is complete and reliable. In practice, some countries have poor coverage or inconsistencies at very fine levels like adm3 or adm4. So unless there’s a specific reason to go lower, match your shapefile to the analysis level.
Step 2.1: Get shapefile and set CRS
# Get shapefile
shp_adm3 <- sf::read_sf(
here::here(
"1.1_foundational",
"1.1a_administrative_boundaries",
"Chiefdom2021.shp"
)
) |>
dplyr::mutate(adm0 = "SIERRA LEONE") |>
dplyr::select(
adm0,
adm1 = FIRST_REGI,
adm2 = FIRST_DNAM,
adm3 = FIRST_CHIE
)
# Ensure both layers use same CRS
shp_adm3 <- sf::st_transform(shp_adm3, crs = terra::crs(sle_2020_rast))To adapt the code:
- Line 5-7: Replace the path in the
sf::read_sffunction with the local path where your shapefile data is stored. - Line 10-15: These are optional steps for creating the
adm0column and selecting or renaming relevant admin columns. This will vary depending on your shapefile, so adapt accordingly.
Once updated, run the code to load your shapefile.
In the code above, we first download the adm3-level shapefile, which contains the boundaries we want to use for summarizing the population data. However, spatial operations require that both the raster and vector layers share the same coordinate reference system (CRS). We handle that using sf::st_transform() to align the shapefile to the CRS used by the raster.
Before extracting population values, always make sure your raster and shapefile are in the same Coordinate Reference System (CRS).
Misaligned projections can cause extraction errors, incorrect population counts, or features being shifted outside expected boundaries.
Population rasters like WorldPop are usually provided in WGS84
(EPSG:4326), but your shapefiles might use different systems like UTM or country-specific projections.Always check and reproject if needed to avoid conflicting projections.
You can check the CRS of your layers in R:
If they differ, reproject your shapefile to match the raster:
Aligning projections ensures the extraction overlays are accurate and reproducible. It is always a good practice to always do this at the start.
Step 2.2: Extract population from raster
We use the exact_extract() function from the exactextractr package. This is well-suited for our needs because:
It calculates area-weighted summaries, which is important when raster cells only partially overlap admin boundaries. This ensures more accurate totals, especially around irregular borders.
It’s faster and more memory-efficient than alternatives like
raster::extract()orterra::extract(), as it’s built on a C++ backend for performance.It respects geometry boundaries precisely and avoids double-counting, giving clean, reliable results that scale well to national-level rasters.
To adapt the code:
- Line 3: Replace
sle_2020_rastwith your raster object andshp_adm3with your shapefile object - Lines 9: Adjust the columns in
dplyr::select()to match your admin level columns (if named differently).
Once updated, run the code to extract population from raster.
This step creates a new column, pop, containing the total estimated population for each adm3 unit in 2020, extracted from the WorldPop raster. We also add a year column set to 2020 to clearly mark the time reference. These values can now be treated like census-based totals—ready to be aggregated, visualized, or integrated into the broader SNT analysis pipeline.
Step 3: Aggregating Population Data
In this step, we aggregate the population counts to admin level 2 (adm2). This produces summarized population estimates by district or equivalent units, which can be used in later analyses or mapping exercises.
To adapt the code:
- Line 4: Update
adm1andadm2to match the admin levels you are working with (e.g., region, district, etc.), based on your shapefile attributes.
Once updated, run the code to load aggregate population data.
Step 4: Extend Population Estimates Beyond 2020
WorldPop raster data is only available up to 2020 in most countries. This presents a challenge if population estimates are needed for more recent years.
Step 4.1: Prepare for interpolation
To prepare for interpolation, we first convert the dataset from long to wide format. This involves removing the year column and renaming the extracted pop column to pop2020. We then use a fixed annual growth rate (e.g., 1.5%) to project population totals for subsequent years. This structure makes it easy to apply year-on-year calculations and is suitable when raster data is only available for a single base year.
If population growth rates are needed to project values beyond the available data year (e.g., for extrapolating WorldPop estimates), the appropriate rate must come from the national statistics bureau (or equivalent). Do not apply assumed or ad hoc growth rates.
- The SNT team is responsible for sourcing and approving any growth parameters.
- Always consult the SNT team before using projections in your analysis.
This ensures population estimates remain technically sound, credible, and aligned with national guidance.
# apply population growth rate for
# the next three years
sl_pop_2020_2023 <- shp_adm3 |>
sf::st_drop_geometry() |>
dplyr::select(-year) |>
dplyr::rename(pop2020 = pop) |>
dplyr::mutate(
pop2021 = round(pop2020 * 1.015),
pop2022 = round(pop2021 * 1.015),
pop2023 = round(pop2022 * 1.015)
)
# check results
sl_pop_2020_2023To adapt the code:
- Line 3: Replace
shp_adm3with your shapefile object, if named differently. - Line 8-10: Add more rows following the same pattern if you need to extend projections beyond 2023 (e.g., add
pop2024,pop2025, etc.). Update the year labels if you are projecting for different years or starting from a different baseline year. - Line 8-10: Update the growth rate (1.015) if your context requires a different annual population growth rate.
Once updated, run the code to interpolate population data.
Step 4.2: Reshape to long format
With interpolated values prepared in wide format, we now convert the dataset back to long format. This ensures each row corresponds to one admin unit for one year—making it easier to merge with other datasets, generate indicators, or plot trends over time.
To adapt the code:
- Line 3: If you renamed the columns differently (not starting with
pop), updatedplyr::starts_with("pop")to match your actual prefix. - Line 8-10: If you used a different naming pattern (e.g.,
population2020), adjuststringr::str_remove("pop")to match that pattern (e.g.,stringr::str_remove("population")).
Once updated, run the code to reshape your interpolated population data.
Step 5: Visualizing Raster-Extracted Population Data
After extracting and projecting population estimates from the raster, it’s helpful to visualize the results. This lets us confirm that spatial joins worked correctly and check for anomalies—such as missing areas, unrealistic values, or gaps in coverage—before moving forward.
Step 5.1: Join and bin population data
We start by joining the long-format population dataset to the corresponding shapefile geometry. We also group population values into bins to simplify interpretation across the map. Grouping helps highlight population ranges and quickly spot unusually high or low values. To support quality checks, we group population values into bins. We also include a “Missing or 0” category to catch any admin units where the value is either absent or zero. This category is rendered in grey on the map, allowing us to quickly spot potential gaps in the raster extraction or projection steps.
# join extracted raster pop to shapefile
shp_adm3_raster_pop <- sl_pop_long |>
dplyr::left_join(
# join shapefile to pop data
dplyr::select(shp_adm3, -pop, -year),
by = c("adm0", "adm1", "adm2", "adm3")
) |> sf::st_as_sf() |>
# create categorical version of pop column
dplyr::mutate(
pop_bin = dplyr::case_when(
is.na(pop) | pop == 0 ~ "Missing or 0",
pop <= 15000 ~ "<15k",
pop <= 30000 ~ "15k–30k",
pop <= 50000 ~ "30k–50k",
pop <= 100000 ~ "50k–100k",
pop > 100000 ~ "100k+"
),
pop_bin = factor(
pop_bin,
levels = c("Missing or 0", "<15k",
"15k–30k", "30k–50k",
"50k–100k", "100k+")
)
)To adapt the code:
- Line 6: If you used a different shapefile or admin join level earlier, update
by = "adm3to the level you used (e.g.,adm2oradm1). - Lines 11-16: You may adjust the population bins to match meaningful ranges for your context.
Once updated, run the code to set-up your population data for mapping.
Step 5.2: Generate map and check results
We use ggplot2 to create a faceted map, showing population by chiefdom (adm3) from 2020 to 2023. Overlaying adm1 boundaries improves spatial reference. If any chiefdoms have missing or zero values, they’ll appear clearly—making this step a valuable quality control check.
# plot extracted raster population
# generate adm0-level shapefile for plotting
shp_adm0 <- shp_adm3 |>
dplyr::group_by(adm0) |>
dplyr::summarise(.groups = "drop") |>
sf::st_union(by_feature = TRUE)
# generate adm1-level shapefile for plotting
shp_adm1 <- shp_adm3 |>
dplyr::group_by(adm1) |>
dplyr::summarise(.groups = "drop") |>
sf::st_union(by_feature = TRUE)
adm3_raster_pop_map <- shp_adm3_raster_pop |>
ggplot2::ggplot() +
ggplot2::geom_sf(
ggplot2::aes(fill = pop_bin),
color = "white",
size = 0.2
) +
# add adm1 outline
ggplot2::geom_sf(
data = shp_adm1,
fill = NA, color = "black", size = 0.5
) +
ggplot2::scale_fill_manual(
name = "Population Range",
values = c("Missing or 0" = "grey80",
"<15k" = "#fee5d9",
"15k–30k" = "#fcae91",
"30k–50k" = "#fb6a4a",
"50k–100k" = "#de2d26",
"100k+" = "#a50f15"),
na.value = "grey80",
guide = ggplot2::guide_legend(
label.position = "bottom",
title.position = "top"
)
) +
ggplot2::facet_wrap(~year, nrow = 1) +
ggplot2::labs(
title = "Raster-Extracted Population by Chiefdom (adm3) for 2020 - 2023",
subtitle = "Extracted from WorldPop Raster Data (2020)",
caption = "Note: Population projected to 2021–2023 using 1.5% growth rate.",
fill = "Population"
) +
ggplot2::theme_minimal() +
ggplot2::theme(
legend.position = "bottom",
legend.title.position = "top",
legend.key.width = grid::unit(1, "cm"),
axis.text.x = ggplot2::element_text(
angle = 75, vjust = 1, hjust = 1
)
)To adapt the code:
- Line 14 and 42: If you joined at a different admin level (e.g.,
adm2), updateshp_adm3_raster_popand the map title accordingly. - Line 23: If your
adm1shapefile is named differently, updatedata = shp_adm1to your object name. - Line 28-33: Adjust color values and ranges if you defined different population bins.
Once updated, run the code to map your aggregated data.
Step 5.3: Save plots
The map is then saved as a high-resolution PNG for future reference.
To adapt the code:
- Line 4: Update the filename path to match your folder structure. You can change
pop_rast_check_adm3_2020_2023.png. - Line 5: Adjust the width, height, and dpi if you need a different image size or quality.
Once updated, run the code to save the plot as a PNG file.
Step 6: Other Uses of Raster-Based Population Data
While population rasters are commonly used to generate subnational totals, they can also support other spatial analyses that feed into SNT decision-making. This section outlines how WorldPop can be used to complement census data for subnational tailoring tasks, especially when finer geographic resolution or relative population patterns are needed.
While modeled rasters like WorldPop may not perfectly match census totals, they often capture relative population distribution well, particularly across subdistricts or within large, heterogeneous districts. This makes them useful for tasks such as weighting, proportional allocation, and assessing spatial access.
Step 6.1: Calculate proportion of population per district
One common use of WorldPop is to calculate the proportion of the national population represented by each unit in our shapefile. In this case, we work at the adm3 level and compute each unit’s share of the total national population, as well as its share within its adm1 and adm2 boundaries. This is useful when national population totals are available, but subnational breakdowns are outdated or unreliable. It also allows us to allocate resources proportionally across areas, generate relative weights for further analysis, or compare population distribution across spatial hierarchies. This approach treats WorldPop raster estimates not as a substitute for census counts, but as a way to infer consistent spatial proportions when more granular data is missing.
In the code below, we calculate total population at the national, adm1, and adm2 levels, and then compute the share of each adm3 unit relative to each of those three totals.
# Calculate pop shares at national, adm1, and adm2 levels
pop_share <- shp_adm3 |>
sf::st_drop_geometry() |>
dplyr::mutate(
adm0_total = sum(pop, na.rm = TRUE)
) |>
dplyr::group_by(adm1) |>
dplyr::mutate(
total_adm1 = sum(pop, na.rm = TRUE)
) |>
dplyr::group_by(adm1, adm2, .add = TRUE) |>
dplyr::mutate(
total_adm2 = sum(pop, na.rm = TRUE),
prop_adm0 = pop / adm0_total,
prop_adm1 = pop / total_adm1,
prop_adm2 = pop / total_adm2
) |>
dplyr::ungroup()
# Check head
pop_share |>
dplyr::filter(
adm2 == "WESTERN URBAN"
) |>
dplyr::select(
adm1, adm2, adm3, year, pop,
prop_adm0, prop_adm1, prop_adm2
) |>
dplyr::arrange(desc(prop_adm0)) |>
head()To adapt the code:
- Lines 4, 6, 10: Replace
adm1,adm2, andpopwith the appropriate column names from your shapefile. - Line 1: Ensure your shapefile includes the population variable extracted from WorldPop for each subunit.
- Lines 6, 10: Adjust grouping if working at a different level (e.g.,
adm1→adm0).
Once updated, run the code to generate proportional population shares for each subunit, which can then be used to disaggregate totals or guide targeting.
This output confirms that the calculation worked: in 2020, the EAST III and WEST III subunits together accounted for 74% of Western Urban’s population and over 11% of the national population. This shows how population is concentrated within just a few adm3 units in a major urban district.
These proportions can support future use cases, such as weighted analysis or resource allocation, and can be translated into absolute counts using official census totals.
Step 6.2: Normalize population per pixel within each admin
When the goal is to understand how people are distributed within an a given admin such as district, not how many people live there overall, we use a normalized population surface. Each pixel is rescaled to reflect its share of the total population in its admin level, making it easier to calculate things like: what proportion of the district lives within 5 km of a facility?
Normalizing the population raster by district helps account for where people actually live within each unit. This is especially useful when combining rasters (e.g., population with access surfaces) or estimating the share of a population affected by a spatial feature like a health facility buffer.
Whether we use raw or normalized population depends on our objective:
- Raw population gives the number of people per pixel.
- Normalized population gives each pixel’s share of the district total (adds to 1 within each unit).
Use raw counts when we need absolute numbers. Use normalized values for within-district proportions or spatial weighting.
| Our goal | Use raw pop? | Normalize? |
|---|---|---|
| Count people within a buffer | ✅ Yes | ❌ No |
| Visualize where most people live | ✅ Yes | ⚠️ Maybe – use if focusing on relative density |
| Disaggregate a district total (e.g., 500 cases) | ❌ No | ✅ Yes |
| Weight raster values within a district | ❌ No | ✅ Yes |
We normalize the 2020 WorldPop raster so that pixel values within each district sum to 1 using sntutils::normalize_raster_by_polygon(). The result is a raster where each pixel represents its share of the population within the admin unit of interest. This relative surface is useful for later steps involving spatial weighting or proportional disaggregation, for example, to obtain a district-level estimate of mean Plasmodium falciparum parasite prevalence in children aged 2-10 years (PfPR2-10) using a geospatial modeled surface of (PfPR2-10).
To adapt the code:
- Line 3: Change
shp_adm3to your own shapefile if using a different geography. - Line 4: Replace
adm2with the name of the column that identifies your target admin unit. - Line 2: Ensure
sle_2020_rastis the population raster you want to normalize.
Once updated, run the code to generate a within-unit normalized raster, where values sum to 1 per admin area.
Step 6.3: Compare original and normalized population rasters
Now that we’ve created a normalized version of the 2020 WorldPop raster, we can visually compare it to the original raster. This helps illustrate how normalization emphasizes relative distribution rather than absolute population counts.
Show full code
# Convert rasters to data.frames
df_orig <- as.data.frame(sle_2020_rast, xy = TRUE, na.rm = T)
df_norm <- as.data.frame(normed_wp_raster, xy = TRUE, na.rm = T)
# Original population raster
p1 <- ggplot2::ggplot(
df_orig,
ggplot2::aes(x, y, fill = sle_pd_2020_1km_UNadj)
) +
ggplot2::geom_raster() +
ggplot2::coord_quickmap() +
ggplot2::labs(
title = "Original Raster (adm2)",
fill = "Population"
) +
ggplot2::scale_fill_distiller(
palette = "OrRd",
direction = 1,
labels = scales::label_comma(big.mark = ","),
guide = ggplot2::guide_colorbar(
title.position = "top",
title.hjust = 0.5,
barheight = grid::unit(0.2, "cm"),
barwidth = grid::unit(5.5, "cm")
)
) +
ggplot2::theme_void() +
ggplot2::theme(
legend.position = "bottom",
plot.title = ggplot2::element_text(hjust = 0.5),
legend.title.align = 0.5
)
# Normalized population raster
p2 <- ggplot2::ggplot(
df_norm,
ggplot2::aes(x, y, fill = sle_pd_2020_1km_UNadj * 100)
) +
ggplot2::geom_raster() +
ggplot2::coord_quickmap() +
ggplot2::labs(
title = "Normalized Raster (adm2)",
fill = "Population (%)"
) +
ggplot2::scale_fill_distiller(
palette = "OrRd",
direction = 1,
labels = scales::label_number(suffix = "%", accuracy = 0.1),
guide = ggplot2::guide_colorbar(
title.position = "top",
title.hjust = 0.5,
barheight = grid::unit(0.2, "cm"),
barwidth = grid::unit(5.5, "cm")
)
) +
ggplot2::theme_void() +
ggplot2::theme(
legend.position = "bottom",
plot.title = ggplot2::element_text(hjust = 0.5),
legend.title.align = 0.5
)
# assemble plots
(p1 + p2) & ggplot2::theme(legend.position = "bottom")To adapt the code:
- Lines 2, 3: Replace
sle_2020_rastwith your original raster andnormed_wp_rasterwith your normalized one. - Lines 8, 37: Update the
fillvariable if your raster uses a different layer name. - Lines 13, 42: Adjust
adm2or any labels to reflect your admin level. - This comparison works best when both rasters are aligned spatially and temporally.
Once updated, run the code to generate your comparison plots.
Step 7: Save Data
After aggregating and validating the population data, it’s good practice to save the final population data. This allows us to reuse the cleaned and structured data in later steps without needing to reprocess from scratch.
We save the raw raster (.tif) and the aggregated population table as .rds, .csv, and .xlsx formats.
# set up output path
save_path <- here::here(
"01_data",
"1.1_foundational",
"1.1c_population",
"1ci_worldpop_rasters"
)
# save the raw raster
terra::writeRaster(
sle_2020_rast,
here::here(save_path, "1ci_worldpop_rasters", "sle_2020_pop_rast.tif")
)
# save the normalised raw raster
terra::writeRaster(
normed_wp_raster,
here::here(save_path, "1ci_worldpop_rasters", "sle_2020_pop_rast_normed.tif")
)
# save to rds
rio::export(
pop_share,
here::here(save_path, "1ci_population", "sle_2020_pop_rast.rds")
)
# save to csv
rio::export(
pop_share,
here::here(save_path, "1ci_population", "sle_2020_pop_rast.csv")
)
# save to xlsx
rio::export(
pop_share,
here::here(save_path, "1ci_population", "sle_2020_pop_rast.xlsx")
)To adapt the code:
- Line 3: Update
save_pathto your output directory.pop_long: Replace if your rainfall column has a different name. - File Names: Change file names (e.g.,
sle_2020_pop_rast.csv) to reflect your dataset or time range.
Once updated, run the code to save your outputs in both raw and processed formats.
Summary
We’ve now walked through how to work with modeled population rasters like WorldPop—from importing .tif files, extracting values using shapefiles, to projecting totals for recent years. These layers are particularly helpful when census data is missing or too coarse, and give us flexible, high-resolution population estimates that can be aligned with any spatial unit. We’ve also touched on good practices: harmonizing projections, validating joins, and saving cleaned outputs in reusable formats.
We also explored additional uses, calculating population shares across districts and normalizing rasters within units to support spatial weighting and disaggregation. These methods treat the raster as a distribution surface, not just a count layer. The full code is available at the end of this section (collapsed for convenience). We can use it as a template—just update paths, or admin levels to fit our context.
Full code
Find the full code script for working with WorldPop population rasters below.
Show full code
################################################################################
################### ~ WorldPop population raster full code ~ ###################
################################################################################
### Step 1: Import Raster ------------------------------------------------------
# Install or load required packages
pacman::p_load(
httr, # API access (e.g., MAP, WorldPop)
sf, # vector data (shapefiles)
scales, # label formatting
patchwork, # combine ggplots
terra, # raster handling
exactextractr # zonal stats from rasters
)
# Define direct link to 2020 Sierra Leone population raster
sle_pop_path <- paste0(
"https://data.worldpop.org/GIS/Population_Density/",
"Global_2000_2020_1km_UNadj/2020/SLE/sle_pd_2020_1km_UNadj.tif"
)
# Import raster
sle_2020_rast <- terra::rast(sle_pop_path)
### Step 2: Extract Population Values by Admin Unit ----------------------------
#### Step 2.1: Get shapefile and set CRS ---------------------------------------
# Get shapefile
shp_adm3 <- sf::read_sf(
here::here(
"1.1_foundational",
"1.1a_administrative_boundaries",
"Chiefdom2021.shp"
)
) |>
dplyr::mutate(adm0 = "SIERRA LEONE") |>
dplyr::select(
adm0,
adm1 = FIRST_REGI,
adm2 = FIRST_DNAM,
adm3 = FIRST_CHIE
)
# Ensure both layers use same CRS
shp_adm3 <- sf::st_transform(shp_adm3, crs = terra::crs(sle_2020_rast))
### Step 2: Extract Population Values by Admin Unit ----------------------------
#### Step 2.1: Get shapefile and set CRS ---------------------------------------
terra::crs(raster_file)
sf::st_crs(shapefile)
### Step 2: Extract Population Values by Admin Unit ----------------------------
#### Step 2.1: Get shapefile and set CRS ---------------------------------------
shapefile <- sf::st_transform(shapefile, crs = terra::crs(raster_file))
### Step 2: Extract Population Values by Admin Unit ----------------------------
#### Step 2.2: Extract population from raster ----------------------------------
# Use exactextractr to extract weighted sums
shp_adm3$pop <- exactextractr::exact_extract(
sle_2020_rast, shp_adm3, 'sum',
progress = FALSE # we turn off progress bar
) |> round()
# Preview result
shp_adm3 |>
dplyr::select(adm0, adm1, adm2, adm3, year, pop) |> head()
### Step 3: Aggregating Population Data ----------------------------------------
# summarize total population by adm0
pop_summary_adm2 <- shp_adm3 |>
sf::st_drop_geometry() |>
dplyr::group_by(adm1, adm2, year) |>
dplyr::reframe(pop_total = sum(pop, na.rm = TRUE))
pop_summary_adm2
### Step 4: Extend Population Estimates Beyond 2020 ----------------------------
#### Step 4.1: Prepare for interpolation ---------------------------------------
# apply population growth rate for
# the next three years
sl_pop_2020_2023 <- shp_adm3 |>
sf::st_drop_geometry() |>
dplyr::select(-year) |>
dplyr::rename(pop2020 = pop) |>
dplyr::mutate(
pop2021 = round(pop2020 * 1.015),
pop2022 = round(pop2021 * 1.015),
pop2023 = round(pop2022 * 1.015)
)
# check results
sl_pop_2020_2023
### Step 4: Extend Population Estimates Beyond 2020 ----------------------------
#### Step 4.2: Reshape to long format ------------------------------------------
# turn wide pop to long
sl_pop_long <- sl_pop_2020_2023 |>
tidyr::pivot_longer(
cols = dplyr::starts_with("pop"),
names_to = "year",
values_to = "pop"
) |>
dplyr::mutate(
year = stringr::str_remove(year, "pop") |>
as.integer()
)
### Step 5: Visualizing Raster-Extracted Population Data -----------------------
#### Step 5.1: Join and bin population data ------------------------------------
# join extracted raster pop to shapefile
shp_adm3_raster_pop <- sl_pop_long |>
dplyr::left_join(
# join shapefile to pop data
dplyr::select(shp_adm3, -pop, -year),
by = c("adm0", "adm1", "adm2", "adm3")
) |> sf::st_as_sf() |>
# create categorical version of pop column
dplyr::mutate(
pop_bin = dplyr::case_when(
is.na(pop) | pop == 0 ~ "Missing or 0",
pop <= 15000 ~ "<15k",
pop <= 30000 ~ "15k–30k",
pop <= 50000 ~ "30k–50k",
pop <= 100000 ~ "50k–100k",
pop > 100000 ~ "100k+"
),
pop_bin = factor(
pop_bin,
levels = c("Missing or 0", "<15k",
"15k–30k", "30k–50k",
"50k–100k", "100k+")
)
)
### Step 5: Visualizing Raster-Extracted Population Data -----------------------
#### Step 5.2: Generate map and check results ----------------------------------
# plot extracted raster population
# generate adm0-level shapefile for plotting
shp_adm0 <- shp_adm3 |>
dplyr::group_by(adm0) |>
dplyr::summarise(.groups = "drop") |>
sf::st_union(by_feature = TRUE)
# generate adm1-level shapefile for plotting
shp_adm1 <- shp_adm3 |>
dplyr::group_by(adm1) |>
dplyr::summarise(.groups = "drop") |>
sf::st_union(by_feature = TRUE)
adm3_raster_pop_map <- shp_adm3_raster_pop |>
ggplot2::ggplot() +
ggplot2::geom_sf(
ggplot2::aes(fill = pop_bin),
color = "white",
size = 0.2
) +
# add adm1 outline
ggplot2::geom_sf(
data = shp_adm1,
fill = NA, color = "black", size = 0.5
) +
ggplot2::scale_fill_manual(
name = "Population Range",
values = c("Missing or 0" = "grey80",
"<15k" = "#fee5d9",
"15k–30k" = "#fcae91",
"30k–50k" = "#fb6a4a",
"50k–100k" = "#de2d26",
"100k+" = "#a50f15"),
na.value = "grey80",
guide = ggplot2::guide_legend(
label.position = "bottom",
title.position = "top"
)
) +
ggplot2::facet_wrap(~year, nrow = 1) +
ggplot2::labs(
title = "Raster-Extracted Population by Chiefdom (adm3) for 2020 - 2023",
subtitle = "Extracted from WorldPop Raster Data (2020)",
caption = "Note: Population projected to 2021–2023 using 1.5% growth rate.",
fill = "Population"
) +
ggplot2::theme_minimal() +
ggplot2::theme(
legend.position = "bottom",
legend.title.position = "top",
legend.key.width = grid::unit(1, "cm"),
axis.text.x = ggplot2::element_text(
angle = 75, vjust = 1, hjust = 1
)
)
### Step 6: Other Uses of Raster-Based Population Data -------------------------
#### Step 6.1: Calculate proportion of population per district -----------------
# Calculate pop shares at national, adm1, and adm2 levels
pop_share <- shp_adm3 |>
sf::st_drop_geometry() |>
dplyr::mutate(
adm0_total = sum(pop, na.rm = TRUE)
) |>
dplyr::group_by(adm1) |>
dplyr::mutate(
total_adm1 = sum(pop, na.rm = TRUE)
) |>
dplyr::group_by(adm1, adm2, .add = TRUE) |>
dplyr::mutate(
total_adm2 = sum(pop, na.rm = TRUE),
prop_adm0 = pop / adm0_total,
prop_adm1 = pop / total_adm1,
prop_adm2 = pop / total_adm2
) |>
dplyr::ungroup()
# Check head
pop_share |>
dplyr::filter(
adm2 == "WESTERN URBAN"
) |>
dplyr::select(
adm1, adm2, adm3, year, pop,
prop_adm0, prop_adm1, prop_adm2
) |>
dplyr::arrange(desc(prop_adm0)) |>
head()
### Step 6: Other Uses of Raster-Based Population Data -------------------------
#### Step 6.2: Normalize population per pixel within each admin ----------------
# Normalize pop per pixel within each adm2
normed_wp_raster <- sntutils::normalize_raster_by_polygon(
raster = sle_2020_rast,
shp = shp_adm3,
id_col = "adm2"
)
### Step 6: Other Uses of Raster-Based Population Data -------------------------
#### Step 6.3: Compare original and normalized population rasters --------------
# Convert rasters to data.frames
df_orig <- as.data.frame(sle_2020_rast, xy = TRUE, na.rm = T)
df_norm <- as.data.frame(normed_wp_raster, xy = TRUE, na.rm = T)
# Original population raster
p1 <- ggplot2::ggplot(
df_orig,
ggplot2::aes(x, y, fill = sle_pd_2020_1km_UNadj)
) +
ggplot2::geom_raster() +
ggplot2::coord_quickmap() +
ggplot2::labs(
title = "Original Raster (adm2)",
fill = "Population"
) +
ggplot2::scale_fill_distiller(
palette = "OrRd",
direction = 1,
labels = scales::label_comma(big.mark = ","),
guide = ggplot2::guide_colorbar(
title.position = "top",
title.hjust = 0.5,
barheight = grid::unit(0.2, "cm"),
barwidth = grid::unit(5.5, "cm")
)
) +
ggplot2::theme_void() +
ggplot2::theme(
legend.position = "bottom",
plot.title = ggplot2::element_text(hjust = 0.5),
legend.title.align = 0.5
)
# Normalized population raster
p2 <- ggplot2::ggplot(
df_norm,
ggplot2::aes(x, y, fill = sle_pd_2020_1km_UNadj * 100)
) +
ggplot2::geom_raster() +
ggplot2::coord_quickmap() +
ggplot2::labs(
title = "Normalized Raster (adm2)",
fill = "Population (%)"
) +
ggplot2::scale_fill_distiller(
palette = "OrRd",
direction = 1,
labels = scales::label_number(suffix = "%", accuracy = 0.1),
guide = ggplot2::guide_colorbar(
title.position = "top",
title.hjust = 0.5,
barheight = grid::unit(0.2, "cm"),
barwidth = grid::unit(5.5, "cm")
)
) +
ggplot2::theme_void() +
ggplot2::theme(
legend.position = "bottom",
plot.title = ggplot2::element_text(hjust = 0.5),
legend.title.align = 0.5
)
# assemble plots
(p1 + p2) & ggplot2::theme(legend.position = "bottom")
### Step 7: Save Data ----------------------------------------------------------
# set up output path
save_path <- here::here(
"01_data",
"1.1_foundational",
"1.1c_population",
"1ci_worldpop_rasters"
)
# save the raw raster
terra::writeRaster(
sle_2020_rast,
here::here(save_path, "1ci_worldpop_rasters", "sle_2020_pop_rast.tif")
)
# save the normalised raw raster
terra::writeRaster(
normed_wp_raster,
here::here(save_path, "1ci_worldpop_rasters", "sle_2020_pop_rast_normed.tif")
)
# save to rds
rio::export(
pop_share,
here::here(save_path, "1ci_population", "sle_2020_pop_rast.rds")
)
# save to csv
rio::export(
pop_share,
here::here(save_path, "1ci_population", "sle_2020_pop_rast.csv")
)
# save to xlsx
rio::export(
pop_share,
here::here(save_path, "1ci_population", "sle_2020_pop_rast.xlsx")
)
### Step 7: Save Data ----------------------------------------------------------
# save as RDS
rio::export(
pop_share,
here::here("english/data_r/pop/final_pop_rast_2020_2023.rds")
)