• English
  • Français
  1. 2. Data Assembly and Management
  2. 2.5 Population Data
  3. WorldPop population raster
  • Code library for subnational tailoring
    English version
  • 1. Getting Started
    • 1.1 About and Contact Information
    • 1.2 For Everyone
    • 1.3 For the SNT Team
    • 1.4 For Analysts
    • 1.5 Producing High-Quality Outputs
  • 2. Data Assembly and Management
    • 2.1 Working with Shapefiles
      • Spatial data overview
      • Basic shapefile use and visualization
      • Shapefile management and customization
      • Merging shapefiles with tabular data
    • 2.2 Health Facilities Data
      • Fuzzy matching of names across datasets
      • Health facility coordinates and point data
      • Determining active and inactive status
    • 2.3 Routine Surveillance Data
      • Routine data extraction
      • DHIS2 data preprocessing
      • Assessing missing data
      • Health facility reporting rate
      • Data coherency checks
      • Outlier detection methods
      • Imputing missing data and correcting outliers
      • Final database
    • 2.4 Stock Data
      • LMIS
    • 2.5 Population Data
      • National population data
      • WorldPop population raster
    • 2.6 National Household Survey Data
      • DHS data overview and preparation
      • Prevalence of malaria infection
      • All-cause child mortality
      • Treatment-seeking rates
      • ITN ownership, access, and usage
    • 2.7 Entomological Data
      • Entomological data
    • 2.8 Climate and Environmental Data
      • Climate and environment data extraction from raster
    • 2.9 Modeled Data
      • Generating spatial modeled estimates
      • Working with geospatial model estimates
      • Modeled estimates of malaria mortality and proxies
      • Modeled estimates of entomological indicators
  • 3. Stratification
    • 3.1 Epidemiological Stratification
      • Incidence overview and crude incidence
      • Incidence adjustment 1: incomplete testing
      • Incidence adjustment 2: incomplete reporting
      • Incidence adjustment 3: treatment-seeking
      • Incidence stratification
      • Prevalence and mortality stratification
      • Combined risk categorization
    • 3.2 Stratification of Determinants of Malaria Transmission
      • Seasonality
      • Access to care
  • 4. Review of Past Interventions
    • 4.1 Case Management
    • 4.2 Routine Interventions
    • 4.3 Campaign Interventions
    • 4.4 Other Interventions
  • 5. Targeting of Interventions
  • 6. Retrospective Analysis
  • 7. Urban Microstratification

On this page

  • Overview
  • When to Use Population Rasters
  • How Rasters Work
  • Step-by-Step
    • Step 1: Import Raster
    • Step 2: Extract Population Values by Admin Unit
      • Step 2.1: Get shapefile and set CRS
      • Step 2.2: Extract population from raster
    • Step 3: Aggregating Population Data
    • Step 4: Extend Population Estimates Beyond 2020
      • Step 4.1: Prepare for interpolation
      • Step 4.2: Reshape to long format
    • Step 5: Visualizing Raster-Extracted Population Data
      • Step 5.1: Join and bin population data
      • Step 5.2: Generate map and check results
      • Step 5.3: Save plots
    • Step 6: Other Uses of Raster-Based Population Data
      • Step 6.1: Calculate proportion of population per district
      • Step 6.2: Normalize population per pixel within each admin
      • Step 6.3: Compare original and normalized population rasters
    • Step 7: Save Data
  • Summary
  • Full code
  1. 2. Data Assembly and Management
  2. 2.5 Population Data
  3. WorldPop population raster

WorldPop population raster

Overview

While national population data is the primary source for SNT, high-resolution modeled datasets like WorldPop can provide additional value—especially when working at fine spatial scales (e.g., 100m or 1km grids). These gridded estimates are useful for specific tasks such as estimating access to care or mapping population distribution within large or heterogeneous admin units.

WorldPop may be useful when a) we need pixel-level estimates to model access or calculate catchment populations; b) we’re estimating target groups like under-fives or women of reproductive age; c) admin population data does not align well with our spatial boundaries; or d) recent national projections are unavailable, and the SNT team has cleared the use of modeled alternatives.

Objectives
  • Understand how raster-based population datasets work
  • Download and import WorldPop .tif files directly
  • Use shapefiles to extract population totals at the admin unit level
  • Align CRS between raster and vector layers before extraction
  • Apply annual growth rates to project population beyond 2020
  • Reshape population data to long format for easier integration
  • Validate extracted results visually using population maps
  • Explore other use cases of raster population data for SNT analysis
  • Normalize raster values within each unit to support spatial weighting
  • Save processed and generated outputs for reuse

When to Use Population Rasters

High-resolution population rasters are not just stand-alone counts—they can be integrated directly into modeling, weighting, and allocation tasks. Estimates from population rasters like WorldPop can support more than just subnational population counts:

  • Proportional Allocation: When official data exists at higher levels (e.g., adm2) but lower-level detail (e.g., adm3) is missing or outdated, WorldPop can provide population shares to distribute known totals spatially.

  • Population Weighting: In some analyses, indicators such as PfPR2-10, ITN access, or care-seeking may be modeled at fine spatial scales (e.g., 1 km raster). To summarize these to the district level, population-weighted means are preferable to simple averages—ensuring areas with more people contribute more to the aggregated value. WorldPop can be used to supply the population weights for this purpose.

  • Catchment & Access Modeling: Combine population and travel-time rasters to estimate the % of people within or beyond specific access thresholds (e.g., 5 km from health facilities), and then apply these shares to census totals.

These use cases treat WorldPop as a tool for distribution and spatial inference, not as a substitute for census counts.

Consult SNT Team

WorldPop and other modeled datasets can offer high-resolution population estimates, especially useful for analyses like access-to-care or catchment modeling. However, these are not official sources and must not be used unless approved by the SNT team.

  • The SNT team will advise if and when modeled data can be used in your setting.
  • Never substitute these datasets for census-based figures unless explicitly directed.
  • If you’re unsure whether use is appropriate, always consult the SNT team first.

This ensures population estimates remain credible, aligned with national policy, and comparable across geographies.

In this section, we’ll walk through how to source WorldPop data, extract values using shapefiles, and get everything cleaned and aggregated for your SNT analysis.

How Rasters Work

Raster datasets work differently from spreadsheets. Instead of rows and columns of text or numbers, a raster is like a matrix of pixels. Each pixel has a value—just like each cell in Excel—but instead of being part of a table, it’s part of a spatial grid. That grid is tied to real-world geographic coordinates. As an example, below is a population density raster map of Rwanda using 2020 UN-adjusted estimates.

Figure: WorldPop-estimated population density in Rwanda (2020). Source: WorldPop, UN-adjusted 1km raster.

For WorldPop, each pixel represents the estimated number of people living in a small area—usually either 100m × 100m or 1km × 1km. This lets us estimate population density at very fine resolution, and then aggregate upward into admin units.

WorldPop rasters are typically stored as .tif files (GeoTIFF format). Each file usually represents one country, one year, and one population group (e.g., total population, under-fives, WRA, etc.)

Raster unit of measurements

The units for each pixel are people per pixel (ppp), not people per square kilometer. If you’re using a 100m raster, a value of 25 means 25 people in that 100m × 100m square.

These values are modeled—using machine learning and spatial covariates (like land use, road networks, and night-time lights)—and then constrained using census data where available. So, while they give good spatial resolution, they’re still estimates, and may not match official figures exactly.

Here’s what to expect when you open a raster:

Layer Name What It Represents
sle_ppp_2020_UNadj.tif Sierra Leone, total population, 2020
nga_f_0_4_2021_UNadj.tif Nigeria, females aged 0–4, 2021
eth_m_15_49_2019_UNadj.tif Ethiopia, males aged 15–49, 2019

Why does this matter for SNT?

Because raster data doesn’t come organized into districts or provinces—we have to extract values using shapefiles. That’s where the spatial analysis comes in. We overlay our admin boundaries (e.g., adm3 shapefile), and sum the values of all the pixels that fall within each unit. This gives us a population estimate by district, chiefdom, or catchment.

Step-by-Step

To skip the step-by-step explanation, jump to the full code at the end of this page.

Step 1: Import Raster

In this section, we will work with WorldPop raster datasets. We first need to download the latest population raster for our country of interest. We can explore available rasters by year and country at the WorldPop Data Hub. We use the terra package to directly import the raster into R from the WorldPop server.

  • R
  • Python
# Install or load required packages
pacman::p_load(
  httr,           # API access (e.g., MAP, WorldPop)
  sf,             # vector data (shapefiles)
  scales,         # label formatting
  patchwork,      # combine ggplots
  terra,          # raster handling
  exactextractr   # zonal stats from rasters
)

# Define direct link to 2020 Sierra Leone population raster
sle_pop_path <- paste0(
  "https://data.worldpop.org/GIS/Population_Density/",
  "Global_2000_2020_1km_UNadj/2020/SLE/sle_pd_2020_1km_UNadj.tif"
)

# Import raster
sle_2020_rast <- terra::rast(sle_pop_path)

To adapt the code:

  • Line 10-11: Replace the URL with the link to your country’s raster file or specify folder path if you already have the raster locally.

Once updated, run the code to read the raster file locally.

Step 2: Extract Population Values by Admin Unit

Once we have our population raster and admin shapefile, the next step is to extract population totals for each admin unit.

It’s best to use a shapefile that matches the level of analysis—for example, if your SNT analysis is at adm3, then extract population at adm3. Avoid using higher-level shapefiles (e.g., adm1) and trying to disaggregate downward. This often introduces errors or unrealistic assumptions.

While extracting from a lower level (e.g., adm3) can offer flexibility to aggregate upward, this is only useful if the lower-level shapefile is complete and reliable. In practice, some countries have poor coverage or inconsistencies at very fine levels like adm3 or adm4. So unless there’s a specific reason to go lower, match your shapefile to the analysis level.

Step 2.1: Get shapefile and set CRS

  • R
  • Python
# Get shapefile
shp_adm3 <- sf::read_sf(
  here::here(
    "1.1_foundational",
    "1.1a_administrative_boundaries",
    "Chiefdom2021.shp"
  )
) |>
  dplyr::mutate(adm0 = "SIERRA LEONE") |>
  dplyr::select(
    adm0,
    adm1 = FIRST_REGI,
    adm2 = FIRST_DNAM,
    adm3 = FIRST_CHIE
  )

# Ensure both layers use same CRS
shp_adm3 <- sf::st_transform(shp_adm3, crs = terra::crs(sle_2020_rast))

To adapt the code:

  • Line 5-7: Replace the path in the sf::read_sf function with the local path where your shapefile data is stored.
  • Line 10-15: These are optional steps for creating the adm0 column and selecting or renaming relevant admin columns. This will vary depending on your shapefile, so adapt accordingly.

Once updated, run the code to load your shapefile.

In the code above, we first download the adm3-level shapefile, which contains the boundaries we want to use for summarizing the population data. However, spatial operations require that both the raster and vector layers share the same coordinate reference system (CRS). We handle that using sf::st_transform() to align the shapefile to the CRS used by the raster.

Quick Tip: Why does projection alignment matter before extraction?

Before extracting population values, always make sure your raster and shapefile are in the same Coordinate Reference System (CRS).

  • Misaligned projections can cause extraction errors, incorrect population counts, or features being shifted outside expected boundaries.

  • Population rasters like WorldPop are usually provided in WGS84 (EPSG:4326), but your shapefiles might use different systems like UTM or country-specific projections.

  • Always check and reproject if needed to avoid conflicting projections.

You can check the CRS of your layers in R:

terra::crs(raster_file)
sf::st_crs(shapefile)

If they differ, reproject your shapefile to match the raster:

shapefile <- sf::st_transform(shapefile, crs = terra::crs(raster_file))

Aligning projections ensures the extraction overlays are accurate and reproducible. It is always a good practice to always do this at the start.

Step 2.2: Extract population from raster

We use the exact_extract() function from the exactextractr package. This is well-suited for our needs because:

  • It calculates area-weighted summaries, which is important when raster cells only partially overlap admin boundaries. This ensures more accurate totals, especially around irregular borders.

  • It’s faster and more memory-efficient than alternatives like raster::extract() or terra::extract(), as it’s built on a C++ backend for performance.

  • It respects geometry boundaries precisely and avoids double-counting, giving clean, reliable results that scale well to national-level rasters.

  • R
  • Python
# Use exactextractr to extract weighted sums
shp_adm3$pop <- exactextractr::exact_extract(
  sle_2020_rast, shp_adm3, 'sum',
  progress = FALSE # we turn off progress bar
) |> round()

# Preview result
shp_adm3 |>
  dplyr::select(adm0, adm1, adm2, adm3, year, pop) |> head()
Output
          adm0    adm1     adm2        adm3 year   pop
1 SIERRA LEONE EASTERN KAILAHUN         DEA 2020 20003
2 SIERRA LEONE EASTERN KAILAHUN        JAHN 2020 10116
3 SIERRA LEONE EASTERN KAILAHUN       JAWIE 2020 76665
4 SIERRA LEONE EASTERN KAILAHUN  KISSI KAMA 2020 26207
5 SIERRA LEONE EASTERN KAILAHUN  KISSI TENG 2020 44124
6 SIERRA LEONE EASTERN KAILAHUN KISSI TONGI 2020 62577

To adapt the code:

  • Line 3: Replace sle_2020_rast with your raster object and shp_adm3 with your shapefile object
  • Lines 9: Adjust the columns in dplyr::select() to match your admin level columns (if named differently).

Once updated, run the code to extract population from raster.

This step creates a new column, pop, containing the total estimated population for each adm3 unit in 2020, extracted from the WorldPop raster. We also add a year column set to 2020 to clearly mark the time reference. These values can now be treated like census-based totals—ready to be aggregated, visualized, or integrated into the broader SNT analysis pipeline.

Step 3: Aggregating Population Data

In this step, we aggregate the population counts to admin level 2 (adm2). This produces summarized population estimates by district or equivalent units, which can be used in later analyses or mapping exercises.

  • R
  • Python
# summarize total population by adm0
pop_summary_adm2 <- shp_adm3 |>
  sf::st_drop_geometry() |>
  dplyr::group_by(adm1, adm2, year) |>
  dplyr::reframe(pop_total = sum(pop, na.rm = TRUE))

pop_summary_adm2
Output
        adm1      adm2 year pop_total
1    EASTERN  KAILAHUN 2020    669106
2    EASTERN    KENEMA 2020    959041
3    EASTERN      KONO 2020    401503
4 NORTH EAST   BOMBALI 2020    458349
5 NORTH EAST    FALABA 2020    261306
6 NORTH EAST KOINADUGU 2020    228451

To adapt the code:

  • Line 4: Update adm1 and adm2 to match the admin levels you are working with (e.g., region, district, etc.), based on your shapefile attributes.

Once updated, run the code to load aggregate population data.

Step 4: Extend Population Estimates Beyond 2020

WorldPop raster data is only available up to 2020 in most countries. This presents a challenge if population estimates are needed for more recent years.

Step 4.1: Prepare for interpolation

To prepare for interpolation, we first convert the dataset from long to wide format. This involves removing the year column and renaming the extracted pop column to pop2020. We then use a fixed annual growth rate (e.g., 1.5%) to project population totals for subsequent years. This structure makes it easy to apply year-on-year calculations and is suitable when raster data is only available for a single base year.

Consult SNT Team

If population growth rates are needed to project values beyond the available data year (e.g., for extrapolating WorldPop estimates), the appropriate rate must come from the national statistics bureau (or equivalent). Do not apply assumed or ad hoc growth rates.

  • The SNT team is responsible for sourcing and approving any growth parameters.
  • Always consult the SNT team before using projections in your analysis.

This ensures population estimates remain technically sound, credible, and aligned with national guidance.

  • R
  • Python
# apply population growth rate for
# the next three years
sl_pop_2020_2023 <- shp_adm3 |>
  sf::st_drop_geometry() |>
  dplyr::select(-year) |>
  dplyr::rename(pop2020 = pop) |>
  dplyr::mutate(
    pop2021 = round(pop2020 * 1.015),
    pop2022 = round(pop2021 * 1.015),
    pop2023 = round(pop2022 * 1.015)
  )

# check results
sl_pop_2020_2023
Output
          adm0    adm1     adm2        adm3 pop2020 pop2021 pop2022 pop2023
1 SIERRA LEONE EASTERN KAILAHUN         DEA   20003   20303   20608   20917
2 SIERRA LEONE EASTERN KAILAHUN        JAHN   10116   10268   10422   10578
3 SIERRA LEONE EASTERN KAILAHUN       JAWIE   76665   77815   78982   80167
4 SIERRA LEONE EASTERN KAILAHUN  KISSI KAMA   26207   26600   26999   27404
5 SIERRA LEONE EASTERN KAILAHUN  KISSI TENG   44124   44786   45458   46140
6 SIERRA LEONE EASTERN KAILAHUN KISSI TONGI   62577   63516   64469   65436

To adapt the code:

  • Line 3: Replace shp_adm3 with your shapefile object, if named differently.
  • Line 8-10: Add more rows following the same pattern if you need to extend projections beyond 2023 (e.g., add pop2024, pop2025, etc.). Update the year labels if you are projecting for different years or starting from a different baseline year.
  • Line 8-10: Update the growth rate (1.015) if your context requires a different annual population growth rate.

Once updated, run the code to interpolate population data.

Step 4.2: Reshape to long format

With interpolated values prepared in wide format, we now convert the dataset back to long format. This ensures each row corresponds to one admin unit for one year—making it easier to merge with other datasets, generate indicators, or plot trends over time.

  • R
  • Python
# turn wide pop to long
sl_pop_long <- sl_pop_2020_2023 |>
  tidyr::pivot_longer(
    cols = dplyr::starts_with("pop"),
    names_to = "year",
    values_to = "pop"
  ) |>
  dplyr::mutate(
    year = stringr::str_remove(year, "pop") |>
      as.integer()
  )

To adapt the code:

  • Line 3: If you renamed the columns differently (not starting with pop), update dplyr::starts_with("pop") to match your actual prefix.
  • Line 8-10: If you used a different naming pattern (e.g., population2020), adjust stringr::str_remove("pop") to match that pattern (e.g., stringr::str_remove("population")).

Once updated, run the code to reshape your interpolated population data.

Step 5: Visualizing Raster-Extracted Population Data

After extracting and projecting population estimates from the raster, it’s helpful to visualize the results. This lets us confirm that spatial joins worked correctly and check for anomalies—such as missing areas, unrealistic values, or gaps in coverage—before moving forward.

Step 5.1: Join and bin population data

We start by joining the long-format population dataset to the corresponding shapefile geometry. We also group population values into bins to simplify interpretation across the map. Grouping helps highlight population ranges and quickly spot unusually high or low values. To support quality checks, we group population values into bins. We also include a “Missing or 0” category to catch any admin units where the value is either absent or zero. This category is rendered in grey on the map, allowing us to quickly spot potential gaps in the raster extraction or projection steps.

  • R
  • Python
# join extracted raster pop to shapefile
shp_adm3_raster_pop <- sl_pop_long |>
  dplyr::left_join(
    # join shapefile to pop data
    dplyr::select(shp_adm3, -pop, -year),
    by = c("adm0", "adm1", "adm2", "adm3")
  ) |> sf::st_as_sf() |>
  # create categorical version of pop column
  dplyr::mutate(
    pop_bin = dplyr::case_when(
      is.na(pop) | pop == 0 ~ "Missing or 0",
      pop <= 15000 ~ "<15k",
      pop <= 30000 ~ "15k–30k",
      pop <= 50000 ~ "30k–50k",
      pop <= 100000 ~ "50k–100k",
      pop > 100000 ~ "100k+"
    ),
    pop_bin = factor(
      pop_bin,
      levels = c("Missing or 0", "<15k",
                 "15k–30k", "30k–50k",
                 "50k–100k", "100k+")
    )
  )

To adapt the code:

  • Line 6: If you used a different shapefile or admin join level earlier, update by = "adm3 to the level you used (e.g., adm2 or adm1).
  • Lines 11-16: You may adjust the population bins to match meaningful ranges for your context.

Once updated, run the code to set-up your population data for mapping.

Step 5.2: Generate map and check results

We use ggplot2 to create a faceted map, showing population by chiefdom (adm3) from 2020 to 2023. Overlaying adm1 boundaries improves spatial reference. If any chiefdoms have missing or zero values, they’ll appear clearly—making this step a valuable quality control check.

  • R
  • Python
# plot extracted raster population

# generate adm0-level shapefile for plotting
shp_adm0 <- shp_adm3 |>
  dplyr::group_by(adm0) |>
  dplyr::summarise(.groups = "drop") |>
  sf::st_union(by_feature = TRUE)

# generate adm1-level shapefile for plotting
shp_adm1 <- shp_adm3 |>
  dplyr::group_by(adm1) |>
  dplyr::summarise(.groups = "drop") |>
  sf::st_union(by_feature = TRUE)

adm3_raster_pop_map <- shp_adm3_raster_pop |>
  ggplot2::ggplot() +
  ggplot2::geom_sf(
    ggplot2::aes(fill = pop_bin),
    color = "white",
    size = 0.2
  ) +
  # add adm1 outline
  ggplot2::geom_sf(
    data = shp_adm1,
    fill = NA, color = "black", size = 0.5
  ) +
  ggplot2::scale_fill_manual(
    name = "Population Range",
    values = c("Missing or 0" = "grey80",
               "<15k" = "#fee5d9",
               "15k–30k" = "#fcae91",
               "30k–50k" = "#fb6a4a",
               "50k–100k" = "#de2d26",
               "100k+" = "#a50f15"),
    na.value = "grey80",
    guide = ggplot2::guide_legend(
      label.position = "bottom",
      title.position = "top"
    )
  ) +
    ggplot2::facet_wrap(~year, nrow = 1) +
  ggplot2::labs(
    title = "Raster-Extracted Population by Chiefdom (adm3) for 2020 - 2023",
    subtitle = "Extracted from WorldPop Raster Data (2020)",
    caption = "Note: Population projected to 2021–2023 using 1.5% growth rate.",
    fill = "Population"
  ) +
  ggplot2::theme_minimal() +
  ggplot2::theme(
    legend.position = "bottom",
    legend.title.position = "top",
    legend.key.width = grid::unit(1, "cm"),
    axis.text.x = ggplot2::element_text(
      angle = 75, vjust = 1, hjust = 1
    )
  )
Output

To adapt the code:

  • Line 14 and 42: If you joined at a different admin level (e.g., adm2), update shp_adm3_raster_pop and the map title accordingly.
  • Line 23: If your adm1 shapefile is named differently, update data = shp_adm1 to your object name.
  • Line 28-33: Adjust color values and ranges if you defined different population bins.

Once updated, run the code to map your aggregated data.

Step 5.3: Save plots

The map is then saved as a high-resolution PNG for future reference.

  • R
  • Python
# save plot
ggplot2::ggsave(
  plot = adm3_raster_pop_map,
  here::here("03_output/3a_figures/pop_rast_check_adm3_2020_2023.png"),
  width = 12, height = 5, dpi = 300)

To adapt the code:

  • Line 4: Update the filename path to match your folder structure. You can change pop_rast_check_adm3_2020_2023.png.
  • Line 5: Adjust the width, height, and dpi if you need a different image size or quality.

Once updated, run the code to save the plot as a PNG file.

Step 6: Other Uses of Raster-Based Population Data

While population rasters are commonly used to generate subnational totals, they can also support other spatial analyses that feed into SNT decision-making. This section outlines how WorldPop can be used to complement census data for subnational tailoring tasks, especially when finer geographic resolution or relative population patterns are needed.

While modeled rasters like WorldPop may not perfectly match census totals, they often capture relative population distribution well, particularly across subdistricts or within large, heterogeneous districts. This makes them useful for tasks such as weighting, proportional allocation, and assessing spatial access.

Step 6.1: Calculate proportion of population per district

One common use of WorldPop is to calculate the proportion of the national population represented by each unit in our shapefile. In this case, we work at the adm3 level and compute each unit’s share of the total national population, as well as its share within its adm1 and adm2 boundaries. This is useful when national population totals are available, but subnational breakdowns are outdated or unreliable. It also allows us to allocate resources proportionally across areas, generate relative weights for further analysis, or compare population distribution across spatial hierarchies. This approach treats WorldPop raster estimates not as a substitute for census counts, but as a way to infer consistent spatial proportions when more granular data is missing.

In the code below, we calculate total population at the national, adm1, and adm2 levels, and then compute the share of each adm3 unit relative to each of those three totals.

  • R
  • Python
# Calculate pop shares at national, adm1, and adm2 levels
pop_share <- shp_adm3 |>
  sf::st_drop_geometry() |>
  dplyr::mutate(
    adm0_total = sum(pop, na.rm = TRUE)
  ) |>
  dplyr::group_by(adm1) |>
  dplyr::mutate(
    total_adm1 = sum(pop, na.rm = TRUE)
  ) |>
  dplyr::group_by(adm1, adm2, .add = TRUE) |>
  dplyr::mutate(
    total_adm2 = sum(pop, na.rm = TRUE),
    prop_adm0 = pop / adm0_total,
    prop_adm1 = pop / total_adm1,
    prop_adm2 = pop / total_adm2
  ) |>
  dplyr::ungroup()

# Check head
pop_share |>
  dplyr::filter(
    adm2 == "WESTERN URBAN"
    ) |>
  dplyr::select(
    adm1, adm2, adm3, year, pop,
    prop_adm0, prop_adm1, prop_adm2
  ) |>
    dplyr::arrange(desc(prop_adm0)) |>
  head()
Output
# A tibble: 6 × 8
  adm1    adm2          adm3       year    pop prop_adm0 prop_adm1 prop_adm2
  <chr>   <chr>         <chr>     <dbl>  <dbl>     <dbl>     <dbl>     <dbl>
1 WESTERN WESTERN URBAN EAST III   2020 614851   0.0664     0.315     0.415 
2 WESTERN WESTERN URBAN WEST III   2020 481525   0.0520     0.247     0.325 
3 WESTERN WESTERN URBAN WEST II    2020 153241   0.0165     0.0785    0.103 
4 WESTERN WESTERN URBAN WEST I     2020  64695   0.00699    0.0331    0.0436
5 WESTERN WESTERN URBAN CENTRAL I  2020  55202   0.00596    0.0283    0.0372
6 WESTERN WESTERN URBAN EAST I     2020  49774   0.00538    0.0255    0.0336

To adapt the code:

  • Lines 4, 6, 10: Replace adm1, adm2, and pop with the appropriate column names from your shapefile.
  • Line 1: Ensure your shapefile includes the population variable extracted from WorldPop for each subunit.
  • Lines 6, 10: Adjust grouping if working at a different level (e.g., adm1 → adm0).

Once updated, run the code to generate proportional population shares for each subunit, which can then be used to disaggregate totals or guide targeting.

This output confirms that the calculation worked: in 2020, the EAST III and WEST III subunits together accounted for 74% of Western Urban’s population and over 11% of the national population. This shows how population is concentrated within just a few adm3 units in a major urban district.

These proportions can support future use cases, such as weighted analysis or resource allocation, and can be translated into absolute counts using official census totals.

Step 6.2: Normalize population per pixel within each admin

When the goal is to understand how people are distributed within an a given admin such as district, not how many people live there overall, we use a normalized population surface. Each pixel is rescaled to reflect its share of the total population in its admin level, making it easier to calculate things like: what proportion of the district lives within 5 km of a facility?

Why Normalize?

Normalizing the population raster by district helps account for where people actually live within each unit. This is especially useful when combining rasters (e.g., population with access surfaces) or estimating the share of a population affected by a spatial feature like a health facility buffer.

Whether we use raw or normalized population depends on our objective:

  • Raw population gives the number of people per pixel.
  • Normalized population gives each pixel’s share of the district total (adds to 1 within each unit).

Use raw counts when we need absolute numbers. Use normalized values for within-district proportions or spatial weighting.

Our goal Use raw pop? Normalize?
Count people within a buffer ✅ Yes ❌ No
Visualize where most people live ✅ Yes ⚠️ Maybe – use if focusing on relative density
Disaggregate a district total (e.g., 500 cases) ❌ No ✅ Yes
Weight raster values within a district ❌ No ✅ Yes

We normalize the 2020 WorldPop raster so that pixel values within each district sum to 1 using sntutils::normalize_raster_by_polygon(). The result is a raster where each pixel represents its share of the population within the admin unit of interest. This relative surface is useful for later steps involving spatial weighting or proportional disaggregation, for example, to obtain a district-level estimate of mean Plasmodium falciparum parasite prevalence in children aged 2-10 years (PfPR2-10) using a geospatial modeled surface of (PfPR2-10).

  • R
  • Python
# Normalize pop per pixel within each adm2
normed_wp_raster <- sntutils::normalize_raster_by_polygon(
  raster = sle_2020_rast,
  shp = shp_adm3,
  id_col = "adm2"
)

To adapt the code:

  • Line 3: Change shp_adm3 to your own shapefile if using a different geography.
  • Line 4: Replace adm2 with the name of the column that identifies your target admin unit.
  • Line 2: Ensure sle_2020_rast is the population raster you want to normalize.

Once updated, run the code to generate a within-unit normalized raster, where values sum to 1 per admin area.

Step 6.3: Compare original and normalized population rasters

Now that we’ve created a normalized version of the 2020 WorldPop raster, we can visually compare it to the original raster. This helps illustrate how normalization emphasizes relative distribution rather than absolute population counts.

  • R
  • Python
Show full code
# Convert rasters to data.frames
df_orig <- as.data.frame(sle_2020_rast, xy = TRUE, na.rm = T)
df_norm <- as.data.frame(normed_wp_raster, xy = TRUE, na.rm = T)

# Original population raster
p1 <- ggplot2::ggplot(
  df_orig,
  ggplot2::aes(x, y, fill = sle_pd_2020_1km_UNadj)
) +
  ggplot2::geom_raster() +
  ggplot2::coord_quickmap() +
  ggplot2::labs(
    title = "Original Raster (adm2)",
    fill = "Population"
  ) +
  ggplot2::scale_fill_distiller(
    palette = "OrRd",
    direction = 1,
    labels = scales::label_comma(big.mark = ","),
    guide = ggplot2::guide_colorbar(
      title.position = "top",
      title.hjust = 0.5,
      barheight = grid::unit(0.2, "cm"),
      barwidth = grid::unit(5.5, "cm")
    )
  ) +
  ggplot2::theme_void() +
  ggplot2::theme(
    legend.position = "bottom",
    plot.title = ggplot2::element_text(hjust = 0.5),
    legend.title.align = 0.5
  )

# Normalized population raster
p2 <- ggplot2::ggplot(
  df_norm,
  ggplot2::aes(x, y, fill = sle_pd_2020_1km_UNadj * 100)
) +
  ggplot2::geom_raster() +
  ggplot2::coord_quickmap() +
  ggplot2::labs(
    title = "Normalized Raster (adm2)",
    fill = "Population (%)"
  ) +
  ggplot2::scale_fill_distiller(
    palette = "OrRd",
    direction = 1,
    labels = scales::label_number(suffix = "%", accuracy = 0.1),
    guide = ggplot2::guide_colorbar(
      title.position = "top",
      title.hjust = 0.5,
      barheight = grid::unit(0.2, "cm"),
      barwidth = grid::unit(5.5, "cm")
    )
  ) +
  ggplot2::theme_void() +
  ggplot2::theme(
    legend.position = "bottom",
    plot.title = ggplot2::element_text(hjust = 0.5),
    legend.title.align = 0.5
  )

# assemble plots
(p1 + p2) & ggplot2::theme(legend.position = "bottom")
Output

To adapt the code:

  • Lines 2, 3: Replace sle_2020_rast with your original raster and normed_wp_raster with your normalized one.
  • Lines 8, 37: Update the fill variable if your raster uses a different layer name.
  • Lines 13, 42: Adjust adm2 or any labels to reflect your admin level.
  • This comparison works best when both rasters are aligned spatially and temporally.

Once updated, run the code to generate your comparison plots.

Step 7: Save Data

After aggregating and validating the population data, it’s good practice to save the final population data. This allows us to reuse the cleaned and structured data in later steps without needing to reprocess from scratch.

We save the raw raster (.tif) and the aggregated population table as .rds, .csv, and .xlsx formats.

  • R
  • Python
# set up output path
save_path <- here::here(
  "01_data",
  "1.1_foundational",
  "1.1c_population",
  "1ci_worldpop_rasters"
)

# save the raw raster
terra::writeRaster(
  sle_2020_rast,
  here::here(save_path, "1ci_worldpop_rasters", "sle_2020_pop_rast.tif")
)

# save the normalised raw raster
terra::writeRaster(
  normed_wp_raster,
  here::here(save_path, "1ci_worldpop_rasters", "sle_2020_pop_rast_normed.tif")
)

# save to rds
rio::export(
  pop_share,
  here::here(save_path, "1ci_population", "sle_2020_pop_rast.rds")
)

# save to csv
rio::export(
  pop_share,
  here::here(save_path, "1ci_population", "sle_2020_pop_rast.csv")
)

# save to xlsx
rio::export(
  pop_share,
  here::here(save_path, "1ci_population", "sle_2020_pop_rast.xlsx")
)

To adapt the code:

  • Line 3: Update save_path to your output directory. pop_long: Replace if your rainfall column has a different name.
  • File Names: Change file names (e.g., sle_2020_pop_rast.csv) to reflect your dataset or time range.

Once updated, run the code to save your outputs in both raw and processed formats.

Summary

We’ve now walked through how to work with modeled population rasters like WorldPop—from importing .tif files, extracting values using shapefiles, to projecting totals for recent years. These layers are particularly helpful when census data is missing or too coarse, and give us flexible, high-resolution population estimates that can be aligned with any spatial unit. We’ve also touched on good practices: harmonizing projections, validating joins, and saving cleaned outputs in reusable formats.

We also explored additional uses, calculating population shares across districts and normalizing rasters within units to support spatial weighting and disaggregation. These methods treat the raster as a distribution surface, not just a count layer. The full code is available at the end of this section (collapsed for convenience). We can use it as a template—just update paths, or admin levels to fit our context.

Full code

Find the full code script for working with WorldPop population rasters below.

  • R
  • Python
Show full code
################################################################################
################### ~ WorldPop population raster full code ~ ###################
################################################################################

### Step 1: Import Raster ------------------------------------------------------

# Install or load required packages
pacman::p_load(
  httr,           # API access (e.g., MAP, WorldPop)
  sf,             # vector data (shapefiles)
  scales,         # label formatting
  patchwork,      # combine ggplots
  terra,          # raster handling
  exactextractr   # zonal stats from rasters
)

# Define direct link to 2020 Sierra Leone population raster
sle_pop_path <- paste0(
  "https://data.worldpop.org/GIS/Population_Density/",
  "Global_2000_2020_1km_UNadj/2020/SLE/sle_pd_2020_1km_UNadj.tif"
)

# Import raster
sle_2020_rast <- terra::rast(sle_pop_path)

### Step 2: Extract Population Values by Admin Unit ----------------------------

#### Step 2.1: Get shapefile and set CRS ---------------------------------------

# Get shapefile
shp_adm3 <- sf::read_sf(
  here::here(
    "1.1_foundational",
    "1.1a_administrative_boundaries",
    "Chiefdom2021.shp"
  )
) |>
  dplyr::mutate(adm0 = "SIERRA LEONE") |>
  dplyr::select(
    adm0,
    adm1 = FIRST_REGI,
    adm2 = FIRST_DNAM,
    adm3 = FIRST_CHIE
  )

# Ensure both layers use same CRS
shp_adm3 <- sf::st_transform(shp_adm3, crs = terra::crs(sle_2020_rast))

### Step 2: Extract Population Values by Admin Unit ----------------------------

#### Step 2.1: Get shapefile and set CRS ---------------------------------------

terra::crs(raster_file)
sf::st_crs(shapefile)

### Step 2: Extract Population Values by Admin Unit ----------------------------

#### Step 2.1: Get shapefile and set CRS ---------------------------------------

shapefile <- sf::st_transform(shapefile, crs = terra::crs(raster_file))

### Step 2: Extract Population Values by Admin Unit ----------------------------

#### Step 2.2: Extract population from raster ----------------------------------

# Use exactextractr to extract weighted sums
shp_adm3$pop <- exactextractr::exact_extract(
  sle_2020_rast, shp_adm3, 'sum',
  progress = FALSE # we turn off progress bar
) |> round()

# Preview result
shp_adm3 |>
  dplyr::select(adm0, adm1, adm2, adm3, year, pop) |> head()

### Step 3: Aggregating Population Data ----------------------------------------

# summarize total population by adm0
pop_summary_adm2 <- shp_adm3 |>
  sf::st_drop_geometry() |>
  dplyr::group_by(adm1, adm2, year) |>
  dplyr::reframe(pop_total = sum(pop, na.rm = TRUE))

pop_summary_adm2

### Step 4: Extend Population Estimates Beyond 2020 ----------------------------

#### Step 4.1: Prepare for interpolation ---------------------------------------

# apply population growth rate for
# the next three years
sl_pop_2020_2023 <- shp_adm3 |>
  sf::st_drop_geometry() |>
  dplyr::select(-year) |>
  dplyr::rename(pop2020 = pop) |>
  dplyr::mutate(
    pop2021 = round(pop2020 * 1.015),
    pop2022 = round(pop2021 * 1.015),
    pop2023 = round(pop2022 * 1.015)
  )

# check results
sl_pop_2020_2023

### Step 4: Extend Population Estimates Beyond 2020 ----------------------------

#### Step 4.2: Reshape to long format ------------------------------------------

# turn wide pop to long
sl_pop_long <- sl_pop_2020_2023 |>
  tidyr::pivot_longer(
    cols = dplyr::starts_with("pop"),
    names_to = "year",
    values_to = "pop"
  ) |>
  dplyr::mutate(
    year = stringr::str_remove(year, "pop") |>
      as.integer()
  )

### Step 5: Visualizing Raster-Extracted Population Data -----------------------

#### Step 5.1: Join and bin population data ------------------------------------

# join extracted raster pop to shapefile
shp_adm3_raster_pop <- sl_pop_long |>
  dplyr::left_join(
    # join shapefile to pop data
    dplyr::select(shp_adm3, -pop, -year),
    by = c("adm0", "adm1", "adm2", "adm3")
  ) |> sf::st_as_sf() |>
  # create categorical version of pop column
  dplyr::mutate(
    pop_bin = dplyr::case_when(
      is.na(pop) | pop == 0 ~ "Missing or 0",
      pop <= 15000 ~ "<15k",
      pop <= 30000 ~ "15k–30k",
      pop <= 50000 ~ "30k–50k",
      pop <= 100000 ~ "50k–100k",
      pop > 100000 ~ "100k+"
    ),
    pop_bin = factor(
      pop_bin,
      levels = c("Missing or 0", "<15k",
                 "15k–30k", "30k–50k",
                 "50k–100k", "100k+")
    )
  )

### Step 5: Visualizing Raster-Extracted Population Data -----------------------

#### Step 5.2: Generate map and check results ----------------------------------

# plot extracted raster population

# generate adm0-level shapefile for plotting
shp_adm0 <- shp_adm3 |>
  dplyr::group_by(adm0) |>
  dplyr::summarise(.groups = "drop") |>
  sf::st_union(by_feature = TRUE)

# generate adm1-level shapefile for plotting
shp_adm1 <- shp_adm3 |>
  dplyr::group_by(adm1) |>
  dplyr::summarise(.groups = "drop") |>
  sf::st_union(by_feature = TRUE)

adm3_raster_pop_map <- shp_adm3_raster_pop |>
  ggplot2::ggplot() +
  ggplot2::geom_sf(
    ggplot2::aes(fill = pop_bin),
    color = "white",
    size = 0.2
  ) +
  # add adm1 outline
  ggplot2::geom_sf(
    data = shp_adm1,
    fill = NA, color = "black", size = 0.5
  ) +
  ggplot2::scale_fill_manual(
    name = "Population Range",
    values = c("Missing or 0" = "grey80",
               "<15k" = "#fee5d9",
               "15k–30k" = "#fcae91",
               "30k–50k" = "#fb6a4a",
               "50k–100k" = "#de2d26",
               "100k+" = "#a50f15"),
    na.value = "grey80",
    guide = ggplot2::guide_legend(
      label.position = "bottom",
      title.position = "top"
    )
  ) +
    ggplot2::facet_wrap(~year, nrow = 1) +
  ggplot2::labs(
    title = "Raster-Extracted Population by Chiefdom (adm3) for 2020 - 2023",
    subtitle = "Extracted from WorldPop Raster Data (2020)",
    caption = "Note: Population projected to 2021–2023 using 1.5% growth rate.",
    fill = "Population"
  ) +
  ggplot2::theme_minimal() +
  ggplot2::theme(
    legend.position = "bottom",
    legend.title.position = "top",
    legend.key.width = grid::unit(1, "cm"),
    axis.text.x = ggplot2::element_text(
      angle = 75, vjust = 1, hjust = 1
    )
  )

### Step 6: Other Uses of Raster-Based Population Data -------------------------

#### Step 6.1: Calculate proportion of population per district -----------------

# Calculate pop shares at national, adm1, and adm2 levels
pop_share <- shp_adm3 |>
  sf::st_drop_geometry() |>
  dplyr::mutate(
    adm0_total = sum(pop, na.rm = TRUE)
  ) |>
  dplyr::group_by(adm1) |>
  dplyr::mutate(
    total_adm1 = sum(pop, na.rm = TRUE)
  ) |>
  dplyr::group_by(adm1, adm2, .add = TRUE) |>
  dplyr::mutate(
    total_adm2 = sum(pop, na.rm = TRUE),
    prop_adm0 = pop / adm0_total,
    prop_adm1 = pop / total_adm1,
    prop_adm2 = pop / total_adm2
  ) |>
  dplyr::ungroup()

# Check head
pop_share |>
  dplyr::filter(
    adm2 == "WESTERN URBAN"
    ) |>
  dplyr::select(
    adm1, adm2, adm3, year, pop,
    prop_adm0, prop_adm1, prop_adm2
  ) |>
    dplyr::arrange(desc(prop_adm0)) |>
  head()

### Step 6: Other Uses of Raster-Based Population Data -------------------------

#### Step 6.2: Normalize population per pixel within each admin ----------------

# Normalize pop per pixel within each adm2
normed_wp_raster <- sntutils::normalize_raster_by_polygon(
  raster = sle_2020_rast,
  shp = shp_adm3,
  id_col = "adm2"
)

### Step 6: Other Uses of Raster-Based Population Data -------------------------

#### Step 6.3: Compare original and normalized population rasters --------------

# Convert rasters to data.frames
df_orig <- as.data.frame(sle_2020_rast, xy = TRUE, na.rm = T)
df_norm <- as.data.frame(normed_wp_raster, xy = TRUE, na.rm = T)

# Original population raster
p1 <- ggplot2::ggplot(
  df_orig,
  ggplot2::aes(x, y, fill = sle_pd_2020_1km_UNadj)
) +
  ggplot2::geom_raster() +
  ggplot2::coord_quickmap() +
  ggplot2::labs(
    title = "Original Raster (adm2)",
    fill = "Population"
  ) +
  ggplot2::scale_fill_distiller(
    palette = "OrRd",
    direction = 1,
    labels = scales::label_comma(big.mark = ","),
    guide = ggplot2::guide_colorbar(
      title.position = "top",
      title.hjust = 0.5,
      barheight = grid::unit(0.2, "cm"),
      barwidth = grid::unit(5.5, "cm")
    )
  ) +
  ggplot2::theme_void() +
  ggplot2::theme(
    legend.position = "bottom",
    plot.title = ggplot2::element_text(hjust = 0.5),
    legend.title.align = 0.5
  )

# Normalized population raster
p2 <- ggplot2::ggplot(
  df_norm,
  ggplot2::aes(x, y, fill = sle_pd_2020_1km_UNadj * 100)
) +
  ggplot2::geom_raster() +
  ggplot2::coord_quickmap() +
  ggplot2::labs(
    title = "Normalized Raster (adm2)",
    fill = "Population (%)"
  ) +
  ggplot2::scale_fill_distiller(
    palette = "OrRd",
    direction = 1,
    labels = scales::label_number(suffix = "%", accuracy = 0.1),
    guide = ggplot2::guide_colorbar(
      title.position = "top",
      title.hjust = 0.5,
      barheight = grid::unit(0.2, "cm"),
      barwidth = grid::unit(5.5, "cm")
    )
  ) +
  ggplot2::theme_void() +
  ggplot2::theme(
    legend.position = "bottom",
    plot.title = ggplot2::element_text(hjust = 0.5),
    legend.title.align = 0.5
  )

# assemble plots
(p1 + p2) & ggplot2::theme(legend.position = "bottom")

### Step 7: Save Data ----------------------------------------------------------

# set up output path
save_path <- here::here(
  "01_data",
  "1.1_foundational",
  "1.1c_population",
  "1ci_worldpop_rasters"
)

# save the raw raster
terra::writeRaster(
  sle_2020_rast,
  here::here(save_path, "1ci_worldpop_rasters", "sle_2020_pop_rast.tif")
)

# save the normalised raw raster
terra::writeRaster(
  normed_wp_raster,
  here::here(save_path, "1ci_worldpop_rasters", "sle_2020_pop_rast_normed.tif")
)

# save to rds
rio::export(
  pop_share,
  here::here(save_path, "1ci_population", "sle_2020_pop_rast.rds")
)

# save to csv
rio::export(
  pop_share,
  here::here(save_path, "1ci_population", "sle_2020_pop_rast.csv")
)

# save to xlsx
rio::export(
  pop_share,
  here::here(save_path, "1ci_population", "sle_2020_pop_rast.xlsx")
)

### Step 7: Save Data ----------------------------------------------------------

# save as RDS
rio::export(
  pop_share,
  here::here("english/data_r/pop/final_pop_rast_2020_2023.rds")
)
 

©2025 Applied Health Analytics for Delivery and Innovation. All rights reserved