Fills missing population values for specified years by applying multipliers to the nearest available year within each location group. Can handle multiple population columns simultaneously. Supports extending forward beyond the latest data or backward before the earliest data. Automatically calculates growth rates from existing data when multipliers are not provided.
Arguments
- data
A data frame of population data containing a year column, one or more population columns, and location (grouping) columns.
- year_col
The name of the year column (unquoted or character).
- pop_cols
A character vector of population column names to extrapolate.
- group_cols
A character vector of grouping column names defining location.
- years_to_extrap
A vector of target years to extrapolate. Can be unnamed (e.g., c(2021, 2022)) or named with year-specific multipliers (e.g., c('2021' = 1.5, '2022' = 1.3)).
- multiplier
A single numeric multiplier to apply to all years when years_to_extrap is unnamed (e.g., 1.5). Can also be a named list/vector with one multiplier for each population column. For year-specific multipliers, use a nested list structure like list(pop_total = c('2021' = 1.03, '2022' = 1.025)). If NULL and sufficient data exists, growth rates will be calculated automatically.
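The accepted shapes of years_to_extrap and multiplier described above can be written out as plain R objects. This is only an illustrative sketch: the object names are hypothetical, and the values are borrowed from this page's examples rather than being recommended defaults.

```r
# Hypothetical object names; values are illustrative only.

# 1. One multiplier for every column and every target year
single_multiplier <- 1.03

# 2. One multiplier per population column
per_column <- list(pop_total = 1.025, pop_0_11m = 1.030)

# 3. Year-specific multipliers per column (nested structure)
per_column_year <- list(
  pop_total = c("2021" = 1.03, "2022" = 1.025),
  pop_0_11m = c("2021" = 1.035, "2022" = 1.030)
)

# 4. Target years that carry their own multipliers as a named vector
named_years <- c("2021" = 1.5, "2022" = 1.3)

# The year labels live in the names, so they can be recovered as numbers
as.integer(names(named_years))
#> [1] 2021 2022
```

Years must be quoted (or wrapped in backticks) when used as names, because a bare number is not a valid argument name in c().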
Value
A data frame with updated population estimates for all specified population columns and years.
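The arithmetic behind the forward estimates can be sketched in base R. Judging only from the printed output in the examples on this page, each new year appears to be the previous year's value times the multiplier, rounded to a whole number before being carried forward. The helper below, extrapolate_sketch(), is a hypothetical illustration of that reading and is not the package's implementation.

```r
# Hypothetical helper: compound a starting value through a sequence of
# yearly multipliers, rounding at each step (an assumption inferred from
# the example output, not confirmed from the package source).
extrapolate_sketch <- function(base_value, multipliers) {
  steps <- Reduce(
    function(prev, m) round(prev * m),
    multipliers,
    accumulate = TRUE,
    init = base_value
  )
  steps[-1]  # drop the starting value, keep only the new years
}

# Same multiplier for 2021 and 2022, starting from a 2020 value of 3338
extrapolate_sketch(3338, c(1.03, 1.03))
#> [1] 3438 3541

# Year-specific multipliers (1.03 for 2021, then 1.025 for 2022)
extrapolate_sketch(3338, c(1.03, 1.025))
#> [1] 3438 3524
```

Both results match the pop_total values shown for RegionA/District1 in the corresponding examples, which is the only evidence for the rounding assumption.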
Examples
# Dummy data: 2 regions x 2 districts (4 location groups) over 3 years, with multiple population columns
dummy_data <- expand.grid(
adm0 = "COUNTRYX",
adm1 = c("RegionA", "RegionB"),
adm2 = c("District1", "District2"),
year = 2018:2020
) |>
dplyr::mutate(
adm3 = paste0(adm2, "_Subarea"),
pop_total = sample(1000:5000, size = dplyr::n(), replace = TRUE),
pop_0_11m = pop_total * 0.08,
pop_0_4y = pop_total * 0.15,
pop_u15 = pop_total * 0.45
) |>
dplyr::arrange(adm0, adm1, adm2, year)
# Example with automatic growth rate calculation (no multiplier provided)
extrapolate_pop(
data = dummy_data,
year_col = "year",
pop_cols = c("pop_total", "pop_0_11m", "pop_0_4y", "pop_u15"),
group_cols = c("adm0", "adm1", "adm2", "adm3"),
years_to_extrap = c(2021, 2022)
)
#> # A tibble: 20 × 9
#> adm0 adm1 adm2 adm3 year pop_total pop_0_11m pop_0_4y pop_u15
#> <fct> <fct> <fct> <chr> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 COUNTRYX RegionA District1 Distri… 2018 3456 276. 518. 1555.
#> 2 COUNTRYX RegionA District1 Distri… 2019 3975 318 596. 1789.
#> 3 COUNTRYX RegionA District1 Distri… 2020 3338 267. 501. 1502.
#> 4 COUNTRYX RegionA District1 Distri… 2021 3866 309 580 1740
#> 5 COUNTRYX RegionA District1 Distri… 2022 4477 358 672 2015
#> 6 COUNTRYX RegionA District2 Distri… 2018 4349 348. 652. 1957.
#> 7 COUNTRYX RegionA District2 Distri… 2019 3584 287. 538. 1613.
#> 8 COUNTRYX RegionA District2 Distri… 2020 4951 396. 743. 2228.
#> 9 COUNTRYX RegionA District2 Distri… 2021 5734 459 860 2580
#> 10 COUNTRYX RegionA District2 Distri… 2022 6641 532 996 2988
#> 11 COUNTRYX RegionB District1 Distri… 2018 2331 186. 350. 1049.
#> 12 COUNTRYX RegionB District1 Distri… 2019 2005 160. 301. 902.
#> 13 COUNTRYX RegionB District1 Distri… 2020 2447 196. 367. 1101.
#> 14 COUNTRYX RegionB District1 Distri… 2021 2834 227 425 1275
#> 15 COUNTRYX RegionB District1 Distri… 2022 3282 263 492 1477
#> 16 COUNTRYX RegionB District2 Distri… 2018 2112 169. 317. 950.
#> 17 COUNTRYX RegionB District2 Distri… 2019 4030 322. 604. 1814.
#> 18 COUNTRYX RegionB District2 Distri… 2020 4357 349. 654. 1961.
#> 19 COUNTRYX RegionB District2 Distri… 2021 5046 404 757 2271
#> 20 COUNTRYX RegionB District2 Distri… 2022 5844 468 877 2630
# Example with the same multiplier for all columns and years
extrapolate_pop(
data = dummy_data,
year_col = "year",
pop_cols = c("pop_total", "pop_0_11m", "pop_0_4y", "pop_u15"),
group_cols = c("adm0", "adm1", "adm2", "adm3"),
years_to_extrap = c(2021, 2022),
multiplier = 1.03
)
#> # A tibble: 20 × 9
#> adm0 adm1 adm2 adm3 year pop_total pop_0_11m pop_0_4y pop_u15
#> <fct> <fct> <fct> <chr> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 COUNTRYX RegionA District1 Distri… 2018 3456 276. 518. 1555.
#> 2 COUNTRYX RegionA District1 Distri… 2019 3975 318 596. 1789.
#> 3 COUNTRYX RegionA District1 Distri… 2020 3338 267. 501. 1502.
#> 4 COUNTRYX RegionA District1 Distri… 2021 3438 275 516 1547
#> 5 COUNTRYX RegionA District1 Distri… 2022 3541 283 531 1593
#> 6 COUNTRYX RegionA District2 Distri… 2018 4349 348. 652. 1957.
#> 7 COUNTRYX RegionA District2 Distri… 2019 3584 287. 538. 1613.
#> 8 COUNTRYX RegionA District2 Distri… 2020 4951 396. 743. 2228.
#> 9 COUNTRYX RegionA District2 Distri… 2021 5100 408 765 2295
#> 10 COUNTRYX RegionA District2 Distri… 2022 5253 420 788 2364
#> 11 COUNTRYX RegionB District1 Distri… 2018 2331 186. 350. 1049.
#> 12 COUNTRYX RegionB District1 Distri… 2019 2005 160. 301. 902.
#> 13 COUNTRYX RegionB District1 Distri… 2020 2447 196. 367. 1101.
#> 14 COUNTRYX RegionB District1 Distri… 2021 2520 202 378 1134
#> 15 COUNTRYX RegionB District1 Distri… 2022 2596 208 389 1168
#> 16 COUNTRYX RegionB District2 Distri… 2018 2112 169. 317. 950.
#> 17 COUNTRYX RegionB District2 Distri… 2019 4030 322. 604. 1814.
#> 18 COUNTRYX RegionB District2 Distri… 2020 4357 349. 654. 1961.
#> 19 COUNTRYX RegionB District2 Distri… 2021 4488 359 673 2019
#> 20 COUNTRYX RegionB District2 Distri… 2022 4623 370 693 2080
# Example with different multipliers for each column
extrapolate_pop(
data = dummy_data,
year_col = "year",
pop_cols = c("pop_total", "pop_0_11m", "pop_0_4y", "pop_u15"),
group_cols = c("adm0", "adm1", "adm2", "adm3"),
years_to_extrap = c(2021, 2022),
multiplier = list(
pop_total = 1.025,
pop_0_11m = 1.030,
pop_0_4y = 1.028,
pop_u15 = 1.020
)
)
#> # A tibble: 20 × 9
#> adm0 adm1 adm2 adm3 year pop_total pop_0_11m pop_0_4y pop_u15
#> <fct> <fct> <fct> <chr> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 COUNTRYX RegionA District1 Distri… 2018 3456 276. 518. 1555.
#> 2 COUNTRYX RegionA District1 Distri… 2019 3975 318 596. 1789.
#> 3 COUNTRYX RegionA District1 Distri… 2020 3338 267. 501. 1502.
#> 4 COUNTRYX RegionA District1 Distri… 2021 3421 275 515 1532
#> 5 COUNTRYX RegionA District1 Distri… 2022 3507 283 529 1563
#> 6 COUNTRYX RegionA District2 Distri… 2018 4349 348. 652. 1957.
#> 7 COUNTRYX RegionA District2 Distri… 2019 3584 287. 538. 1613.
#> 8 COUNTRYX RegionA District2 Distri… 2020 4951 396. 743. 2228.
#> 9 COUNTRYX RegionA District2 Distri… 2021 5075 408 763 2273
#> 10 COUNTRYX RegionA District2 Distri… 2022 5202 420 784 2318
#> 11 COUNTRYX RegionB District1 Distri… 2018 2331 186. 350. 1049.
#> 12 COUNTRYX RegionB District1 Distri… 2019 2005 160. 301. 902.
#> 13 COUNTRYX RegionB District1 Distri… 2020 2447 196. 367. 1101.
#> 14 COUNTRYX RegionB District1 Distri… 2021 2508 202 377 1123
#> 15 COUNTRYX RegionB District1 Distri… 2022 2571 208 388 1145
#> 16 COUNTRYX RegionB District2 Distri… 2018 2112 169. 317. 950.
#> 17 COUNTRYX RegionB District2 Distri… 2019 4030 322. 604. 1814.
#> 18 COUNTRYX RegionB District2 Distri… 2020 4357 349. 654. 1961.
#> 19 COUNTRYX RegionB District2 Distri… 2021 4466 359 672 2000
#> 20 COUNTRYX RegionB District2 Distri… 2022 4578 370 691 2040
# Example with year-specific multipliers for selected columns
# (columns not listed in pop_cols are left as NA in the new rows)
extrapolate_pop(
data = dummy_data,
year_col = "year",
pop_cols = c("pop_total", "pop_0_11m"),
group_cols = c("adm0", "adm1", "adm2", "adm3"),
years_to_extrap = c(2021, 2022),
multiplier = list(
pop_total = c(`2021` = 1.03, `2022` = 1.025),
pop_0_11m = c(`2021` = 1.035, `2022` = 1.030)
)
)
#> # A tibble: 20 × 9
#> adm0 adm1 adm2 adm3 year pop_total pop_0_11m pop_0_4y pop_u15
#> <fct> <fct> <fct> <chr> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 COUNTRYX RegionA District1 Distri… 2018 3456 276. 518. 1555.
#> 2 COUNTRYX RegionA District1 Distri… 2019 3975 318 596. 1789.
#> 3 COUNTRYX RegionA District1 Distri… 2020 3338 267. 501. 1502.
#> 4 COUNTRYX RegionA District1 Distri… 2021 3438 276 NA NA
#> 5 COUNTRYX RegionA District1 Distri… 2022 3524 284 NA NA
#> 6 COUNTRYX RegionA District2 Distri… 2018 4349 348. 652. 1957.
#> 7 COUNTRYX RegionA District2 Distri… 2019 3584 287. 538. 1613.
#> 8 COUNTRYX RegionA District2 Distri… 2020 4951 396. 743. 2228.
#> 9 COUNTRYX RegionA District2 Distri… 2021 5100 410 NA NA
#> 10 COUNTRYX RegionA District2 Distri… 2022 5228 422 NA NA
#> 11 COUNTRYX RegionB District1 Distri… 2018 2331 186. 350. 1049.
#> 12 COUNTRYX RegionB District1 Distri… 2019 2005 160. 301. 902.
#> 13 COUNTRYX RegionB District1 Distri… 2020 2447 196. 367. 1101.
#> 14 COUNTRYX RegionB District1 Distri… 2021 2520 203 NA NA
#> 15 COUNTRYX RegionB District1 Distri… 2022 2583 209 NA NA
#> 16 COUNTRYX RegionB District2 Distri… 2018 2112 169. 317. 950.
#> 17 COUNTRYX RegionB District2 Distri… 2019 4030 322. 604. 1814.
#> 18 COUNTRYX RegionB District2 Distri… 2020 4357 349. 654. 1961.
#> 19 COUNTRYX RegionB District2 Distri… 2021 4488 361 NA NA
#> 20 COUNTRYX RegionB District2 Distri… 2022 4600 372 NA NA
