
Calculate and report geo-naming match statistics
Source: R/harmonize_admin_names.R
Compares entries in a dataset against a lookup across the specified admin levels (e.g., country, province/state/region, district, subdistrict, settlement) and reports match statistics to the console.
Usage
calculate_match_stats(
  data,
  lookup_data,
  level0 = NULL,
  level1 = NULL,
  level2 = NULL,
  level3 = NULL,
  level4 = NULL
)
Arguments
- data
A data frame containing the target data to be matched.
- lookup_data
A data frame serving as the reference for matching.
- level0
Column name (country) present in both data and lookup_data.
- level1
Column name (province/state/region) present in both datasets.
- level2
Column name (district) present in both datasets.
- level3
Column name (subdistrict) present in both datasets.
- level4
Column name (settlement) present in both datasets.
Details
Input columns supplied via level* are normalized to lower case before matching.
Base N for each side is the count of unique hierarchical names formed from the supplied levels (e.g., level0_level1_...). Per-level rows show matches out of each side's Base N.
Console output includes:
- a two-column summary (Target vs Lookup as base N),
- side-level completeness messages (success/info/warning),
- a per-level report of missing names (NA or empty strings) on either side. Missing names are excluded from Base N only if the implementation drops them before counting (see Note).
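The Base N and per-level counts described above can be sketched in a few lines of base R. This is a minimal illustration, not the package's internal code: the helper `build_keys()` is a hypothetical name, and the sketch assumes keys are built by pasting the lower-cased level columns together with `_`.

```r
# Hypothetical helper (not part of the package): build unique hierarchical
# keys from the supplied level columns after lower-casing them.
build_keys <- function(df, cols) {
  parts <- lapply(df[cols], function(x) tolower(as.character(x)))
  unique(do.call(paste, c(unname(parts), sep = "_")))
}

target <- data.frame(
  country  = c("Kenya", "Kenya", "Uganda"),
  district = c("Nairobi", "Kisumu", "Kampala")
)
lookup <- data.frame(
  country  = c("kenya", "uganda"),
  district = c("nairobi", "kampala")
)

cols <- c("country", "district")
target_keys <- build_keys(target, cols)
lookup_keys <- build_keys(lookup, cols)

# Base N on the target side, and how many of those keys exist in the lookup
base_n_target <- length(target_keys)
matched <- sum(target_keys %in% lookup_keys)

cat(sprintf("district: %d out of %d matched\n", matched, base_n_target))
```

Because both sides are lower-cased before the keys are compared, "Kenya" in the target matches "kenya" in the lookup. Swapping the roles of `target_keys` and `lookup_keys` gives the counts with the lookup as Base N.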
Note
If you want missing names (NA or empty strings) excluded from Base N, ensure the key-building step drops them before counting.
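One way to drop missing names before counting is sketched below. It is an assumption-laden illustration rather than the package's code: `build_keys_complete()` is a hypothetical helper, it treats whitespace-only strings as empty, and it drops a row entirely when any supplied level is missing.

```r
# Hypothetical helper (not part of the package): build unique keys, dropping
# rows where any supplied level is NA or an empty/whitespace-only string.
build_keys_complete <- function(df, cols) {
  parts <- lapply(df[cols], function(x) tolower(trimws(as.character(x))))
  # TRUE for rows where at least one level is missing
  is_missing <- Reduce(`|`, lapply(parts, function(x) is.na(x) | x == ""))
  keys <- do.call(paste, c(unname(parts), sep = "_"))
  unique(keys[!is_missing])
}

toy <- data.frame(
  country  = c("Kenya", "Kenya", "Uganda", "Uganda"),
  district = c("Nairobi", NA, "", "Kampala")
)

keys <- build_keys_complete(toy, c("country", "district"))
length(keys)  # Base N counts only rows with complete names
```

Without the filter, the pasted keys would include entries such as `kenya_NA` and `uganda_`, which would inflate Base N and could never match a valid lookup name.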
Examples
# Minimal runnable example (toy data)
data <- data.frame(
  country = c("Kenya", "Kenya", "Uganda"),
  district = c("Nairobi", "Kisumu", "Kampala")
)
lookup <- data.frame(
  country = c("Kenya", "Uganda"),
  district = c("Nairobi", "Kampala")
)
calculate_match_stats(
  data, lookup, level0 = "country", level2 = "district"
)
#>
#> ── ℹ Match Summary ─────────────────────────────────────────────────────────────
#>
#> ℹ Data has names not in lookup.
#>
#> Target data as base N
#> • country (level0): 2 out of 2 matched
#> • district (level2): 2 out of 3 matched
#> Lookup data as base N
#> • country (level0): 2 out of 2 matched
#> • district (level2): 2 out of 2 matched