
Spatial Vector Validation and Cleaning
Source:R/validate_process_spatial.R
validate_process_spatial.RdValidates spatial vector data and creates admin-level aggregations with geometry hashes. Follows the same validation approach as validate_process_coordinates() with modular checks and standardized outputs.
Usage
validate_process_spatial(
shp,
name = "shapefile",
adm0_col = NULL,
adm1_col = NULL,
adm2_col = NULL,
adm3_col = NULL,
adm4_col = NULL,
fix_issues = TRUE,
quiet = FALSE,
geometry_crs = 4326,
drop_z = TRUE
)Arguments
- shp
An sf object to validate
- name
Character string identifying the dataset (for reporting)
- adm0_col
Character string, column name for country level
- adm1_col
Character string, column name for region/province level
- adm2_col
Character string, column name for district level
- adm3_col
Character string, column name for chiefdom/commune level
- adm4_col
Character string, column name for sub-chiefdom/village level
- fix_issues
Logical, whether to attempt automatic fixes
- quiet
Logical, whether to suppress progress messages
- geometry_crs
Target CRS for output geometry (default EPSG:4326). After CRS validation/fixes, geometries are transformed to this CRS for subsequent processing and output.
- drop_z
Logical, whether to drop Z/M coordinates before validation. Defaults to TRUE to avoid s2 warnings for 2D processing.
Value
A list with validation results. Key elements:
issues: Character vector of issues detected.invalid_rows: sf/data.frame of rows failing basic geometry checks.checks$duplicate_rows: Duplicates by admin names + geometry.checks$all_shared_locations: All records sharing geometry.checks$all_shared_locations_ungrouped: Only the duplicate rows at shared geometry.checks$invalid_geometries: Rows with invalid geometries before fixing.checks$empty_geometries: Rows with empty geometries.checks$self_intersecting: Rows with self-intersecting geometries.column_dictionary: Data frame with column_name and description.final_spat_vec: List of admin-level aggregations (adm0, adm1, adm2, adm3, adm4).geometry_types: Tibble summarizing geometry types and counts.spatial_extent: Named vector with xmin, ymin, xmax, ymax.
Examples
if (FALSE) { # \dontrun{
# Basic validation with Sierra Leone data (adm3)
result <- validate_process_spatial(
shp = shp_raw,
name = "Sierra Leone boundaries",
adm0_col = "country",
adm1_col = "region",
adm2_col = "district",
adm3_col = "chiefdom"
)
# Validation with adm4
result <- validate_process_spatial(
shp = shp_raw,
name = "Sierra Leone boundaries",
adm0_col = "country",
adm1_col = "region",
adm2_col = "district",
adm3_col = "chiefdom",
adm4_col = "village"
)
# Access cleaned admin levels
shp_adm0 <- result$final_spat_vec$adm0
shp_adm1 <- result$final_spat_vec$adm1
shp_adm2 <- result$final_spat_vec$adm2
shp_adm3 <- result$final_spat_vec$adm3
shp_adm4 <- result$final_spat_vec$adm4
# Check validation status
if (length(result$issues) == 0) {
cli::cli_alert_success("Ready for SNT analysis!")
}
# Access invalid rows (if any)
invalid_rows <- result$invalid_rows
# Access duplicate rows (if any)
duplicate_rows <- result$checks$duplicate_rows
shared_locations <- result$checks$all_shared_locations
# Check column structure as data frame
col_dict <- result$column_dictionary
# Shows data frame with shapefile, old_name, new_name, description
print(col_dict)
} # }