
Coordinate Validation and Cleaning
Source:R/validate_process_coordinates.R
validate_process_coordinates.RdValidates coordinate data (points) and returns a cleaned sf object with standardized columns. Accepts either an sf object with POINT geometry or a data.frame with longitude and latitude columns. Checks include missing coordinates, DMS detection with optional conversion (via the suggested 'parzer' package), range, precision, flip detection, and country containment.
Usage
validate_process_coordinates(
data,
name = "points",
lon_col = NULL,
lat_col = NULL,
adm0_sf = NULL,
geometry_crs = 4326,
min_decimals = 4,
id_col = NULL,
fix_issues = TRUE,
quiet = FALSE
)Arguments
- data
An
sfobject with POINT geometry or adata.framecontaining coordinate columns.- name
Character string identifying the dataset (for reporting).
- lon_col
Character string, column name for longitude (if
datais not sf).- lat_col
Character string, column name for latitude (if
datais not sf).- adm0_sf
Optional
sfpolygon/multipolygon of the country to validate points against (EPSG:4326 preferred). When provided, the function tests that all points fall within the given shape. No admin attributes are joined to the points.- geometry_crs
Target CRS for output geometry and for operations with
adm0_sf(default EPSG:4326). Lat/Lon attributes remain in EPSG:4326.- min_decimals
Minimum decimal places required for lon/lat to be considered precise (default 4).
- id_col
Optional character name of a unique identifier column (e.g., facility ID). When provided, duplicate reporting includes IDs for coordinate-duplicate rows to aid de-dup decisions.
- fix_issues
Logical, whether to attempt automatic fixes (drop invalids, set CRS, standardize columns, remove duplicates, attempt flip fixes when
adm0_sfis available).- quiet
Logical, whether to suppress progress messages.
Value
A list with validation results. Key elements:
issues: Character vector of issues detected.final_points_df: Cleaned sf object of points.invalid_rows: sf/data.frame of rows failing basic coordinate checks.checks$duplicate_rows: Duplicates by ID + coordinates (standardized sf).checks$all_shared_locations: All records sharing a location.checks$all_shared_locations_ungrouped: Only the duplicate rows at shared locations.checks$coordinates_adm0: Points outside ADM0 with original lon/lat restored from geometry.column_dictionary: Data frame mapping old to standardized column names.
Examples
if (FALSE) { # \dontrun{
# Example data with common issues: missing, DMS, flipped, imprecise
df <- data.frame(
hf = paste0("HF_", 1:6),
lon = c(-13.26077, "13\u00B015'38\"W", 8.5, -12.09057, NA, 181),
lat = c(8.464283, "8\u00B027'51\"N", -13.26, 8.770261, 8.1, 0.1)
)
# Simple rectangle as country polygon (e.g., Sierra Leone-ish bbox)
adm0_sf <- sf::st_as_sfc(
sf::st_bbox(c(xmin = -14, ymin = 6.9, xmax = -10, ymax = 10.1),
crs = sf::st_crs(4326))
) |>
sf::st_sf()
# Validate and auto-fix issues (DMS parsing requires 'parzer')
res <- validate_process_coordinates(
data = df,
name = "hf_points",
lon_col = "lon",
lat_col = "lat",
adm0_sf = adm0_sf,
min_decimals = 4,
fix_issues = TRUE,
quiet = TRUE
)
# Cleaned points
res$final_points_df
} # }