Corrects outlier values by replacing them with the median of neighboring months (month before and month after). Applies a consistency rule to ensure corrected values do not exceed an upper bound variable.
Usage
correct_outliers(
data,
target_vars,
consistency_vars,
outlier_flag_col = "outlier_flag_consensus",
group_cols,
date_col = "date",
verbose = TRUE
)Arguments
- data
Data frame containing outlier flags and variables to correct.
- target_vars
Character vector. Columns to correct (e.g., c("conf", "maltreat")). Each target is paired with the corresponding element in
consistency_vars.- consistency_vars
Character vector. Upper bound columns (e.g., c("test", "conf")). Must be the same length as
target_vars. For each pair, the corrected value must be <= consistency variable.- outlier_flag_col
Character. Column containing outlier flags. Values of "outlier" are corrected. Default: "outlier_flag_consensus".
- group_cols
Character vector. Columns defining the time series groups (e.g., c("hf_uid") for facility-level). Neighbors are found within groups.
- date_col
Character. Date column name. Default: "date".
- verbose
Logical. Print summary messages. Default: TRUE.
Value
The input data frame with two new columns per target variable:
{target}_correctedThe corrected value, or original if no correction was applied.
{target}_correction_flagReason for correction status: "corrected", "failed_consistency", "missing_neighbors", or "not_outlier".
Details
For each target variable flagged as an outlier:
Retrieve the value from the month before (lag) and month after (lead)
If either neighbor is missing, no correction is applied
Compute the median of the two neighbors
Check if the median exceeds the consistency variable for that row
If the consistency check fails, the corrected value is set to NA
Otherwise, replace the outlier with the median
Examples
if (FALSE) { # \dontrun{
# After running detect_outliers()
result <- detect_outliers(data, column = "conf", ...)
# Correct single variable: conf bounded by test
corrected <- correct_outliers(
data = result,
target_vars = "conf",
consistency_vars = "test",
group_cols = c("hf_uid"),
date_col = "date"
)
# Correct multiple variables at once
corrected <- correct_outliers(
data = result,
target_vars = c("conf", "maltreat", "maladm"),
consistency_vars = c("test", "conf", "conf"),
group_cols = c("hf_uid"),
date_col = "date"
)
} # }
