
Impute outliers using moving average from adjacent time points
Source: R/outlier_detection.R
Replaces outlier values (those exceeding a threshold) with the moving average of adjacent non-outlier values. Useful for smoothing time series data where outliers should be replaced with locally consistent values rather than removed.
Value
Numeric vector of the same length as x, with outliers replaced by
the rounded moving average of valid adjacent values. If no valid adjacent
values exist, the original value is retained.
Details
The function works as follows:
1. For each value, it checks whether the previous and next values are non-outliers (i.e., <= threshold).
2. If a value exceeds the threshold, it calculates the mean of the valid adjacent values (previous and/or next).
3. The outlier is replaced with the rounded moving average.
4. If no valid adjacent values exist (both neighbors are also outliers or NA), the original value is kept.
This approach preserves local trends while removing spike anomalies. It's particularly useful for time series where outliers represent data errors rather than true signal.
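The steps above can be sketched in base R as follows. This is an illustrative re-implementation, not the package source: the name impute_outlier_ma_sketch is hypothetical, threshold is assumed to be a single number, and neighbor validity is assumed to be judged against the original (not yet imputed) values, which is consistent with the documented outputs in the examples.

```r
# Hypothetical sketch of the documented behavior; not the package's own code.
# Assumes `threshold` is a single numeric value.
impute_outlier_ma_sketch <- function(x, threshold) {
  n <- length(x)
  # A value is "valid" if it is not NA and does not exceed the threshold.
  is_valid <- !is.na(x) & x <= threshold
  out <- x
  for (i in seq_len(n)) {
    # Leave NA values and non-outliers untouched.
    if (is.na(x[i]) || x[i] <= threshold) next
    # Collect the original previous/next values that are valid.
    neighbors <- c(
      if (i > 1 && is_valid[i - 1]) x[i - 1],
      if (i < n && is_valid[i + 1]) x[i + 1]
    )
    # Replace only when at least one valid neighbor exists;
    # otherwise the original value is kept.
    if (length(neighbors) > 0) out[i] <- round(mean(neighbors))
  }
  out
}

impute_outlier_ma_sketch(c(10, 12, 100, 14, 11), threshold = 50)
#> [1] 10 12 13 14 11
impute_outlier_ma_sketch(c(100, 200), threshold = 50)
#> [1] 100 200
```

In the second call neither value has a valid neighbor, so both are returned unchanged.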
Examples
# simple example with one outlier
x <- c(10, 12, 100, 14, 11)
impute_outlier_ma(x, threshold = 50)
#> [1] 10 12 13 14 11
# Returns: c(10, 12, 13, 14, 11) - the 100 is replaced with mean(c(12, 14)) = 13
# consecutive outliers, each with only one valid neighbor
x <- c(10, 100, 200, 15)
impute_outlier_ma(x, threshold = 50)
#> [1] 10 10 15 15
# Returns: c(10, 10, 15, 15) - each outlier uses its one valid neighbor
# use with dplyr for grouped time series
if (FALSE) { # \dontrun{
data |>
  dplyr::arrange(adm2, date) |>
  dplyr::group_by(adm2) |>
  dplyr::mutate(
    cases_imputed = impute_outlier_ma(cases, threshold = upper_bound)
  ) |>
  dplyr::ungroup()
} # }