Selects an appropriate binning method based on data distribution and returns bins with matching colors. Three methods are available: head-tail breaks for highly skewed data, hybrid (quantile + tail) for moderately skewed data, and pure quantile for roughly symmetric data.
Usage
auto_bin(
x,
palette = "default",
bin = 6,
decimals = 2,
round_to = 50,
reverse = FALSE,
labels = NULL,
outlier_threshold = NULL,
outlier_color = "#636363",
outlier_label = NULL
)Arguments
- x
Numeric vector to bin.
- palette
Character. Either a preset name or a custom character vector of hex colors. Use
list_palettes()to see available presets.- bin
Integer. Number of bins. Default is 6.
- decimals
Integer. Number of decimal places in labels. Default is 2.
- round_to
Numeric. Round break points to this increment. Default is 50. Set to NULL for raw values (useful for decimal data like rates).
- reverse
Logical. Reverse the color order? Default is FALSE.
- labels
Character vector. Optional custom bin labels. When provided, skips automatic binning and uses these labels with breaks parsed from label strings. Supports formats: "0–50", "50-100", ">1000". Example: c("0–50", "50–100", "100–250", "250–450", "450–700", "700–1000", ">1000")
- outlier_threshold
Numeric. Optional threshold to create a separate outlier bin for values above this threshold. Useful for metrics like TPR where values >1 are unusual. Default is NULL (no outlier handling).
- outlier_color
Character. Hex color for the outlier bin. Only used when outlier_threshold is specified. Default is "#636363" (dark grey).
- outlier_label
Character. Optional custom label for the outlier bin. when NULL (default), uses the auto-generated format.
Value
A list with:
- bins
Ordered factor of bin labels for each value in
x- colors
Named character vector mapping labels to colors
- counts
Data frame with bin labels and counts (n)
- method
Character. The binning method used: "headtail", "hybrid", "quantile", or "custom"
- diagnostics
List with prop_zero, skew_ratio, and tail_share
Details
Method selection logic:
headtail: prop_zero > 0.1, skew_ratio > 4, or tail_share > 0.4
hybrid: skew_ratio > 2
quantile: otherwise
custom: when labels parameter is provided
Use list_palettes() to see all available palette names.
Examples
# Simulated malaria incidence data
set.seed(42)
incidence <- c(rep(0, 20), rgamma(80, shape = 2, rate = 0.01))
result <- auto_bin(incidence)
table(result$bins)
#>
#> 0–50 50–100 100–200 200–250 250–350 >350
#> 35 14 14 14 15 8
result$method
#> [1] "headtail"
result$colors
#> 0–50 50–100 100–200 200–250 250–350 >350
#> "#DEEBF7" "#B6D4E9" "#75B3D8" "#FB8969" "#E94534" "#A50F15"
# Use named palette
auto_bin(incidence, palette = "byor")$colors
#> 0–50 50–100 100–200 200–250 250–350 >350
#> "#084594" "#3E89C2" "#B3D2EA" "#FDC96A" "#ED4728" "#800026"
# Use custom palette
auto_bin(incidence, palette = c("#ffffcc", "#a1dab4", "#41b6c4", "#225ea8"))
#> $bins
#> [1] 0–50 0–50 0–50 0–50 0–50 0–50 0–50 0–50 0–50
#> [10] 0–50 0–50 0–50 0–50 0–50 0–50 0–50 0–50 0–50
#> [19] 0–50 0–50 250–350 50–100 100–200 0–50 100–200 50–100 >350
#> [28] 100–200 0–50 0–50 250–350 250–350 250–350 200–250 250–350 0–50
#> [37] 50–100 >350 >350 200–250 200–250 50–100 0–50 0–50 200–250
#> [46] 250–350 250–350 50–100 50–100 50–100 200–250 50–100 100–200 50–100
#> [55] 100–200 0–50 200–250 0–50 200–250 100–200 200–250 50–100 >350
#> [64] 200–250 100–200 200–250 250–350 100–200 250–350 50–100 200–250 >350
#> [73] 50–100 100–200 >350 100–200 0–50 0–50 250–350 250–350 >350
#> [82] 0–50 250–350 200–250 0–50 200–250 100–200 0–50 100–200 0–50
#> [91] 100–200 0–50 250–350 >350 50–100 250–350 250–350 50–100 200–250
#> [100] 100–200
#> Levels: 0–50 < 50–100 < 100–200 < 200–250 < 250–350 < >350
#>
#> $colors
#> 0–50 50–100 100–200 200–250 250–350 >350
#> "#FFFFCC" "#C6E8BD" "#8DD2B7" "#54BDC0" "#3492B8" "#225EA8"
#>
#> $counts
#> bin n
#> 1 0–50 35
#> 2 50–100 14
#> 3 100–200 14
#> 4 200–250 14
#> 5 250–350 15
#> 6 >350 8
#>
#> $method
#> [1] "headtail"
#>
#> $diagnostics
#> $diagnostics$prop_zero
#> [1] 0.2
#>
#> $diagnostics$skew_ratio
#> [1] 2.111871
#>
#> $diagnostics$tail_share
#> [1] 0.2264725
#>
#>
# Reverse colors (high = light, low = dark)
auto_bin(incidence, reverse = TRUE)$colors
#> 0–50 50–100 100–200 200–250 250–350 >350
#> "#A50F15" "#E94534" "#FB8969" "#75B3D8" "#B6D4E9" "#DEEBF7"
# Use custom labels
custom_labels <- c("0–50", "50–100", "100–250", "250–450", "450–700", "700–1000", ">1000")
result <- auto_bin(incidence * 10, palette = "byor", labels = custom_labels)
table(result$bins)
#>
#> 0–50 50–100 100–250 250–450 450–700 700–1K >1K
#> 20 0 1 3 13 7 56
result$method # Returns "custom"
#> [1] "custom"
# Handle outliers above threshold (useful for TPR)
set.seed(123)
tpr <- c(runif(80, 0.5, 0.95), runif(20, 1.05, 1.3))
result <- auto_bin(tpr, palette = "rdbu", bin = 5, outlier_threshold = 1.0)
table(result$bins)
#>
#> 0.50–0.60 0.60–0.70 0.70–0.85 0.85–0.95 >1.00
#> 20 20 20 20 20
# Custom outlier label for data validation
result <- auto_bin(tpr, outlier_threshold = 1.0, outlier_label = "Suspect Values")
result$colors # Last bin (>1.00) will be grey
#> 0.50–0.60 0.60–0.70 0.70–0.75 0.75–0.85 0.85–0.95
#> "#DEEBF7" "#A8CEE4" "#B3A0A3" "#F35A40" "#A50F15"
#> Suspect Values
#> "#636363"
