
List DHS variables and their haven labels by name pattern
Source: R/dhs_var_labels.R (list_dhs_var_labels.Rd)
Generic utility to inspect a (typically DHS) data frame and return a tidy
inventory of every column whose name matches a given prefix or regex,
together with its haven label attribute. This is useful when building
specs for custom_csb_indicator (or any indicator that routes columns by
label / variable name), since DHS recodes share many label conventions
across surveys but the exact set of populated columns varies by country
and round.
Usage
list_dhs_var_labels(
  data,
  pattern,
  regex = FALSE,
  only_observed = FALSE,
  duplicate_label = TRUE
)
Arguments
- data
A data frame (typically a DHS recode such as the KR file read by
dhs_read()). Columns may be plain vectors or haven-labelled vectors; only the name and label attribute are inspected.
- pattern
Character scalar. Either a prefix (default) or a regex (when
regex = TRUE). Must be non-empty.
- regex
Logical. If FALSE (default), pattern is treated as a prefix and the matched regex is paste0("^", pattern, "[a-z0-9]+$"). If TRUE, pattern is used as the full regex.
- only_observed
Logical. If TRUE, drop variables with no non-NA values or, for numeric 0/1 columns, no observed 1. Default FALSE.
- duplicate_label
Logical. If TRUE (default), add a duplicate_label logical column flagging variables whose label is shared with at least one other matched variable.
Value
A tibble with columns:
- variable: matched column name.
- label: haven label attribute (or NA_character_ if absent).
- n_nonmissing: number of non-NA values in the column.
- n_ones: number of values equal to 1 (or NA for non-numeric columns).
- duplicate_label (only if duplicate_label = TRUE): logical, TRUE if another matched variable has the same non-NA label.
Variables are returned in the order they appear in data. If no
variables match, an empty tibble with the same columns is returned
and a cli info message is emitted.
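The two count columns can be illustrated with a minimal base R sketch. The toy data frame and its values are hypothetical, and this is only one way such counts could be computed, not necessarily how the function computes them internally.

```r
# Toy data frame standing in for a DHS recode (hypothetical values).
toy <- data.frame(
  h32a = c(1, 0, NA, 1),
  h32b = c(NA_real_, NA_real_, NA_real_, NA_real_),
  h32z = c("yes", "no", NA, "yes"),
  stringsAsFactors = FALSE
)

# n_nonmissing: number of non-NA values in each column.
n_nonmissing <- vapply(toy, function(x) sum(!is.na(x)), integer(1))

# n_ones: number of values equal to 1 for numeric columns, NA otherwise.
n_ones <- vapply(
  toy,
  function(x) if (is.numeric(x)) sum(x == 1, na.rm = TRUE) else NA_integer_,
  integer(1)
)

print(data.frame(variable = names(toy), n_nonmissing, n_ones, row.names = NULL))
```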
Details
Two matching modes are supported:
- Prefix mode (default, regex = FALSE): the helper matches columns whose names start with pattern followed by one or more lowercase letters or digits. For example, pattern = "h32" will match h32a, h32x, h32a1, but not h32_recoded or h32.
- Regex mode (regex = TRUE): pattern is used directly as a regular expression. For example, pattern = "^ml13[a-z]$" will match ITN-source variables ml13a..ml13h.
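The two modes can be tried on a plain character vector. The sketch below assumes the match is equivalent to a case-sensitive grepl() call on the column names; the candidate names are made up for illustration.

```r
# Hypothetical column names, including some that should not match.
candidates <- c("h32a", "h32x", "h32a1", "h32_recoded", "h32", "H32A", "ml13a")

# Prefix mode: the regex documented for regex = FALSE.
pattern <- "h32"
prefix_regex <- paste0("^", pattern, "[a-z0-9]+$")
prefix_matches <- candidates[grepl(prefix_regex, candidates)]

# Regex mode: the pattern is used as given.
regex_matches <- candidates[grepl("^ml13[a-z]$", candidates)]

print(prefix_matches)  # h32a, h32x, h32a1
print(regex_matches)   # ml13a
```

The bare prefix h32 is excluded because the suffix class requires at least one character, and h32_recoded is excluded because an underscore is not in [a-z0-9].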
Optionally, the helper can:
- Mark variables whose haven label is shared with another variable in the same inventory (duplicate_label = TRUE). This is the situation that motivates routing a custom_csb_indicator spec by variable name rather than by label (e.g. h32e and h32n in some surveys both carry the label "Fever/cough: comm.health wrkr").
- Drop variables that have no observed 1 (or no non-NA values at all for non-binary columns) via only_observed = TRUE. This keeps the inventory focused on slots actually populated by the survey.
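Both options can be sketched on a hand-built inventory in base R. The counts, the h32z and h32y rows, and the is_binary helper column are hypothetical, and the order in which the real function applies the two steps is an assumption here (flag first, then filter).

```r
# Hand-built inventory (hypothetical counts; is_binary is an illustrative helper).
inventory <- data.frame(
  variable = c("h32e", "h32n", "h32z", "h32y"),
  label = c("Fever/cough: comm.health wrkr",
            "Fever/cough: comm.health wrkr",
            "Other",
            NA),
  n_nonmissing = c(120L, 120L, 10L, 0L),
  n_ones = c(4L, 0L, NA, NA),
  is_binary = c(TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Flag labels shared by more than one variable, never flagging NA labels.
lab <- inventory$label
inventory$duplicate_label <- !is.na(lab) &
  (duplicated(lab) | duplicated(lab, fromLast = TRUE))

# Keep binary columns with an observed 1, and other columns with any non-NA value.
keep <- ifelse(inventory$is_binary,
               inventory$n_ones > 0,
               inventory$n_nonmissing > 0)
observed <- inventory[keep, ]

print(observed[, c("variable", "label", "duplicate_label")])
```

In this toy case h32e stays flagged as a duplicate even though its partner h32n is dropped, which is a consequence of flagging before filtering.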
The function never modifies data; it only reads column names and
the label attribute (as set by haven::read_dta() or
sntmethods::dhs_read()).
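For readers unfamiliar with haven labels, the label is an ordinary R attribute on the column vector. The sketch below sets one by hand instead of reading a file; get_label() is a hypothetical helper and the label text is made up.

```r
# Toy data with a label attribute set manually (no haven import needed).
toy <- data.frame(h32a = c(1, 0, NA), h32b = c(0, 0, 1))
attr(toy$h32a, "label") <- "Fever/cough: govt hospital"

# Hypothetical helper: return the label, or NA_character_ when absent.
get_label <- function(x) {
  lab <- attr(x, "label", exact = TRUE)
  if (is.null(lab)) NA_character_ else as.character(lab)
}

labels <- vapply(toy, get_label, character(1))
print(labels)
```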
Examples
if (FALSE) { # \dontrun{
  kr_data <- sntmethods::dhs_read(
    path = path_dhs_parquet,
    file_type = "KR",
    survey_type = "DHS"
  )

  # All h32* treatment-seeking source columns
  list_dhs_var_labels(kr_data, "h32")

  # Only h32 columns with at least one observed `1`, flagging duplicates
  list_dhs_var_labels(kr_data, "h32", only_observed = TRUE)

  # ITN source columns ml13a..ml13h via regex
  list_dhs_var_labels(kr_data, "^ml13[a-z]$", regex = TRUE)
} # }