Skip to contents

Calculates malaria incidence at the admin-month level (e.g., district-month) using a structured cascade framework (N0 through N5) to adjust for testing gaps, reporting incompleteness, and care-seeking behavior. Facility-level data is aggregated to admin level before calculating incidence. Returns a validated dataset with all incidence levels, quality flags, and source tracking.

Usage

calc_incidence(
  data,
  levels = c("N0", "N1", "N2", "N3"),
  hf_var = "hf_uid",
  adm0_var = NULL,
  adm1_var = "adm1",
  adm2_var = "adm2",
  adm3_var = NULL,
  date_var = "date",
  year_var = "year",
  pop_var = "pop",
  conf_var = "conf",
  test_var = "test",
  pres_var = "pres",
  tpr_var = "tpr",
  reprate_var = "reprate",
  cs_public_var = "cs_public",
  cs_private_var = "cs_private",
  cs_none_var = "cs_none",
  cs_none_divisor = 2,
  rate_multiplier = 1000,
  scale_factor = lifecycle::deprecated(),
  include_flags = FALSE,
  suffix = NULL,
  group_by = NULL,
  return_facility = FALSE
)

Arguments

data

Routine health facility data at facility-month level (data.frame or tibble). Must contain one row per facility per month.

levels

Character vector specifying which incidence levels to calculate (default: c("N0", "N1", "N2", "N3")). Can specify subset like c("N0", "N1") or include N4/N5 with c("N0", "N1", "N2", "N3", "N4", "N5").

hf_var

Column name for health facility unique identifier (default: "hf_uid"). Set to NULL when data is already aggregated at the admin level and has no facility identifier.

adm0_var

Column name for national/country level. If NULL (default), creates a single "country" value for all records.

adm1_var

Column name for first administrative level/region (default: "adm1").

adm2_var

Column name for second administrative level/district (default: "adm2").

adm3_var

Column name for third administrative level/sub-district. If NULL (default), adm3 column is not included in output.

date_var

Column name for date of reporting period (default: "date").

year_var

Column name for year (default: "year"). Used for year-level aggregation in summary output.

pop_var

Column name for population denominator (default: "pop"). Must be present in data.

conf_var

Column name for number of confirmed malaria cases (default: "conf").

test_var

Column name for number of individuals tested (default: "test").

pres_var

Column name for number of presumed cases (default: "pres").

tpr_var

Column name for test positivity rate (default: "tpr"). Must be present in data for N1/N2/N3 calculations.

reprate_var

Column name for reporting rate (default: "reprate"). Must be present in data for N2/N3 calculations.

cs_public_var

Column name for proportion seeking care at public facilities (default: "cs_public"). Required for N3.

cs_private_var

Column name for proportion seeking care at private facilities (default: "cs_private"). Required for N3.

cs_none_var

Column name for proportion seeking no care (default: "cs_none"). Required for N3/N4/N5.

cs_none_divisor

Numeric. Divisor applied to cs_none for N5 calculation (default: 2). N5 uses cs_none / cs_none_divisor instead of cs_none, producing a more conservative estimate than N4.

rate_multiplier

Denominator for incidence rate calculation (default: 1000, for cases per 1,000 population).

scale_factor

Deprecated. Use rate_multiplier instead.

include_flags

Logical; if TRUE, includes all quality flag columns in the output. If FALSE (default), returns only the core incidence variables without flags.

suffix

Character string to append to incidence column names (default: NULL). If provided (e.g., suffix = "u5"), output columns will be named n0_cases_u5, n0_incidence_u5, etc. Useful for distinguishing outputs for different population groups.

group_by

Character vector of additional column names to group by (e.g. facility type, urban/rural). These columns are preserved through aggregation and appear in output. Default NULL.

return_facility

Logical; if TRUE, includes facility-level data in the output with diagnostic flags. If FALSE (default), returns only aggregated admin-level data. Useful for investigating data quality issues before aggregation (e.g., why n1_cases < n0_cases).

Value

A named list with components:

monthly

A named list with tibbles at each admin level (N0-N5):

  • adm0: National level monthly incidence (N0-N5)

  • adm1: First admin level (region) monthly incidence (N0-N5)

  • adm2: Second admin level (district) monthly incidence (N0-N5)

  • adm3: Third admin level monthly incidence (N0-N5, if adm3_var provided)

annual

A named list with tibbles at each admin level (N0-N5):

  • adm0: National level annual incidence (N0-N5)

  • adm1: First admin level (region) annual incidence (N0-N5)

  • adm2: Second admin level (district) annual incidence (N0-N5)

  • adm3: Third admin level annual incidence (N0-N5, if adm3_var provided)

facility

(Only if return_facility = TRUE) A named list with tibbles:

  • monthly: Facility-month level data

  • annual: Facility-year level data (if applicable)

Monthly and annual tibbles contain:

  • ID columns: adm0, adm1, adm2, adm3 (depending on level)

  • Time columns: year, month, date (monthly) or year (annual)

  • pop: Population denominator (summed)

  • conf: Confirmed cases (summed)

  • test, pres, tpr: Testing data (summed, if N1+ calculated)

  • reprate: Reporting rate (summed, if N2+ calculated)

  • cs_public, cs_private, cs_none: Care-seeking proportions (if N3+ calculated)

  • n0_cases, n0_incidence: N0 (crude)

  • n1_cases, n1_incidence: N1 (testing-adjusted)

  • n2_cases, n2_incidence: N2 (reporting-adjusted)

  • n3_cases, n3_incidence: N3 (care-seeking-adjusted)

  • n4_cases, n4_incidence: N4 (public + non-seekers)

  • n5_cases, n5_incidence: N5 (conservative non-seekers)

Details

The incidence cascade framework:

  • N0 (Crude Incidence):

    • n0_cases = conf

    • n0_incidence = (n0_cases / pop) * rate_multiplier

  • N1 (Testing-Adjusted):

    • n1_cases = n0_cases + (pres * tpr)

    • n1_incidence = (n1_cases / pop) * rate_multiplier

  • N2 (Reporting-Adjusted):

    • n2_cases = n1_cases / reprate

    • n2_incidence = (n2_cases / pop) * rate_multiplier

  • N3 (Care-Seeking-Adjusted):

    • n3_cases = n2_cases + (n2_cases * cs_private/cs_public) + (n2_cases * cs_none/cs_public)

    • n3_incidence = (n3_cases / pop) * rate_multiplier

  • N4 (Public + Non-Seekers Adjusted):

    • n4_cases = n2_cases * (1 + cs_none/cs_public)

    • n4_incidence = (n4_cases / pop) * rate_multiplier

    • Note: Excludes private sector adjustment. Use when private data is already captured in DHIS2 or private burden is negligible.

    • N4 is always between N2 and N3 in magnitude (N2 < N4 < N3).

  • N5 (Conservative Non-Seekers Adjusted):

    • n5_cases = n2_cases * (1 + (cs_none/cs_none_divisor)/cs_public)

    • n5_incidence = (n5_cases / pop) * rate_multiplier

    • Note: Like N4 but divides the non-care-seeking proportion by cs_none_divisor (default 2) for a more conservative estimate.

    • N5 is always between N2 and N4 in magnitude (N2 < N5 < N4).

Each level builds on the previous, with N3 representing the most complete estimate of true community-level malaria incidence. N4 provides a conservative alternative to N3 that excludes private sector. N5 provides an even more conservative estimate by reducing the non-seeker adjustment.

Examples

# Example workflow: Calculate TPR first, then incidence
# library(tibble)
#
# # Step 1: Calculate TPR
# facility_data <- tibble(
#   hf_uid = c("HF001", "HF002", "HF003"),
#   adm1 = rep("RegionA", 3),
#   adm2 = rep("DistrictX", 3),
#   date = as.Date(c("2023-01-01", "2023-02-01", "2023-01-01")),
#   conf = c(10, 15, 12),
#   test = c(100, 120, 80),
#   pres = c(5, 8, 10)
# )
#
# tpr_data <- calc_tpr(facility_data)
#
# # Step 2: Add population, reporting rate, and care-seeking data
# tpr_data$pop <- c(5000, 6000, 4500)
# tpr_data$reprate <- c(0.85, 0.90, 0.80)
# tpr_data$cs_public <- 0.60
# tpr_data$cs_private <- 0.25
# tpr_data$cs_none <- 0.15
#
# # Step 3: Calculate incidence (all levels N0-N3)
# result <- calc_incidence(tpr_data)
#
# # Access monthly facility-level data
# result$monthly$hf
#
# # Access monthly district-level data
# result$monthly$adm2
#
# # Access annual facility-level data
# result$annual$hf
#
# # Access annual national-level data
# result$annual$adm0