
Calculate Malaria Incidence from Routine Health Facility Data (N0-N5)
Source:R/calc_incidence.R
calc_incidence.RdCalculates malaria incidence at the admin-month level (e.g., district-month) using a structured cascade framework (N0 through N5) to adjust for testing gaps, reporting incompleteness, and care-seeking behavior. Facility-level data is aggregated to admin level before calculating incidence. Returns a validated dataset with all incidence levels, quality flags, and source tracking.
Usage
calc_incidence(
data,
levels = c("N0", "N1", "N2", "N3"),
hf_var = "hf_uid",
adm0_var = NULL,
adm1_var = "adm1",
adm2_var = "adm2",
adm3_var = NULL,
date_var = "date",
year_var = "year",
pop_var = "pop",
conf_var = "conf",
test_var = "test",
pres_var = "pres",
tpr_var = "tpr",
reprate_var = "reprate",
cs_public_var = "cs_public",
cs_private_var = "cs_private",
cs_none_var = "cs_none",
cs_none_divisor = 2,
rate_multiplier = 1000,
scale_factor = lifecycle::deprecated(),
include_flags = FALSE,
suffix = NULL,
group_by = NULL,
return_facility = FALSE
)Arguments
- data
Routine health facility data at facility-month level (data.frame or tibble). Must contain one row per facility per month.
- levels
Character vector specifying which incidence levels to calculate (default:
c("N0", "N1", "N2", "N3")). Can specify subset likec("N0", "N1")or include N4/N5 withc("N0", "N1", "N2", "N3", "N4", "N5").- hf_var
Column name for health facility unique identifier (default: "hf_uid"). Set to
NULLwhen data is already aggregated at the admin level and has no facility identifier.- adm0_var
Column name for national/country level. If NULL (default), creates a single "country" value for all records.
- adm1_var
Column name for first administrative level/region (default: "adm1").
- adm2_var
Column name for second administrative level/district (default: "adm2").
- adm3_var
Column name for third administrative level/sub-district. If NULL (default), adm3 column is not included in output.
- date_var
Column name for date of reporting period (default: "date").
- year_var
Column name for year (default: "year"). Used for year-level aggregation in summary output.
- pop_var
Column name for population denominator (default: "pop"). Must be present in
data.- conf_var
Column name for number of confirmed malaria cases (default: "conf").
- test_var
Column name for number of individuals tested (default: "test").
- pres_var
Column name for number of presumed cases (default: "pres").
- tpr_var
Column name for test positivity rate (default: "tpr"). Must be present in
datafor N1/N2/N3 calculations.- reprate_var
Column name for reporting rate (default: "reprate"). Must be present in
datafor N2/N3 calculations.- cs_public_var
Column name for proportion seeking care at public facilities (default: "cs_public"). Required for N3.
- cs_private_var
Column name for proportion seeking care at private facilities (default: "cs_private"). Required for N3.
- cs_none_var
Column name for proportion seeking no care (default: "cs_none"). Required for N3/N4/N5.
- cs_none_divisor
Numeric. Divisor applied to cs_none for N5 calculation (default: 2). N5 uses
cs_none / cs_none_divisorinstead ofcs_none, producing a more conservative estimate than N4.- rate_multiplier
Denominator for incidence rate calculation (default: 1000, for cases per 1,000 population).
- scale_factor
Deprecated. Use
rate_multiplierinstead.- include_flags
Logical; if
TRUE, includes all quality flag columns in the output. IfFALSE(default), returns only the core incidence variables without flags.- suffix
Character string to append to incidence column names (default: NULL). If provided (e.g.,
suffix = "u5"), output columns will be namedn0_cases_u5,n0_incidence_u5, etc. Useful for distinguishing outputs for different population groups.- group_by
Character vector of additional column names to group by (e.g. facility type, urban/rural). These columns are preserved through aggregation and appear in output. Default NULL.
- return_facility
Logical; if
TRUE, includes facility-level data in the output with diagnostic flags. IfFALSE(default), returns only aggregated admin-level data. Useful for investigating data quality issues before aggregation (e.g., why n1_cases < n0_cases).
Value
A named list with components:
- monthly
A named list with tibbles at each admin level (N0-N5):
adm0: National level monthly incidence (N0-N5)adm1: First admin level (region) monthly incidence (N0-N5)adm2: Second admin level (district) monthly incidence (N0-N5)adm3: Third admin level monthly incidence (N0-N5, ifadm3_varprovided)
- annual
A named list with tibbles at each admin level (N0-N5):
adm0: National level annual incidence (N0-N5)adm1: First admin level (region) annual incidence (N0-N5)adm2: Second admin level (district) annual incidence (N0-N5)adm3: Third admin level annual incidence (N0-N5, ifadm3_varprovided)
- facility
(Only if
return_facility = TRUE) A named list with tibbles:monthly: Facility-month level dataannual: Facility-year level data (if applicable)
Monthly and annual tibbles contain:
ID columns:
adm0,adm1,adm2,adm3(depending on level)Time columns:
year,month,date(monthly) oryear(annual)pop: Population denominator (summed)conf: Confirmed cases (summed)test,pres,tpr: Testing data (summed, if N1+ calculated)reprate: Reporting rate (summed, if N2+ calculated)cs_public,cs_private,cs_none: Care-seeking proportions (if N3+ calculated)n0_cases,n0_incidence: N0 (crude)n1_cases,n1_incidence: N1 (testing-adjusted)n2_cases,n2_incidence: N2 (reporting-adjusted)n3_cases,n3_incidence: N3 (care-seeking-adjusted)n4_cases,n4_incidence: N4 (public + non-seekers)n5_cases,n5_incidence: N5 (conservative non-seekers)
Details
The incidence cascade framework:
N0 (Crude Incidence):
n0_cases = conf
n0_incidence = (n0_cases / pop) * rate_multiplier
N1 (Testing-Adjusted):
n1_cases = n0_cases + (pres * tpr)
n1_incidence = (n1_cases / pop) * rate_multiplier
N2 (Reporting-Adjusted):
n2_cases = n1_cases / reprate
n2_incidence = (n2_cases / pop) * rate_multiplier
N3 (Care-Seeking-Adjusted):
n3_cases = n2_cases + (n2_cases * cs_private/cs_public) + (n2_cases * cs_none/cs_public)
n3_incidence = (n3_cases / pop) * rate_multiplier
N4 (Public + Non-Seekers Adjusted):
n4_cases = n2_cases * (1 + cs_none/cs_public)
n4_incidence = (n4_cases / pop) * rate_multiplier
Note: Excludes private sector adjustment. Use when private data is already captured in DHIS2 or private burden is negligible.
N4 is always between N2 and N3 in magnitude (N2 < N4 < N3).
N5 (Conservative Non-Seekers Adjusted):
n5_cases = n2_cases * (1 + (cs_none/cs_none_divisor)/cs_public)
n5_incidence = (n5_cases / pop) * rate_multiplier
Note: Like N4 but divides the non-care-seeking proportion by
cs_none_divisor(default 2) for a more conservative estimate.N5 is always between N2 and N4 in magnitude (N2 < N5 < N4).
Each level builds on the previous, with N3 representing the most complete estimate of true community-level malaria incidence. N4 provides a conservative alternative to N3 that excludes private sector. N5 provides an even more conservative estimate by reducing the non-seeker adjustment.
Examples
# Example workflow: Calculate TPR first, then incidence
# library(tibble)
#
# # Step 1: Calculate TPR
# facility_data <- tibble(
# hf_uid = c("HF001", "HF002", "HF003"),
# adm1 = rep("RegionA", 3),
# adm2 = rep("DistrictX", 3),
# date = as.Date(c("2023-01-01", "2023-02-01", "2023-01-01")),
# conf = c(10, 15, 12),
# test = c(100, 120, 80),
# pres = c(5, 8, 10)
# )
#
# tpr_data <- calc_tpr(facility_data)
#
# # Step 2: Add population, reporting rate, and care-seeking data
# tpr_data$pop <- c(5000, 6000, 4500)
# tpr_data$reprate <- c(0.85, 0.90, 0.80)
# tpr_data$cs_public <- 0.60
# tpr_data$cs_private <- 0.25
# tpr_data$cs_none <- 0.15
#
# # Step 3: Calculate incidence (all levels N0-N3)
# result <- calc_incidence(tpr_data)
#
# # Access monthly facility-level data
# result$monthly$hf
#
# # Access monthly district-level data
# result$monthly$adm2
#
# # Access annual facility-level data
# result$annual$hf
#
# # Access annual national-level data
# result$annual$adm0