# Install pacman only if it's not already installed
if (!requireNamespace("pacman", quietly = TRUE)) {
  install.packages("pacman")
}

# Install (if needed) and load the relevant packages
pacman::p_load(
  readxl,    # import and read Excel files
  ggplot2,   # plotting
  rio,       # import and export files
  gridExtra, # plot arrangements
  here,      # build file paths relative to the project root
  stringr,   # clean up names
  xts,       # time-series tools, e.g. first() and last()
  tidyverse, # data manipulation functions
  sf,        # spatial features for use in mapping
  scales     # calculates "pretty" breaks
)

Incidence adjustment 2: incomplete reporting
Overview
A second adjustment accounts for reporting rates (RRs) that vary by area and time. The number of cases corrected in the first adjustment (N1) is inflated to compensate for the fraction of expected records that were not received, giving N2. This step assumes that the non-reported data follow a distribution similar to that of the reported data. Reporting rates can be calculated per health facility type to avoid over- or underestimating the effect of missing data observed in smaller or larger health facilities, respectively.

An alternative approach is to impute data for the months with missing values per health facility. Imputation can be computationally intensive and requires a relatively complete database to inform it appropriately, but it yields a complete database for which a reporting rate adjustment is not necessary.

The equation for the second incidence adjustment is:

N2 = N1 / d
Where:
- N2 is the number of cases corrected for testing and reporting rates;
- N1 is the number of cases corrected for testing rates in the first adjustment;
- d is the reporting rate (records received / records expected), which can be weighted by the type of health facility (HF) that did not report at a given point in time;
- TBD
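As a minimal worked illustration of the equation above, the sketch below applies it to a single admin unit in a single month. The numbers and the object names (n1, d, n2) are invented for illustration and do not come from any dataset.

```r
# Hypothetical values for one admin unit in one month (illustrative only)
n1 <- 120   # N1: cases already corrected for testing rates
d  <- 0.8   # d: reporting rate = records received / records expected

# Second adjustment: N2 = N1 / d
n2 <- n1 / d
n2          # 150

# Additional cases attributed to records that were not received
n2 - n1     # 30
```

With 80% of expected records received, the 120 corrected cases are scaled up to 150, so 30 cases are attributed to non-reporting.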
Step-by-Step Instructions
To skip the step-by-step explanation, jump to the full code at the end of this page.
Step 1: Load required packages and files
Step 1.1: Load packages
The first step is to install and load the libraries required for this section.
Step 1.2: Load files
We bring in the monthly incidence data saved on the first-adjustment (adjusted1) incidence page.
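A minimal sketch of this step, assuming the first-adjustment output was saved as a CSV inside the project folder. The folder name, file name, and object name (adj1_inc) are hypothetical placeholders, so substitute the ones you actually used.

```r
# Hypothetical location and file name: replace with the file you saved earlier
inc_path <- here::here("data", "adj1_incidence_monthly.csv")

if (file.exists(inc_path)) {
  # rio::import() chooses the reader from the file extension
  adj1_inc <- rio::import(inc_path)
  # Quick look at the columns that came in
  str(adj1_inc)
} else {
  message("File not found, check the path: ", inc_path)
}
```

Building the path with here::here() keeps it relative to the project root, so the same line works on different machines.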
Step 2: Join incidence data with reporting rate data
We start by joining the file created on the reporting rate page to the incidence file we have been working with. Reporting rates are usually summarized by month-year at the nearest operational administrative level above health facilities. Here we use adm3 for illustration, but countries can adapt this to their setting.
Note: it is highly recommended that first and second adjusted incidence cases are calculated by month.
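The join described above might look like the sketch below. The two toy data frames stand in for the real files, and the object names adj1_inc and rep_rate_data, as well as the join keys (adm3, year, month), are assumptions. Use whichever columns uniquely identify an area-month in your data.

```r
library(dplyr)

# Toy monthly incidence data (stand-in for the first-adjustment output)
adj1_inc <- data.frame(
  adm3      = c("A", "A", "B"),
  year      = c(2022, 2022, 2022),
  month     = c(1, 2, 1),
  adjcases1 = c(120, 90, 60)
)

# Toy reporting rates by adm3 and month (stand-in for the reporting rate output)
rep_rate_data <- data.frame(
  adm3     = c("A", "A"),
  year     = c(2022, 2022),
  month    = c(1, 2),
  rep_rate = c(0.8, 1.0)
)

# Keep every incidence row; rows without a match get rep_rate = NA
inc_rep_rate <- adj1_inc |>
  left_join(rep_rate_data, by = c("adm3", "year", "month"))

# Rows with no reporting rate after the join deserve a look before adjusting
inc_rep_rate |>
  filter(is.na(rep_rate))
```

A left join is used here so that no incidence records are silently dropped. In this toy example, area B has no reporting rate and is flagged by the final filter.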
Step 2.1: Check RR>1 and resolve
# Examine the distribution of reporting rates by adm3
ggplot(inc_rep_rate, aes(x = factor(adm3), y = rep_rate)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Reporting Rate by Admin3",
    x = "Admin3",
    y = "Reporting Rate"
  ) +
  theme_minimal()

# Cap reporting rates at 1 (100%)
inc_rep_rate <- inc_rep_rate %>%
  mutate(rep_rate = ifelse(rep_rate > 1, 1, rep_rate))

Step 3: Calculate monthly adjusted incidence (N2) cases
Next we calculate adjusted cases by accounting for reporting rates at adm3 level. Following the equation in the Overview, the adjusted 1 cases (N1) are divided by the reporting rate (d) to give N2. The difference N2 - N1 is the additional number of cases expected had all facilities reported. Note that adding N1 × (1 - reporting rate) to N1 is not equivalent: it gives a smaller correction than N1 / d whenever the reporting rate is below 1.
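A minimal sketch of the monthly calculation, applying N2 = N1 / d from the Overview to a toy joined table. Several things here are assumptions rather than taken from the source: the input name inc_rep_rate and output name inc_data, the helper column rep_rate_safe, the choice to treat a reporting rate of 0 as missing (to avoid division by zero), and the definition of adjinc2 as adjusted cases divided by population.

```r
library(dplyr)

# Toy joined data (stand-in for the incidence + reporting rate table)
inc_rep_rate <- data.frame(
  adm3      = c("A", "A", "B"),
  month     = c(1, 2, 1),
  pop       = c(10000, 10000, 5000),
  adjcases1 = c(120, 90, 60),
  rep_rate  = c(0.8, 1.0, 0)
)

inc_data <- inc_rep_rate |>
  mutate(
    # Assumption: a reporting rate of 0 is treated as missing, not divided by
    rep_rate_safe = ifelse(rep_rate > 0, rep_rate, NA_real_),
    # Second adjustment: N2 = N1 / d
    adjcases2 = adjcases1 / rep_rate_safe,
    # Assumed definition: monthly adjusted incidence = adjusted cases / population
    adjinc2 = adjcases2 / pop
  )

inc_data
```

Area-months with no usable reporting rate end up with NA in adjcases2 and should be reviewed rather than left to propagate into annual totals unnoticed.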
Step 3.1: Calculate annual adjusted incidence 2
For the purposes of SNT, annual incidence estimates are more useful for comparing between years and administrative levels.
## Aggregate the monthly dataset by year
adj2_inc_ann <- inc_data |>
  dplyr::group_by(adm1, adm2, adm3, year) |>
  dplyr::summarise(
    # counts and monthly incidences are summed over the year;
    # crudeinc is included so that it is available for ann_crude below
    dplyr::across(
      c(susp:conf_tpr, adjcases1, crudeinc, adjinc1, adjcases2, adjinc2),
      \(x) sum(x, na.rm = TRUE)
    ),
    # population and rates are averaged over the year
    dplyr::across(c(pop, test_rate, tpr, rep_rate), \(x) mean(x, na.rm = TRUE))
  ) |>
  dplyr::ungroup()

# Scale annual incidence by 1000
adj2_inc_ann <- adj2_inc_ann |>
  dplyr::mutate(
    ann_crude = crudeinc * 1000,
    ann_adjinc1 = adjinc1 * 1000,
    ann_adjinc2 = adjinc2 * 1000
  )

# View the first observations of the dataset
head(adj2_inc_ann)

Step 4: Save Files
Now we save our incidence data as a CSV file.
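A minimal sketch of the save step using rio::export(). The toy table stands in for adj2_inc_ann, and the file name is a hypothetical placeholder. tempdir() is used only so the sketch runs anywhere; in a project you would more likely build the path with here::here().

```r
# Toy annual table (stand-in for adj2_inc_ann)
adj2_inc_ann <- data.frame(
  adm3        = c("A", "B"),
  year        = c(2022, 2022),
  ann_adjinc2 = c(24, 12)
)

# Hypothetical output file; swap tempdir() for your project's output folder
out_path <- file.path(tempdir(), "adj2_incidence_annual.csv")

# rio::export() chooses the writer from the file extension
rio::export(adj2_inc_ann, out_path)

# Confirm that the file was written
file.exists(out_path)
```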
Summary
TBD