Incidence adjustment 2: incomplete reporting

Overview

A second adjustment accounts for reporting rates (RRs) that vary by area and over time. The corrected confirmed cases from the first adjustment (N1) are divided by the reporting rate, which scales them up to account for the expected records that were not received; the result is N2. This step assumes that the data not reported follow a distribution similar to that of the data reported. Reporting rates can be calculated per health facility type to avoid overestimating or underestimating the effect of missing data from smaller or larger health facilities, respectively. An alternative approach is to impute values for the months with missing data in each health facility. This can be computationally intensive and requires a relatively complete database to inform the imputations appropriately, but it produces a complete database for which a reporting rate adjustment is no longer necessary. The second incidence adjustment is given by:

$$N_2 = \frac{N_1}{d}$$

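Where

  • N2 is the number of cases corrected for both incomplete testing and incomplete reporting;
  • N1 is the number of confirmed cases corrected for incomplete testing (the first adjustment);
  • d is the reporting rate (records received / records expected), which can be weighted by the type of health facility (HF) that did not report at a given point in time.
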
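The equation can be checked with a small worked example. The Python sketch below is illustrative only: `adjust_for_reporting` applies $N_2 = N_1/d$ to a single area-month, and `adjust_by_facility_type` shows one possible way to apply the facility-type option, by adjusting each type's cases with that type's own reporting rate and summing. Both function names and the facility-type labels are hypothetical, and stratifying by type is only one reading of the weighting described above.

```python
def adjust_for_reporting(n1: float, d: float) -> float:
    """Return N2 = N1 / d for one area-month; d must lie in (0, 1]."""
    if not 0 < d <= 1:
        raise ValueError("reporting rate d must be in (0, 1]")
    return n1 / d


def adjust_by_facility_type(n1_by_type: dict[str, float],
                            d_by_type: dict[str, float]) -> float:
    """Hypothetical stratified version: adjust each facility type's
    testing-corrected cases by that type's own reporting rate, then sum."""
    return sum(adjust_for_reporting(n1, d_by_type[ftype])
               for ftype, n1 in n1_by_type.items())


# One area-month: 80 testing-corrected cases, 80% of expected reports received
print(adjust_for_reporting(80, 0.8))  # 100.0

# Hypothetical split of an area-month by facility type
n1_by_type = {"hospital": 60, "health_post": 20}
d_by_type = {"hospital": 1.0, "health_post": 0.5}
print(adjust_by_facility_type(n1_by_type, d_by_type))  # 100.0
```

In the stratified example, hospitals report fully and health posts send half of their expected records, so only the health-post cases are inflated (20 / 0.5 = 40). Rejecting values of d outside (0, 1] matches the later check that caps reporting rates above 1.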
Objectives
  • TBD

Step-by-Step Instructions

To skip the step-by-step explanation, jump to the full code at the end of this page.

Step 1: Load required packages and files

Step 1.1: Load packages

The first step is to install and load the libraries required for this section.

# Install pacman only if it is not already installed
if (!requireNamespace("pacman", quietly = TRUE)) {
  install.packages("pacman")
}

# Install (if needed) and load the required packages
pacman::p_load(
  readxl,    # import and read Excel files
  ggplot2,   # plotting
  rio,       # import and export files
  gridExtra, # arrange multiple plots
  here,      # build file paths relative to the project root
  stringr,   # clean up names
  xts,       # time-series helpers such as first() and last()
  tidyverse, # data manipulation (dplyr, tidyr, and others)
  sf,        # spatial features for mapping
  scales     # "pretty" axis breaks and labels
)
Step 1.2: Load files

We load the monthly incidence data saved on the Incidence adjustment 1: incomplete testing page.
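For readers following along in Python, a minimal standard-library sketch of loading a saved monthly CSV is shown below. The column names (`year`, `month`, `adjcases1`, `adjinc1`, `pop`) and the treatment of `NA` strings as missing are assumptions; the function name `load_monthly_incidence` is hypothetical, and the file path is passed in rather than guessed.

```python
import csv
from pathlib import Path

# Hypothetical column names; adjust to the columns in your saved file
INT_COLUMNS = {"year", "month"}
FLOAT_COLUMNS = {"adjcases1", "adjinc1", "pop"}
MISSING = {"", "NA"}  # empty cells and R-style NA strings


def _parse(value, cast):
    """Convert one CSV cell with `cast`, returning None for missing values."""
    if value is None:  # row shorter than the header
        return None
    value = value.strip()
    return None if value in MISSING else cast(value)


def load_monthly_incidence(path) -> list[dict]:
    """Read a monthly incidence CSV into a list of dicts, one per row."""
    rows = []
    with Path(path).open(newline="", encoding="utf-8") as f:
        for record in csv.DictReader(f):
            for col in INT_COLUMNS.intersection(record):
                record[col] = _parse(record[col], int)
            for col in FLOAT_COLUMNS.intersection(record):
                record[col] = _parse(record[col], float)
            rows.append(record)
    return rows
```

Columns not listed in the two sets (for example, admin names) are kept as text.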


Step 2: Join incidence data with reporting rate data

We start by joining the reporting rate file created on the Health facility reporting rate page to the incidence data we have been working with. Reporting rates are usually summarized by month and year at the nearest operational administrative level above the health facility. Here we use adm3 for illustration, but countries can adapt this to their setting.

Note: we strongly recommend calculating the first- and second-adjusted case counts by month, because reporting rates vary from month to month and adjusting an annual total with an average rate gives a different result.
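A small hypothetical example shows why the monthly calculation matters. When the reporting rate changes between months, adjusting each month and then summing does not give the same answer as dividing the annual total by the average reporting rate. The numbers below are illustrative only.

```python
# Hypothetical testing-corrected cases (N1) and reporting rates (d) for two months
n1 = [80, 20]
d = [0.8, 0.4]

# Recommended: adjust each month (N1 / d), then sum to the year
monthly_then_sum = sum(cases / rate for cases, rate in zip(n1, d))

# Not recommended: divide the annual total by the mean reporting rate
annual_then_adjust = sum(n1) / (sum(d) / len(d))

print(monthly_then_sum)               # 150.0
print(round(annual_then_adjust, 1))   # 166.7
```

The annual shortcut gives the low-reporting month the same weight as the well-reported one, which here overstates the adjusted total.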

Step 2.1: Join the datasets

# Join monthly incidence data with monthly reporting rate data
# (adjust the key columns to the names used in your files)
inc_rep_rate <- inc_data %>%
  left_join(mon_rep_rate,
            by = c("adm3", "year", "month"),
            relationship = "one-to-one")
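The same join can be sketched in Python with the standard library: each monthly incidence record is matched to the reporting rate for the same adm3, year, and month, and a duplicated key in the reporting rate table raises an error, similar in spirit to the `relationship` check in the R code. The key columns and the `rep_rate` field name are assumptions carried over from the R code, and `left_join_reporting_rates` is a hypothetical helper.

```python
def left_join_reporting_rates(incidence: list[dict], rep_rates: list[dict],
                              keys=("adm3", "year", "month")) -> list[dict]:
    """Left-join reporting rates onto monthly incidence records.

    Records without a matching reporting rate get rep_rate = None.
    """
    lookup = {}
    for row in rep_rates:
        key = tuple(row[k] for k in keys)
        if key in lookup:
            raise ValueError(f"duplicate reporting rate for {key}")
        lookup[key] = row["rep_rate"]

    # Keep every incidence record, adding the matched rate (or None)
    return [{**row, "rep_rate": lookup.get(tuple(row[k] for k in keys))}
            for row in incidence]


incidence = [
    {"adm3": "A", "year": 2023, "month": 1, "adjcases1": 80.0},
    {"adm3": "A", "year": 2023, "month": 2, "adjcases1": 50.0},
]
rep_rates = [{"adm3": "A", "year": 2023, "month": 1, "rep_rate": 0.8}]
for row in left_join_reporting_rates(incidence, rep_rates):
    print(row["month"], row["rep_rate"])  # month 2 has no match, so None
```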


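Step 2.2: Check for reporting rates above 1 and resolve them

Reporting rates cannot meaningfully exceed 1 (100%), and a value above 1 would shrink rather than inflate the adjusted cases, so we inspect their distribution by adm3 and cap any values above 1.
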
# Examine values of reporting rate data

ggplot(inc_rep_rate, aes(x = factor(adm3), y = rep_rate)) +
  geom_boxplot() +
  labs(title = "Distribution of Reporting Rate by Admin3",
       x = "Admin3",
       y = "Reporting Rate") +
  theme_minimal()


# Cap reporting rates at 1 (100%); missing values stay NA
inc_rep_rate <- inc_rep_rate %>%
  mutate(rep_rate = pmin(rep_rate, 1))
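A Python sketch of the same check: summarize the reporting rates per adm3 (a text stand-in for the boxplot), then cap values above 1 and count how many were changed. The field names and both helper functions are hypothetical.

```python
from collections import defaultdict
from statistics import median


def summarize_rates(rows: list[dict]) -> dict:
    """Return {adm3: (min, median, max)} of the non-missing reporting rates."""
    by_area = defaultdict(list)
    for row in rows:
        if row["rep_rate"] is not None:
            by_area[row["adm3"]].append(row["rep_rate"])
    return {area: (min(v), median(v), max(v)) for area, v in by_area.items()}


def cap_rates(rows: list[dict], ceiling: float = 1.0) -> tuple[list[dict], int]:
    """Return copies of rows with rep_rate capped at `ceiling`, plus the
    number of values that were capped. Missing rates stay None."""
    capped, n_capped = [], 0
    for row in rows:
        rate = row["rep_rate"]
        if rate is not None and rate > ceiling:
            rate = ceiling
            n_capped += 1
        capped.append({**row, "rep_rate": rate})
    return capped, n_capped


rows = [
    {"adm3": "A", "rep_rate": 0.9},
    {"adm3": "A", "rep_rate": 1.05},
    {"adm3": "B", "rep_rate": None},
]
print(summarize_rates(rows))
capped_rows, n_capped = cap_rates(rows)
print(n_capped, [r["rep_rate"] for r in capped_rows])  # 1 [0.9, 1.0, None]
```

Counting the capped values is a cheap way to spot areas where the list of expected reports may need review before the adjustment.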

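Step 3: Calculate monthly adjusted cases and incidence (N2)

Next we adjust for reporting rates at the adm3 level. Following the equation $N_2 = N_1/d$, the cases adjusted for testing (adjcases1) are divided by the reporting rate, which scales them up to the number expected had all facilities reported. For example, 80 testing-corrected cases in a month with a reporting rate of 0.8 give 80 / 0.8 = 100 adjusted cases. Months with a reporting rate of 0, or with no reporting rate, are left missing because the division is undefined. Adjusted incidence 2 is then the adjusted cases divided by the population.

inc_data <- inc_rep_rate |>
  mutate(
    # N2 = N1 / d: inflate testing-adjusted cases by the reporting rate;
    # leave N2 missing where the reporting rate is 0 or NA
    adjcases2 = if_else(rep_rate > 0, adjcases1 / rep_rate, NA_real_),
    # adjusted incidence 2: adjusted cases divided by the population
    adjinc2 = adjcases2 / pop
  )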
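For reference, a Python sketch of the monthly calculation on joined records: N2 = N1 / d for each adm3-month, then adjusted incidence as N2 divided by population. As in the R code, a reporting rate of zero or a missing value leaves N2 as `None`; the record layout and the helper `add_adjusted_cases` are assumptions.

```python
def add_adjusted_cases(rows: list[dict]) -> list[dict]:
    """Add adjcases2 (N2 = N1 / d) and adjinc2 (N2 / pop) to each record."""
    out = []
    for row in rows:
        n1, d, pop = row["adjcases1"], row["rep_rate"], row["pop"]
        # Division is undefined when the reporting rate is 0 or missing
        n2 = None if n1 is None or d is None or d <= 0 else n1 / d
        inc2 = n2 / pop if n2 is not None and pop else None
        out.append({**row, "adjcases2": n2, "adjinc2": inc2})
    return out


rows = [
    {"adm3": "A", "year": 2023, "month": 1,
     "adjcases1": 80.0, "rep_rate": 0.8, "pop": 1000.0},
    {"adm3": "A", "year": 2023, "month": 2,
     "adjcases1": 0.0, "rep_rate": 0.0, "pop": 1000.0},
]
for r in add_adjusted_cases(rows):
    print(r["month"], r["adjcases2"], r["adjinc2"])
# 1 100.0 0.1
# 2 None None
```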

Step 3.1: Calculate annual adjusted incidence 2

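For the purposes of SNT, annual incidence estimates are more useful for comparing years and administrative units. We therefore aggregate the monthly data by year: case counts and monthly incidence values are summed, while population and the testing, positivity, and reporting rates are averaged. Incidence is then expressed per 1,000 population.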

## Aggregate the monthly data set by year
adj2_inc_ann <- inc_data |>
  dplyr::group_by(adm1, adm2, adm3, year) |>
  dplyr::summarise(
    # sum case counts and monthly incidence across the months of each year
    dplyr::across(c(susp:conf_tpr, adjcases1, adjinc1, adjcases2, adjinc2),
                  ~ sum(.x, na.rm = TRUE)),
    # average population and rates across the months of each year
    dplyr::across(c(pop, test_rate, tpr, rep_rate),
                  ~ mean(.x, na.rm = TRUE)),
    .groups = "drop"
  )

# express annual crude and adjusted incidence per 1,000 population
adj2_inc_ann <- adj2_inc_ann |>
  dplyr::mutate(
    ann_crude = crudeinc * 1000,
    ann_adjinc1 = adjinc1 * 1000,
    ann_adjinc2 = adjinc2 * 1000
  )

# inspect the first rows of the annual data set
head(adj2_inc_ann)
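A Python sketch of the annual aggregation, following the same rules as the R code for a few columns: values are grouped by adm1, adm2, adm3, and year; adjusted cases and monthly incidence are summed; population and reporting rate are averaged; missing values are skipped, as with `na.rm = TRUE`; and incidence is expressed per 1,000. The column selection and the helper `aggregate_annual` are assumptions.

```python
from collections import defaultdict
from statistics import mean

GROUP_COLS = ("adm1", "adm2", "adm3", "year")
SUM_COLS = ("adjcases2", "adjinc2")   # summed across months
MEAN_COLS = ("pop", "rep_rate")       # averaged across months


def aggregate_annual(rows: list[dict]) -> list[dict]:
    """Aggregate monthly records to one record per admin unit and year."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in GROUP_COLS)].append(row)

    annual = []
    for key, months in groups.items():
        rec = dict(zip(GROUP_COLS, key))
        for col in SUM_COLS:
            rec[col] = sum(m[col] for m in months if m[col] is not None)
        for col in MEAN_COLS:
            values = [m[col] for m in months if m[col] is not None]
            rec[col] = mean(values) if values else None
        rec["ann_adjinc2"] = rec["adjinc2"] * 1000  # per 1,000 population
        annual.append(rec)
    return annual


monthly = [
    {"adm1": "R1", "adm2": "D1", "adm3": "A", "year": 2023,
     "adjcases2": 100.0, "adjinc2": 0.1, "pop": 1000.0, "rep_rate": 0.8},
    {"adm1": "R1", "adm2": "D1", "adm3": "A", "year": 2023,
     "adjcases2": 50.0, "adjinc2": 0.05, "pop": 1000.0, "rep_rate": 0.4},
]
print(aggregate_annual(monthly))
```

Grouping on all three admin levels avoids merging two adm3 units that share a name in different districts.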

Step 4: Save files

Finally, we save the monthly and annual incidence data sets as CSV files.

## Save Monthly Incidence Data Set

rio::export(
  inc_data,
  file = "english/data_r/incidence_adj/monthly_inc_data.csv",
  format = "csv"
)

# Save annual incidence data set
rio::export(
  adj2_inc_ann,
  file = "english/data_r/incidence_adj/annual_inc_data.csv"
)
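In Python, lists of records like those in the sketches above can be written with `csv.DictWriter`. The helper below is hypothetical; it creates the output folder if needed and writes `None` as an empty cell. It could be called with the same paths as the R example, or with paths of your own.

```python
import csv
from pathlib import Path


def save_records(rows: list[dict], path) -> None:
    """Write a list of dicts to CSV; None values become empty cells."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create folders if needed
    fieldnames = list(rows[0].keys()) if rows else []
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```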

Summary

TBD

Full code

#===============================================================================
# End of Script
#===============================================================================
 

©2025 Applied Health Analytics for Delivery and Innovation. All rights reserved