Dev Site — You are viewing the development build. Go to Main Site

  • English
  • Français
  1. 2. Data Assembly and Management
  2. 2.3 Routine Surveillance Data
  3. Health facility reporting rate
  • Code library for subnational tailoring
    English version
  • 1. Getting Started
    • 1.1 About and Contact Information
    • 1.2 For Everyone
    • 1.3 For the SNT Team
    • 1.4 For Analysts
    • 1.5 Producing High-Quality Outputs
  • 2. Data Assembly and Management
    • 2.1 Working with Shapefiles
      • Spatial data overview
      • Basic shapefile use and visualization
      • Shapefile management and customization
      • Merging shapefiles with tabular data
    • 2.2 Health Facilities Data
      • Fuzzy matching of names across datasets
      • Health facility coordinates and point data
    • 2.3 Routine Surveillance Data
      • Routine data extraction
      • DHIS2 data preprocessing
      • Determining active and inactive status
      • Contextual considerations
      • Missing data detection methods
      • Health facility reporting rate
      • Data coherency checks
      • Outlier detection methods
      • Imputation methods
      • Final database
    • 2.4 Stock Data
      • LMIS
    • 2.5 Population Data
      • National population data
      • WorldPop population raster
    • 2.6 National Household Survey Data
      • DHS data overview and preparation
      • Prevalence of malaria infection
      • All-cause child mortality
      • Treatment-seeking rates
      • ITN ownership, access, and usage
      • Wealth quintiles analysis
    • 2.7 Entomological Data
      • Entomological data
    • 2.8 Climate and Environmental Data
      • Climate and environment data extraction from raster
    • 2.9 Modeled Data
      • Generating spatial modeled estimates
      • Working with geospatial model estimates
      • Modeled estimates of malaria mortality and proxies
      • Modeled estimates of entomological indicators
  • 3. Stratification
    • 3.1 Epidemiological Stratification
      • Incidence overview and crude incidence
      • Incidence adjustment 1: incomplete testing
      • Incidence adjustment 2: incomplete reporting
      • Incidence adjustment 3: treatment-seeking
      • Incidence stratification
      • Prevalence and mortality stratification
      • Combined risk categorization
      • Risk categorization REMOVE?
      • Risk categorization REMOVE?
    • 3.2 Stratification of Determinants of Malaria Transmission
      • Seasonality
      • Access to Care
  • 4. Review of Past Interventions
    • 4.1 Case Management
    • 4.2 Routine Interventions
    • 4.3 Campaign Interventions
    • 4.4 Other Interventions
  • 5. Targeting of Interventions
  • 6. Retrospective Analysis
    • 6.1: Trend analysis
  • 7. Urban Microstratification

On this page

  • Overview
  • Defining and Calculating Reporting Rate for Routine Indicators
    • What is reporting rate?
    • Establishing the denominator: Which facilities are expected to report?
    • Calculating reporting rates
      • Worked example
    • Weighted reporting rates
      • Calculating weighted reporting rates
      • How do I know whether to use unweighted or weighted reporting rate?
  • Step-by-Step
    • Step 1: Import relevant packages and pre-processed routine data
    • Step 2: Import the number of expected reports per reporting period
    • Step 3: Define function to calculate reporting rate
    • Step 4: Calculate reporting rate for a given indicator, at a given admin unit level
      • Step 4.1: Unweighted reporting rate
      • Step 4.2: Weighted reporting rate
      • Step 4.3: Compare unweighted and weighted reporting rates
    • Step 5: Visualise mulitple indicator-specific reporting rates at the national level
      • Step 5.1 Multiple indicator heatmap at the national level
      • Step 5.2 Multiple indicator line plot at the national level
    • Step 6: Visualise single indicator reporting rates by admin unit
      • Step 6.1 Single indicator heatmap by admin unit
      • Step 5.2: Single indicator line plot by admin unit
  • Summary
  • Full code
  1. 2. Data Assembly and Management
  2. 2.3 Routine Surveillance Data
  3. Health facility reporting rate

Health facility reporting rate

Overview

In the SNT workflow, routine surveillance data are often used to calculate key indicators such as malaria incidence, test positivity rate, and confirmed case counts. To interpret and use this data properly, we need to first assess their quality and completeness. One important assessment is of what proportion of health facilities are reporting consistently and completely across geography and time.

Reporting rates provide a simple and essential metric to evaluate the completeness of routine data. They help identify where gaps in reporting may affect the reliability of indicators used in decision-making and where routine surveillance should be strengthened. Reporting rates can also be used to estimate the number of additional cases (or other count indicator) that would have also been included in routine surveillance, if all facilities had reported.

This section outlines how to calculate, inspect, and save reporting rates using a reproducible approach, while grounding all choices in dialogue with the national malaria program and SNT team.

Objectives
  • Understand how to calculate reporting rates from routine health facility data
  • Visualize and interpret reporting rate patterns at any admin unit level
  • Compile and save validated reporting rate outputs for use in later SNT workflow steps

Defining and Calculating Reporting Rate for Routine Indicators

To ensure routine data can be used reliably in the SNT workflow, we need a clear and consistent method for calculating reporting rates across facilities and time. This section introduces a process to calculate monthly reporting rates for any routinely reported indicator.

What is reporting rate?

Reporting rate is the proportion of entities, such as health facilities or community health workers, in a given admin unit that reported on an indicator during a time period of interest.

In the example on this page, we are using monthly data from DHIS2, so we calculate monthly reporting rates. However, you should calculate reporting rate for the relevant reporting period in your dataset. For example, if you are analyzing weekly surveillance data, your reporting rate should be calculated on a weekly basis.

When presenting reporting rates, it is important to specify what indicator(s) are used to define the entity as reporting. In SNT, it is best practice to re-calculate reporting rate for each indicator of interest, as reporting practice may vary across indicators within the same facility. For example, a facility may prioritize reporting confirmed cases, as their stock replenishment depends on showing consistent reporting, but neglect to report all-cause outpatient visits.

An overall reporting rate can also be calculated for each entity in a given time period. In this case, a facility may only be defined as reporting if it reports suspected cases, tested cases, confirmed cases, and treated cases. This aggregated reporting rate should be at most the minimum of the reporting rate of each individual indicator.

Establishing the denominator: Which facilities are expected to report?

Before evaluating the proportion of facilites that reported in a given reporting period, we need to first determine the number of facilities that should report. To avoid underestimating the reporting rate and gaining and inaccurate assessment of the quality of surveillance, the SNT team may, for example, consider:

  • When calculating reporting rate for confirmed malaria cases, exclude facility types that do not test or treat malaria. For example, HIV clinics or maternity wards.

  • When calculating reporting rate for malaria admissions, exclude facility types that only handle outpatients, for example community health workers or health posts.

  • When calculating reporting rate for an indicator, exclude facilities that have closed, that are not yet active, or are temporarily nonfunctional. This avoids penalizing newly opened facilities that weren’t expected to report in earlier months, or facilities that are permanently or temporarily closed and therefore are not expected to report.

Up-to-date master facility lists (MFL) that track facility type and activity status are very helpful for determining which health facilities should be included in the denominator for reporting rate of each indicator, for each reporting period. In the absence of an MFL, or official determination of activity status, it is still possible to infer which health facilities should be excluded.

For more guidance on how to consider active vs inactive facilities, please see the code library page Determining active and inactive status.

Code for excluding by specific facility type is included on this page.

Consult SNT team

Consult the SNT team to understand how to determine which facilities, if any, should not be included in the denominator for reporting rate. National practices vary, and the surveillance focal person on the SNT team should explain what would be appropriate for each indicator.

Calculating reporting rates

Once health facility activity status has been established, reporting rates can be calculated. These rates reflect the proportion of expected facilities that submitted valid data for a given indicator in a given time period.

For each indicator of interest, reporting rate is defined as:

\[ \text{Indicator Reporting Rate}_{a,t} = \frac{o_{a,t}}{e_{a,t}} \]

Where:

  • \(a\) is the administrative unit (e.g. chiefdom or district)
  • \(t\) is the time period (e.g. “2022-03”)
  • \(o_{a,t}\) is the number of observed facilities in unit \(a\) during time \(t\)
  • \(e_{a,t}\) is the number of expected facilities in unit \(a\) during time \(t\)

Observed facilities are those that submitted a valid (non-missing) value for the indicator of interest during time \(t\).

Expected facilities are those who were expected to report during time \(t\) (see previous section). Remember to consult with the SNT team to decide what rules determine whether a facility is expected to report.

Worked example

Suppose we are calculating the reporting rate for total confirmed cases for Kailahun District in March 2022.

  • There are 89 health facilities in Kailahun that have ever submitted data on any key malaria indicator
  • All 89 submitted their first report on or before March 2022, so they are assumed to be active and expected to report that month
  • Of these, 80 facilities reported a valid value for conf (total confirmed cases) in March 2022 → 80 are observed reporting
  • The other two do not have a valid value for total confirmed cases (they show NA in the database) for March 2022

The reporting rate is calculated as:

\[ \text{Reporting Rate for Total Confirmed Cases}_{\text{Kailahun}, \text{Mar 2022}} = \frac{80}{89} = 0.899 \]

Weighted reporting rates

For some SNT applications, a weighted reporting rate may be of interest. The weighted reporting rate is an estimate of the proportion of an indicator’s expected total counts in a given admin unit over a given time period that was reported into routine surveillance.

This means that if a non-reporting facility generally reports fewer confirmed cases than average in its admin unit, the weighted reporting rate for confirmed cases is higher than the unweighted reporting rate (fewer cases are missing). Conversely, if a non-reporting facility generally reports relatively many confirmed cases for its admin unit, the weighted reporting rate for confirmed cases is lower than the unweighted reporting rate (more cases are missing).

Calculating weighted reporting rates

PLACEHOLDER EXPLANATION BELOW. WOULD BE GREAT TO INCLUDE THE SIMPLE EXAMPLE FROM BEA’S SLIDE DECK incl the illustration if possible

The monthly weighted reporting rate for each health district was determined as follows. For each calendar month (January through December), health facility weights were calculated by dividing the health facility’s average number of malaria cases reported for that month, across all years of data, by the district sum of the average number of malaria cases reported for that month, across all years of data. For dates in which a health facility is inactive, the average number of malaria cases reported for that month-and-year pair would be 0, and the weights for the health facilities in that district for that date would be calculated as usual. The district monthly weighted reporting rate was then calculated by summing the weights of the active health facilities. This value captures the proportion of expected confirmed malaria cases that are reported at the district level each month by the active health facilities included in the HMIS database.

How do I know whether to use unweighted or weighted reporting rate?

To assess the performance of the routine surveillance system, the unweighted reporting rate is likely to be more appropriate.

To estimate an admin unit’s unreported confirmed cases (as an example indicator), the weighted reporting rate may be the more accurate option, as it will account for the size of the non-reporting facility. If you choose to use weighted reporting rates, best practice is to also calculate the unweighted reporting rates for the same indicator, compare the two outputs, and discuss with the SNT team.

WOULD BE GREAT TO INCLUDE IN THE STEP BY STEP EXAMPLE VISUALIZATIONS. BOTH SEBASTIAN’S LINE PLOT VERSION AND OUSMANE’S HEATMAPS

Consult SNT team

If you think the weighted reporting rate might be the better option, produce reporting rates for using both methods and present to the SNT team. Discuss with the SNT team to understand how, where, when, and why the two reporting rates are different. Together with the SNT team, discuss which to use for downstream analysis.

Step-by-Step

Now that we’ve defined how reporting rates are constructed—by identifying active facilities and calculating observed reporting—we move into the step-by-step process for implementing this in code using example DHIS2 data from Sierra Leone. In this section, we walk through the steps for calculating and visualising monthly reporting rates. Each step is designed to guide you through the process. Follow the notes in the code, especially where edits are required.

To skip the step-by-step explanation, jump to the full code at the end of this page.

Objectives
  • Calculate monthly reporting rates
  • Visualise reporting rate over time, by indicator

Step 1: Import relevant packages and pre-processed routine data

In this step, we load the necessary packages to run this section, as well as the routine DHIS2 dataset that was initially processed in the DHIS2 Data Preprocessing section of this code library.

  • R
  • Python
# Install pacman only if it's not already installed
if (!requireNamespace("pacman", quietly = TRUE)) {
  install.packages("pacman")}

# Install or load relevant packages
pacman::p_load(
  dplyr,        # Data manipulation
  tidyr,        # Additional data manipulation tools
  purrr,        # Functional programming tools
  here,         # Path management
  ggplot2,      # Plotting
  ggthemes,     # Plot themes and scales
  patchwork,    # Combining plots
  RColorBrewer, # Color palettes
  scales,       # Plot scales and formatting
  tibble,       # Modern data frames
  readr,        # Reading data files
  knitr,        # Formatted table outputs
  stringr)      # String manipulation

# Install/update and load sntutils
if (!requireNamespace("sntutils", quietly = TRUE)) {
  devtools::install_github("ahadi-analytics/sntutils", quiet = TRUE, upgrade = "always")
} else {
  devtools::install_github("ahadi-analytics/sntutils", quiet = TRUE, upgrade = "always")
}

library(sntutils)

dhis2_df <- read_csv(here::here(
     "1.2_epidemiology/1.2a_routine_surveillance/raw",
     "dhis2_processed_data_python.csv"))

knitr::kable(head(dhis2_df, 10))
Output
Year Month YM adm0 adm0_uid adm1 adm1_uid adm2 adm2_uid adm3 adm3_uid hf hf_uid allout allout_ov5 allout_u5 conf conf_5_14 conf_com conf_com_5_14 conf_com_ov15 conf_com_u5 conf_hf conf_hf_5_14 conf_hf_ov15 conf_hf_u5 conf_ov15 conf_u5 maladm maladm_5_14 maladm_ov15 maladm_u5 maldth maldth_10_14 maldth_1_59m maldth_5_14 maldth_5_9 maldth_fem_ov15 maldth_mal_ov15 maldth_ov15 maldth_u5 maltreat maltreat_5_14 maltreat_com maltreat_com_5_14 maltreat_com_ov15 maltreat_com_u5 maltreat_hf maltreat_hf_5_14 maltreat_hf_ov15 maltreat_hf_u5 maltreat_ov15 maltreat_ov24_5_14_com maltreat_ov24_5_14_hf maltreat_ov24_com maltreat_ov24_hf maltreat_ov24_ov15_com maltreat_ov24_ov15_hf maltreat_ov24_total maltreat_ov24_u5_com maltreat_ov24_u5_hf maltreat_u24_5_14_com maltreat_u24_5_14_hf maltreat_u24_com maltreat_u24_hf maltreat_u24_ov15_com maltreat_u24_ov15_hf maltreat_u24_total maltreat_u24_u5_com maltreat_u24_u5_hf maltreat_u5 pres pres_5_14 pres_com pres_com_5_14 pres_com_ov15 pres_com_u5 pres_hf pres_hf_5_14 pres_hf_ov15 pres_hf_u5 pres_ov15 pres_u5 susp susp_5_14 susp_com_5_14 susp_com_ov15 susp_com_u5 susp_hf_5_14 susp_hf_ov15 susp_hf_u5 susp_ov15 susp_u5 test test_5_14 test_com test_com_5_14 test_com_ov15 test_com_u5 test_hf test_hf_5_14 test_hf_ov15 test_hf_u5 test_neg_mic_5_14_hf test_neg_mic_ov15_hf test_neg_mic_u5_hf test_neg_rdt_5_14_com test_neg_rdt_5_14_hf test_neg_rdt_ov15_com test_neg_rdt_ov15_hf test_neg_rdt_u5_com test_neg_rdt_u5_hf test_ov15 test_pos_mic_5_14_hf test_pos_mic_ov15_hf test_pos_mic_u5_hf test_pos_rdt_5_14_com test_pos_rdt_5_14_hf test_pos_rdt_ov15_com test_pos_rdt_ov15_hf test_pos_rdt_u5_com test_pos_rdt_u5_hf test_u5
2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Aethel CHP HF_00001 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Agape Way CHP HF_00002 260 130 130 280 104 NA NA NA NA 280 104 40 136 40 136 NA NA NA NA NA NA NA NA NA NA NA NA NA 297 110 17 6 7 4 280 104 40 136 47 NA NA NA NA NA NA NA NA NA 6 104 17 280 7 40 297 4 136 140 0 0 NA NA NA NA 0 0 0 0 0 0 464 134 NA NA NA 134 60 270 60 270 464 134 NA NA NA NA 464 134 60 270 NA NA NA NA 30 NA 20 NA 134 60 NA NA NA NA 104 NA 40 NA 136 270
2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Anglican Diocese Clinic HF_00003 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Batiama Layout MCHP HF_00004 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Bo Government Hospital HF_00005 784 309 475 NA NA NA NA NA NA NA NA NA NA NA NA 19 11 1 7 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 182 NA NA NA NA NA 14 168 14 168 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Bo School Bay CHP HF_00006 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Breakthrough MCHP HF_00007 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Brima Town CHP HF_00008 241 139 102 95 6 NA NA NA NA 95 6 29 60 29 60 NA NA NA NA NA NA NA NA NA NA NA NA NA 95 6 NA NA NA NA 95 6 29 60 29 NA 1 NA 34 NA 12 34 NA 21 NA 5 NA 61 NA 17 61 NA 39 60 0 0 NA NA NA NA 0 0 0 0 0 0 106 7 NA NA NA 7 30 69 30 69 106 7 NA NA NA NA 106 7 30 69 NA NA NA NA 1 NA 1 NA 9 30 NA NA NA NA 6 NA 29 NA 60 69
2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 EDC Unit CHP HF_00009 583 217 366 316 17 NA NA NA NA 316 17 102 197 102 197 NA NA NA NA NA NA NA NA NA NA NA NA NA 316 17 NA NA NA NA 316 17 102 197 102 NA NA NA NA NA NA NA NA NA NA 17 NA 316 NA 102 316 NA 197 197 0 0 NA NA NA NA 0 0 0 0 0 0 528 27 NA NA NA 27 190 311 190 311 528 27 NA NA NA NA 528 27 190 311 NA NA NA NA 10 NA 88 NA 114 190 NA NA NA NA 17 NA 102 NA 197 311
2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Favour MCHP HF_00010 159 87 72 152 25 NA NA NA NA 152 25 62 65 62 65 NA NA NA NA NA NA NA NA NA NA NA NA NA 152 25 NA NA NA NA 152 25 62 65 62 NA NA NA NA NA NA NA NA NA NA 25 NA 152 NA 62 152 NA 65 65 0 0 NA NA NA NA 0 0 0 0 0 0 174 26 NA NA NA 26 69 79 69 79 174 26 NA NA NA NA 174 26 69 79 NA NA NA NA 1 NA 7 NA 14 69 NA NA NA NA 25 NA 62 NA 65 79

To adapt the code:

# Import packages
import pandas as pd
import numpy as np
from pyhere import here
import matplotlib.pyplot as plt
import seaborn as sns
from collections import OrderedDict
import matplotlib.patches as mpatches

# import dhis2 data data
dhis2_df = pd.read_parquet(here('english/data_r/routine_cases', 'dhis2_processed_data_python.parquet'))

# Inspect
dhis2_df.head(10).style
Output
  Year Month YM adm0 adm0_uid adm1 adm1_uid adm2 adm2_uid adm3 adm3_uid hf hf_uid allout allout_ov5 allout_u5 conf conf_5_14 conf_com conf_com_5_14 conf_com_ov15 conf_com_u5 conf_hf conf_hf_5_14 conf_hf_ov15 conf_hf_u5 conf_ov15 conf_u5 maladm maladm_5_14 maladm_ov15 maladm_u5 maldth maldth_10_14 maldth_1_59m maldth_5_14 maldth_5_9 maldth_fem_ov15 maldth_mal_ov15 maldth_ov15 maldth_u5 maltreat maltreat_5_14 maltreat_com maltreat_com_5_14 maltreat_com_ov15 maltreat_com_u5 maltreat_hf maltreat_hf_5_14 maltreat_hf_ov15 maltreat_hf_u5 maltreat_ov15 maltreat_ov24_5_14_com maltreat_ov24_5_14_hf maltreat_ov24_com maltreat_ov24_hf maltreat_ov24_ov15_com maltreat_ov24_ov15_hf maltreat_ov24_total maltreat_ov24_u5_com maltreat_ov24_u5_hf maltreat_u24_5_14_com maltreat_u24_5_14_hf maltreat_u24_com maltreat_u24_hf maltreat_u24_ov15_com maltreat_u24_ov15_hf maltreat_u24_total maltreat_u24_u5_com maltreat_u24_u5_hf maltreat_u5 pres pres_5_14 pres_com pres_com_5_14 pres_com_ov15 pres_com_u5 pres_hf pres_hf_5_14 pres_hf_ov15 pres_hf_u5 pres_ov15 pres_u5 susp susp_5_14 susp_com_5_14 susp_com_ov15 susp_com_u5 susp_hf_5_14 susp_hf_ov15 susp_hf_u5 susp_ov15 susp_u5 test test_5_14 test_com test_com_5_14 test_com_ov15 test_com_u5 test_hf test_hf_5_14 test_hf_ov15 test_hf_u5 test_neg_mic_5_14_hf test_neg_mic_ov15_hf test_neg_mic_u5_hf test_neg_rdt_5_14_com test_neg_rdt_5_14_hf test_neg_rdt_ov15_com test_neg_rdt_ov15_hf test_neg_rdt_u5_com test_neg_rdt_u5_hf test_ov15 test_pos_mic_5_14_hf test_pos_mic_ov15_hf test_pos_mic_u5_hf test_pos_rdt_5_14_com test_pos_rdt_5_14_hf test_pos_rdt_ov15_com test_pos_rdt_ov15_hf test_pos_rdt_u5_com test_pos_rdt_u5_hf test_u5
0 2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Aethel CHP HF_00001 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
1 2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Agape Way CHP HF_00002 260.000000 130.000000 130.000000 280.000000 104.000000 nan nan nan nan 280.000000 104.000000 40.000000 136.000000 40.000000 136.000000 nan nan nan nan nan nan nan nan nan nan nan nan nan 297.000000 110.000000 17.000000 6.000000 7.000000 4.000000 280.000000 104.000000 40.000000 136.000000 47.000000 nan nan nan nan nan nan nan nan nan 6.000000 104.000000 17.000000 280.000000 7.000000 40.000000 297.000000 4.000000 136.000000 140.000000 0.000000 0.000000 nan nan nan nan 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 464.000000 134.000000 nan nan nan 134.000000 60.000000 270.000000 60.000000 270.000000 464.000000 134.000000 nan nan nan nan 464.000000 134.000000 60.000000 270.000000 nan nan nan nan 30.000000 nan 20.000000 nan 134.000000 60.000000 nan nan nan nan 104.000000 nan 40.000000 nan 136.000000 270.000000
2 2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Anglican Diocese Clinic HF_00003 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
3 2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Batiama Layout MCHP HF_00004 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
4 2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Bo Government Hospital HF_00005 784.000000 309.000000 475.000000 nan nan nan nan nan nan nan nan nan nan nan nan 19.000000 11.000000 1.000000 7.000000 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 182.000000 nan nan nan nan nan 14.000000 168.000000 14.000000 168.000000 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
5 2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Bo School Bay CHP HF_00006 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
6 2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Breakthrough MCHP HF_00007 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
7 2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Brima Town CHP HF_00008 241.000000 139.000000 102.000000 95.000000 6.000000 nan nan nan nan 95.000000 6.000000 29.000000 60.000000 29.000000 60.000000 nan nan nan nan nan nan nan nan nan nan nan nan nan 95.000000 6.000000 nan nan nan nan 95.000000 6.000000 29.000000 60.000000 29.000000 nan 1.000000 nan 34.000000 nan 12.000000 34.000000 nan 21.000000 nan 5.000000 nan 61.000000 nan 17.000000 61.000000 nan 39.000000 60.000000 0.000000 0.000000 nan nan nan nan 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 106.000000 7.000000 nan nan nan 7.000000 30.000000 69.000000 30.000000 69.000000 106.000000 7.000000 nan nan nan nan 106.000000 7.000000 30.000000 69.000000 nan nan nan nan 1.000000 nan 1.000000 nan 9.000000 30.000000 nan nan nan nan 6.000000 nan 29.000000 nan 60.000000 69.000000
8 2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 EDC Unit CHP HF_00009 583.000000 217.000000 366.000000 316.000000 17.000000 nan nan nan nan 316.000000 17.000000 102.000000 197.000000 102.000000 197.000000 nan nan nan nan nan nan nan nan nan nan nan nan nan 316.000000 17.000000 nan nan nan nan 316.000000 17.000000 102.000000 197.000000 102.000000 nan nan nan nan nan nan nan nan nan nan 17.000000 nan 316.000000 nan 102.000000 316.000000 nan 197.000000 197.000000 0.000000 0.000000 nan nan nan nan 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 528.000000 27.000000 nan nan nan 27.000000 190.000000 311.000000 190.000000 311.000000 528.000000 27.000000 nan nan nan nan 528.000000 27.000000 190.000000 311.000000 nan nan nan nan 10.000000 nan 88.000000 nan 114.000000 190.000000 nan nan nan nan 17.000000 nan 102.000000 nan 197.000000 311.000000
9 2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Favour MCHP HF_00010 159.000000 87.000000 72.000000 152.000000 25.000000 nan nan nan nan 152.000000 25.000000 62.000000 65.000000 62.000000 65.000000 nan nan nan nan nan nan nan nan nan nan nan nan nan 152.000000 25.000000 nan nan nan nan 152.000000 25.000000 62.000000 65.000000 62.000000 nan nan nan nan nan nan nan nan nan nan 25.000000 nan 152.000000 nan 62.000000 152.000000 nan 65.000000 65.000000 0.000000 0.000000 nan nan nan nan 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 174.000000 26.000000 nan nan nan 26.000000 69.000000 79.000000 69.000000 79.000000 174.000000 26.000000 nan nan nan nan 174.000000 26.000000 69.000000 79.000000 nan nan nan nan 1.000000 nan 7.000000 nan 14.000000 69.000000 nan nan nan nan 25.000000 nan 62.000000 nan 65.000000 79.000000

To adapt the code: Edit the filepath on line 11.

Step 2: Import the number of expected reports per reporting period

Here we import the expected reports dataframe we built in in our analysis of active and inactive health facilities. This dataframe contains, for each reporting period (month, in our example), the number of health facilities expected to report.

Consult SNT team

Make sure that when using a calculated number of expected reports, that the method of calculation has been validated by the SNT team. The set of indicators used to define reporting and non-reporting status should also be clearly stated and validated by the SNT team.

Depending on the indicator set for defining reporting status, and the indicator of interest to evaluate reporting rate, you may need to use different dataframes of expected reports for the reporting rate denominator.

This data corresponds to the denominator when calculating the reporting rate. We import it here to avoid having to recalculate the number of expected reports everytime we need to calculate the reporting rate.

In following steps, we will calculate the number of observed reports (the numerator for the reporting rate), merge the two datasets and then calculate the reporting rate.

  • R
  • Python
df_expected <- read_csv(here::here(
     "english/data_r/routine_cases", "df_expected.csv"))

knitr::kable(head(df_expected, 10))
Output
Year YM adm0 adm0_uid adm1 adm1_uid adm2 adm2_uid adm3 adm3_uid denominator
2015 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 21
2015 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo District Council adm2_00002 Badjia Chiefdom adm3_00002 2
2015 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo District Council adm2_00002 Bagbwe Chiefdom adm3_00003 6
2015 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo District Council adm2_00002 Baoma Chiefdom adm3_00004 16
2015 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo District Council adm2_00002 Bargbo Chiefdom adm3_00005 8
2015 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo District Council adm2_00002 Bongor Chiefdom adm3_00006 4
2015 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo District Council adm2_00002 Bumpe Ngao Chiefdom adm3_00007 13
2015 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo District Council adm2_00002 Gbo Chiefdom adm3_00008 2
2015 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo District Council adm2_00002 Jaiama Chiefdom adm3_00009 3
2015 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo District Council adm2_00002 Kakua Chiefdom adm3_00010 8

To adapt the code:

# import denominator data
df_expected = pd.read_csv(here('english/data_r/routine_cases', 'df_expected.csv'))

# Inspect
df_expected.head(10).style
Output
  Year YM adm0 adm0_uid adm1 adm1_uid adm2 adm2_uid adm3 adm3_uid denominator
0 2015 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 21
1 2015 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo District Council adm2_00002 Badjia Chiefdom adm3_00002 2
2 2015 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo District Council adm2_00002 Bagbwe Chiefdom adm3_00003 6
3 2015 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo District Council adm2_00002 Baoma Chiefdom adm3_00004 16
4 2015 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo District Council adm2_00002 Bargbo Chiefdom adm3_00005 8
5 2015 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo District Council adm2_00002 Bongor Chiefdom adm3_00006 4
6 2015 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo District Council adm2_00002 Bumpe Ngao Chiefdom adm3_00007 13
7 2015 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo District Council adm2_00002 Gbo Chiefdom adm3_00008 2
8 2015 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo District Council adm2_00002 Jaiama Chiefdom adm3_00009 3
9 2015 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo District Council adm2_00002 Kakua Chiefdom adm3_00010 8

To adapt the code: Edit the filepath on line 2.

Step 3: Define function to calculate reporting rate

Now we define a function to calculate the monthly reporting rate, for a given indicator, at a given admin unit level. The function performs the following steps:

  • Count the number of non-NA reports for the indicator of interest (observed reports), aggregated by month and by the admin unit level requested by the user
  • Count the number of expected reports by month and admin uit unit level requested by the user, by aggregating the dataframe of expected reports
  • Merge the two datasets (observed reports and expected reports, i.e. reporting rate numerator and denominator)
  • Compute the reporting rate
  • R
  • Python
compute_RR <- function(base_df, df_expected, level, indicator) {
  cols_group <- c('YM', paste0('adm', level), paste0('adm', level, '_uid'))

  # count number of non-null reports for confirmed cases
  df <- base_df |>
    dplyr::group_by(dplyr::across(dplyr::all_of(cols_group))) |>
    dplyr::summarise(observed = sum(!is.na(.data[[indicator]])), .groups = "drop")

  # aggregate denominator at this level
  temp <- df_expected |>
    dplyr::group_by(dplyr::across(dplyr::all_of(cols_group))) |>
    dplyr::summarise(expected = sum(denominator, na.rm = TRUE), .groups = "drop")

  # merge numerator and denominator
  df <- df |>
    dplyr::full_join(temp, by = cols_group) |>
    dplyr::distinct()  # equivalent to validate = '1:1'

  # compute reporting rate
  df <- df |>
    dplyr::mutate(
      "{indicator}_RR" := dplyr::if_else(
        expected == 0,
        NA_real_,
        observed / expected
      )
    )

  # select and output dataframe
  df <- df |>
    dplyr::select(
      YM,
      paste0('adm', level),
      paste0('adm', level, '_uid'),
      observed,
      expected,
      paste0(indicator, '_RR')
    )

  return(df)
}

To adapt the code:

def compute_RR(base_df, df_expected, level, indicator):
    cols_group = ['YM', f'adm{level}', f'adm{level}_uid']
    df = base_df.copy()

    # count number of non-null reports for confirmed cases
    df = df.groupby(cols_group)[indicator].count().reset_index()

    # aggregate denominator at this level
    temp = (df_expected.groupby(cols_group)['denominator']
            .sum(min_count = 1)
            .reset_index())

    # merge numerator and denominator
    df = df.merge(temp, on = cols_group, how = 'outer', validate = '1:1')

    # compute reporting rate
    df.insert(len(df.columns), f'{indicator}_RR', np.where(df['denominator'] == 0, np.nan, df[indicator].div(df['denominator'])))

    # rename columns and output dataframe
    df = df.rename(columns = {indicator: 'observed', 'denominator': 'expected'})
    df = df[['YM', f'adm{level}', f'adm{level}_uid', 'observed', 'expected', f'{indicator}_RR']]

    return df

To adapt the code: Nothing to adapt here.

Step 4: Calculate reporting rate for a given indicator, at a given admin unit level

Step 4.1: Unweighted reporting rate

We now use the function defined in Step 3 above, to calculate the reporting rate. We pass two arguments ot the function:

  • the indicator we are calculating reporting rate for;
  • the admin unit level we are performing the calculation at.

The resulting dataframe is saved to file if needed.

  • R
  • Python
# define indicator and admin unit level
level <- 3
indicator <- 'conf'

# calculate reporting rate
df <- compute_RR(dhis2_df, df_expected, level, indicator)

# Save
readr::write_csv(df, here::here('english', 'data_r', 'routine_cases', paste0('Monthly_', indicator, '_RR_adm', level, '.csv')))

# Inspect results
knitr::kable(utils::head(df, 10))
Output
YM adm3 adm3_uid observed expected conf_RR
2015-01 Badjia Chiefdom adm3_00002 2 2 1.0000000
2015-01 Bagbwe Chiefdom adm3_00003 6 6 1.0000000
2015-01 Bake-Loko Chiefdom adm3_00097 2 3 0.6666667
2015-01 Baoma Chiefdom adm3_00004 13 16 0.8125000
2015-01 Barawa Wollay Chiefdom adm3_00030 1 1 1.0000000
2015-01 Bargbo Chiefdom adm3_00005 8 8 1.0000000
2015-01 Barri Chiefdom adm3_00110 8 9 0.8888889
2015-01 Bendu-Cha Chiefdom adm3_00018 3 3 1.0000000
2015-01 Bo City adm3_00001 20 21 0.9523810
2015-01 Bongor Chiefdom adm3_00006 4 4 1.0000000
Alternative code option using sntutils package
# calculate reporting rate using sntutils
df <- sntutils::calculate_reporting_metrics(
  data = dhis2_df,
  vars_of_interest = indicator,
  x_var = "YM",
  y_var = paste0("adm", level),
  hf_col = "hf_uid",
  key_indicators = c("allout", "conf", "test", "treat", "pres"),
  method = 2  # uses method 2 (first-to-last) for facility activity classification
)

To adapt the code:

# define indicator and admin unit level
level = 3
indicator = 'conf'

# calculate reporting rate
df = compute_RR(dhis2_df, df_expected, level, indicator)

# Save
df.to_csv(here('english/data_r/routine_cases', f'Monthly_{indicator}_RR_adm{level}.csv'), index = None)

# Inspect results
df.head(10).style
Output
  YM adm3 adm3_uid observed expected conf_RR
0 2015-01 Badjia Chiefdom adm3_00002 2 2 1.000000
1 2015-01 Bagbwe Chiefdom adm3_00003 6 6 1.000000
2 2015-01 Bake-Loko Chiefdom adm3_00097 2 3 0.666667
3 2015-01 Baoma Chiefdom adm3_00004 13 16 0.812500
4 2015-01 Barawa Wollay Chiefdom adm3_00030 1 1 1.000000
5 2015-01 Bargbo Chiefdom adm3_00005 8 8 1.000000
6 2015-01 Barri Chiefdom adm3_00110 8 9 0.888889
7 2015-01 Bendu-Cha Chiefdom adm3_00018 3 3 1.000000
8 2015-01 Bo City adm3_00001 20 21 0.952381
9 2015-01 Bongor Chiefdom adm3_00006 4 4 1.000000

To adapt the code: Set the parameters on lines 2 and 3, and adapt the filepath on line 9.

Step 4.2: Weighted reporting rate

Weighted reporting rates account for the relative size of each facility when calculating reporting completeness. Unlike unweighted rates that treat all facilities equally, weighted rates give more influence to facilities that typically report higher case numbers. This provides a better estimate of what proportion of total expected cases were actually reported.

  • R
  • Python
Output
Alternative code option using sntutils package
level <- 3
indicator <- 'conf'

# calculate weighted reporting rate using sntutils
df_weighted <- sntutils::calculate_reporting_metrics(
  data = dhis2_df,
  vars_of_interest = indicator,
  x_var = "YM",
  y_var = paste0("adm", level),
  hf_col = "hf_uid",
  key_indicators = c("allout", "conf", "test", "treat", "pres"),
  method = 2,
  weighting = TRUE,
  weight_var = "test",
  weight_window = 12,     # 12-month rolling window for typical facility size
  exclude_current_x = TRUE, # exclude current period from weight calculation
  cold_start = "median_within_y" # use median within admin unit for new facilities
)

To adapt the code:

Output

To adapt the code: Set the parameters on lines 2 and 3, and adapt the filepath on line 9.

Step 4.3: Compare unweighted and weighted reporting rates

Weighted reporting rates account for the relative size of each facility when calculating reporting completeness. Unlike unweighted rates that treat all facilities equally, weighted rates give more influence to facilities that typically report higher case numbers. This provides a better estimate of what proportion of total expected cases were actually reported.

  • R
  • Python
Show the code
# Compare weighted vs unweighted results
df_comparison <- df |>
  dplyr::select(YM, adm3, reprate) |>
  dplyr::left_join(
    df_weighted |> dplyr::select(YM, adm3, reprate_w),
    by = c("YM", "adm3")
  ) |>
  dplyr::mutate(
    difference = reprate_w - reprate,
    abs_difference = abs(difference),
    weighted_higher = reprate_w > reprate  # ADD THIS LINE - creates the missing column
  )

# Inspect comparison
cat("Comparison of first 10 rows:\n")
knitr::kable(utils::head(df_comparison, 10))

# Summary of differences
comparison_summary <- df_comparison |>
  dplyr::filter(!is.na(difference)) |>
  dplyr::summarise(
    n_comparisons = dplyr::n(),
    mean_diff = mean(difference),
    median_diff = median(difference),
    max_abs_diff = max(abs_difference),
    prop_weighted_higher = mean(weighted_higher, na.rm = TRUE)  # Now this will work
  )

cat("\nWeighted vs Unweighted Comparison Summary:\n")
print(comparison_summary)

# Show examples with largest differences
largest_differences <- df_comparison |>
  dplyr::arrange(desc(abs_difference)) |>
  head(10)

cat("\nLargest differences between weighted and unweighted:\n")
knitr::kable(largest_differences)
Output
Output

Step 5: Visualise mulitple indicator-specific reporting rates at the national level

Step 5.1 Multiple indicator heatmap at the national level

Here we produce a visual to display the monthly reporting rate for a list of indicators, at national level. The first visual is a heatmap, convenient for quickly comparing reporting rate over time (x-axis) and between indicators (y-axis).

  • R
  • Python
Show the code
# Make a heatmap of monthly reporting rate by variable
indicators <- c('allout', 'test', 'conf', 'pres', 'maltreat')
level <- 0

# Initialize dataframe and merge all indicators
df <- tibble::tibble(YM = character())
for (i in indicators) {
  t <- compute_RR(dhis2_df, df_expected, level, i) |>
    dplyr::select(-dplyr::all_of(c(paste0('adm', level), paste0('adm', level, '_uid'), 'observed', 'expected'))) |>
    dplyr::mutate("{i}_RR" := .data[[paste0(i, '_RR')]] * 100)

  df <- df |>
    dplyr::full_join(t, by = 'YM')
}

# Rearrange for heatmap (transpose and set YM as row names)
df_heatmap <- df |>
  tibble::column_to_rownames('YM') |>
  t() |>
  as.data.frame()

# Get every other month for x-axis labels
ym_labels <- names(df_heatmap)
every_other_ym <- ym_labels[seq(1, length(ym_labels), by = 3)]

# Create heatmap
# Create heatmap
heatmap_plot <- ggplot2::ggplot(
  data = df_heatmap |>
    tibble::rownames_to_column('Indicator') |>
    tidyr::pivot_longer(
      cols = -Indicator,
      names_to = 'YM',
      values_to = 'RR'
    ) |>
    dplyr::mutate(Indicator = stringr::str_remove(Indicator, '_RR')) |>
    dplyr::mutate(Indicator = factor(Indicator, levels = c('test', 'pres', 'maltreat', 'conf', 'allout'))),
  ggplot2::aes(x = YM, y = Indicator, fill = RR)
) +
  ggplot2::geom_tile() +
  ggplot2::scale_fill_viridis_c(
    name = '%',
    limits = c(0, 100),
    na.value = 'white'
  ) +
  ggplot2::scale_x_discrete(
    breaks = every_other_ym  # Show only every other month
  ) +
  ggplot2::labs(
    x = '',
    y = '',
    title = 'Monthly reporting rate'
  ) +
  ggplot2::theme_minimal() +
  ggplot2::theme(
    axis.text.x = ggplot2::element_text(angle = 45, hjust = 1),
    axis.text.y = ggplot2::element_text(hjust = 1),
    plot.title = ggplot2::element_text(hjust = 0.5),
    legend.position = 'right'
  )

# Save plot
ggplot2::ggsave(
  here::here('english', 'data_r', 'routine_cases', paste0('Monthly_reporting_rate_by_variable_heatmap.png')),
  plot = heatmap_plot,
  width = 15,
  height = 6,
  bg = 'white'
)
Output

Alternative code option using sntutils package
# Make a heatmap of monthly reporting rate by variable using sntutils
sntutils::reporting_rate_plot(
  data = dhis2_df,
  x_var = "YM",
  vars_of_interest = c('allout', 'test', 'conf', 'pres', 'maltreat'),
  use_reprate = TRUE,
  plot_path = here::here('english', 'data_r', 'routine_cases'),
  plot_width = 15,
  plot_height = 6
)

To adapt the code:

Show the code
# Make a heatmap of monthly reporting rate by variable
indicators =  ['allout', 'test', 'conf', 'pres', 'maltreat']
level = 0

df = pd.DataFrame(columns = ['YM'])
for i in indicators:
    t = compute_RR(dhis2_df, df_expected, level, i).drop([f'adm{level}', f'adm{level}_uid', 'observed', 'expected'], axis = 1)
    # make RR a percentage for visual
    t[f'{i}_RR'] = t[f'{i}_RR']*100
    df = df.merge(t, on = 'YM', how = 'outer', validate = '1:1')

# rearrange for heatmap
df = df.set_index('YM').T

fig, ax = plt.subplots(figsize = (15, 6))
cbar_ax = fig.add_axes([.91, .2, .02, .6])

sns.heatmap(ax = ax
            , data = df
            , cmap = 'viridis'
            , cbar_ax = cbar_ax
            , vmin = 0
            , vmax = 100
            , cbar_kws = {'label': '%'})

ax.set_xlabel('')
ax.set_xticks(ax.get_xticks())
ax.set_xticklabels(ax.get_xticklabels(), rotation = 45, ha = 'right')
ax.set_yticks(ax.get_yticks())
ax.set_yticklabels([l.get_text()[0:-3] for l in ax.get_yticklabels()], rotation = 0, ha = 'right')

ax.set_title('Monthly reporting rate')

fig.savefig(here('english/data_r/routine_cases', f'Monthly_reporting_rate_by_variable_heatmap.png')
            , facecolor = 'white'
            , bbox_inches = 'tight')

plt.show()
Output

To adapt the code:

Step 5.2 Multiple indicator line plot at the national level

The second visual is a line plot, another more traditional way of quickly comparing reporting rate (y-axis) over time (x-axis) and between indicators (different colours).

  • R
  • Python
Show the code
# Make a line plot of monthly reporting rate by variable
indicators <- c('allout', 'test', 'conf', 'pres', 'maltreat')

# Create color palette for each indicator
indicator_colors <- c('#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd')  # matplotlib default colors
names(indicator_colors) <- indicators

# Prepare data for plotting
plot_data <- purrr::map_dfr(indicators, ~ {
  t <- compute_RR(dhis2_df, df_expected, 0, .x)
  t |>
    dplyr::select(YM, !!paste0(.x, '_RR')) |>
    dplyr::mutate(indicator = .x, value = .data[[paste0(.x, '_RR')]]) |>
    dplyr::select(YM, indicator, value)
})

# Get every other YM for x-axis labels
ym_levels <- unique(plot_data$YM)
every_other_ym <- ym_levels[seq(1, length(ym_levels), by = 2)]

# Create the plot
line_plot <- plot_data |>
  ggplot2::ggplot(ggplot2::aes(x = YM, y = value, color = indicator, group = indicator)) +
  ggplot2::geom_line(linewidth = 1) +
  ggplot2::scale_color_manual(values = indicator_colors) +
  ggplot2::scale_x_discrete(breaks = every_other_ym) +  # Fewer x-axis labels
  ggplot2::coord_cartesian(ylim = c(0, 1)) +
  ggplot2::labs(
    x = '',
    y = 'Reporting rate',
    title = 'Monthly reporting rate, Sierra Leone',
    color = 'Indicator'
  ) +
  ggplot2::theme_minimal() +
  ggplot2::theme(
    plot.title = ggplot2::element_text(hjust = 0.5),
    axis.text.x = ggplot2::element_text(angle = 45, hjust = 1),
    legend.position = 'bottom'
  )

# Save plot
ggplot2::ggsave(
  here::here('english', 'data_r', 'routine_cases', 'Monthly_reporting_rate_line_plot.png'),
  plot = line_plot,
  width = 6,
  height = 4,
  bg = 'white'
)
Output

To adapt the code:

Show the code
# Make a line plot  of monthly reporting rate by variable
indicators =  ['allout', 'test', 'conf', 'pres', 'maltreat']

fig, ax = plt.subplots(figsize = (6, 4))
for ind in indicators:
    t = compute_RR(dhis2_df, df_expected, 0, ind)
    t.plot(ax = ax, x = 'YM', y = f'{ind}_RR')

ax.set_ylim(0, 1)
ax.set_xlabel('')
ax.set_ylabel('Reporting rate')
ax.set_title('Monthly reporting rate, Sierra Leone')

fig.tight_layout()
Output
(0.0, 1.0)

To adapt the code: Specify your list of indicators on line 1.

Step 6: Visualise single indicator reporting rates by admin unit

Step 6.1 Single indicator heatmap by admin unit

Here we focus on displaying the reporting rate for a single indicator, but we disaggregate by admin unit. First we use the heatmap format, which now allows for easy comparison over time (x-axis) and between admin units (y-axis).

  • R
  • Python
Show the code
# VT to clean this up after agreeing on import.qmd with team
dftree <- dhis2_df |>
  dplyr::select(adm0, adm0_uid, adm1, adm1_uid, adm2, adm2_uid, adm3, adm3_uid) |>
  dplyr::distinct() |>
  dplyr::mutate(index = dplyr::row_number())

level <- 3
level_yaxis <- 1
indicator <- 'conf'
fs <- 15

df <- compute_RR(dhis2_df, df_expected, level, indicator) |>
  dplyr::mutate("{indicator}_RR" := .data[[paste0(indicator, '_RR')]] * 100)

# Pivot to wide format (equivalent to Python's pivot)
df_wide <- df |>
  tidyr::pivot_wider(
    names_from = YM,
    values_from = paste0(indicator, '_RR'),
    id_cols = paste0('adm', level)
  )

# Merge with adm1 and adm2 information
t <- dftree |>
  dplyr::select(paste0('adm', level_yaxis), adm2, paste0('adm', level)) |>
  dplyr::distinct()

df_merged <- df_wide |>
  dplyr::left_join(t, by = paste0('adm', level), relationship = "many-to-one") |>
  dplyr::arrange(.data[[paste0('adm', level_yaxis)]], .data[[paste0('adm', level)]]) |>
  tibble::column_to_rownames(paste0('adm', level))

# Extract the numeric data for heatmap
heatmap_data <- df_merged |>
  dplyr::select(-dplyr::all_of(c(paste0('adm', level_yaxis), "adm2"))) |>
  as.matrix() |>
  apply(2, as.numeric)

# Get adm1 groups for labeling and gaps
adm1_groups <- df_merged[[paste0('adm', level_yaxis)]]
adm2_names <- df_merged$adm2  # Get district names

# Use adm2 (district names) as row labels
row_labels <- df_merged$adm2  # These are district names

# Get the adm1 groups for the labeling trick
unique_adm1 <- unique(adm1_groups)

# Calculate gaps between adm1 groups
adm1_counts <- table(adm1_groups)
hline_positions <- cumsum(adm1_counts)[-length(adm1_counts)]

# Show district name only once per region group
for (adm1 in unique_adm1) {
  adm1_indices <- which(adm1_groups == adm1)
  if (length(adm1_indices) > 0) {
    # Only label the first row in the group with district name
    row_labels[adm1_indices[1]] <- row_labels[adm1_indices[1]]  # Just the district name
    # Clear labels for other rows in the same district group
    if (length(adm1_indices) > 1) {
      row_labels[adm1_indices[2:length(adm1_indices)]] <- ""
    }
  }
}

# Create heatmap
heatmap_plot <- pheatmap::pheatmap(
  heatmap_data,
  color = viridis::viridis(100),
  breaks = seq(0, 100, length.out = 101),
  cluster_rows = FALSE,
  cluster_cols = FALSE,
  display_numbers = FALSE,
  legend = TRUE,
  main = paste('Monthly', indicator, 'reporting rate by adm', level),
  angle_col = 45,
  fontsize = fs,
  fontsize_row = fs - 5,
  fontsize_col = fs -5,
  gaps_row = hline_positions,
  labels_row = row_labels,
  treeheight_row = 0,
  treeheight_col = 0,
  legend = TRUE,
  na_col = 'grey',
  silent = FALSE
)

# Save plot
ggplot2::ggsave(
  here::here('english', 'data_r', 'routine_cases', paste0('Monthly_', indicator, '_RR_adm', level, '_heatmap.png')),
  plot = heatmap_plot,
  width = 15,
  height = 6,
  bg = 'white'
)
Output

Alternative code option using sntutils package
# Create admin unit heatmap using sntutils
level <- 3
indicator <- 'conf'

sntutils::reporting_rate_plot(
  data = dhis2_df,
  x_var = "YM",
  y_var = paste0("adm", level),
  vars_of_interest = indicator,
  hf_col = "hf_uid",
  use_reprate = TRUE,
  plot_path = here::here('english', 'data_r', 'routine_cases'),
  plot_width = 15,
  plot_height = 10
)

To adapt the code:

Show the code
# VT to clean this up after agreeing on import.qmd with team
dftree = dhis2_df[['adm0', 'adm0_uid', 'adm1', 'adm1_uid', 'adm2', 'adm2_uid', 'adm3', 'adm3_uid']].drop_duplicates().reset_index(drop = True)

level = 3
level_yaxis = 1
indicator = 'conf'
fs = 15

df = compute_RR(dhis2_df, df_expected, level, indicator)
df[f'{indicator}_RR'] = df[f'{indicator}_RR']*100

# rearrange for heatmap
df = df.pivot(index = [f'adm{level}'], columns = 'YM', values = f'{indicator}_RR')
t = dftree[[f'adm{level_yaxis}', f'adm{level}']].drop_duplicates()
df = df.merge(t, on = f'adm{level}', how = 'left', validate = 'm:1')

df = (df.sort_values(by = [f'adm{level_yaxis}', f'adm{level}'])
      .reset_index(drop= True)
      .set_index([f'adm{level_yaxis}', f'adm{level}']))


fig, ax = plt.subplots(figsize = (30, 15))
cbar_ax = fig.add_axes([.91, .2, .02, .6])

sns.heatmap(ax = ax
            , data = df
            , cmap = 'viridis'
            , cbar_ax = cbar_ax
            , vmin = 0
            , vmax = 100
            , cbar_kws = {'label': '%'})


_ = ax.set_xlabel('')
_ = ax.set_xticks(ax.get_xticks())
_ = ax.set_xticklabels(ax.get_xticklabels(), rotation = 45, ha = 'right')

# trick to label adm unit names nicely on the yaxis
yticklabels = [l.get_text().split('-')[0] for l in ax.get_yticklabels()]
t = pd.DataFrame(yticklabels).reset_index()
t1 = t.groupby(0)['index'].mean().astype(int).reset_index()
t.insert(0, 'pos', t[0].map(dict(zip(t1[0], t1['index']))))
t = t[[0, 'pos', 'index']]
t[0] = np.where(t.pos == t['index'], t[0], '')
test = t[0].to_list()

_ = ax.set_yticks(ax.get_yticks())
_ = ax.set_yticklabels(test, size = fs)
_ = ax.set_ylabel(f'adm{level}', size = fs)

ylabel_mapping = OrderedDict()

for adm1, adm3_uid in df.index:
    ylabel_mapping.setdefault(adm1, [])
    ylabel_mapping[adm1].append(adm3_uid)

hline = []
new_ylabels = []

for adm1, adm3_list in ylabel_mapping.items():
    adm3_list[0] = "{} - {}".format(adm1, adm3_list[0])
    new_ylabels.extend(adm3_list)

    if hline:
        hline.append(len(adm3_list) + hline[-1])
    else:
        hline.append(len(adm3_list))

ax.hlines(hline, xmin = -10, xmax = 0, color = "grey", linewidth = 2, clip_on = False)

# color NaN values
colour = 'grey'
ax.set_facecolor(colour);
handle = [mpatches.Patch(color = colour, label = 'No data')]
ax.legend(handles = handle
          , fontsize = fs
      , bbox_to_anchor = (1, 0)
      , loc = 'lower left')


ax.set_title(f'Monthly {indicator} reporting rate by adm{level}', size = fs);

plt.show()
Output

To adapt the code:

Step 5.2: Single indicator line plot by admin unit

Now we use the more traditonal line plot format. We specifcy am indicator, a district (here adm2) and plot the reporting rate for each chiefdom (here adm3).

  • R
  • Python
Show the code
indicator <- 'test'
level <- 3
adm2 <- 'Western Area Rural District Council'

# Compute reporting rate and merge with administrative tree
df <- compute_RR(dhis2_df, df_expected, level, indicator)
dftree <- dhis2_df |>
  dplyr::select(adm2, adm2_uid, adm3_uid) |>
  dplyr::distinct() |>
  dplyr::mutate(index = dplyr::row_number())
df <- df |>
  dplyr::left_join(dftree, by = 'adm3_uid', relationship = "many-to-one") |>
  dplyr::filter(adm2 == !!adm2)

# Get unique adm3 values for coloring
adm3_list <- unique(df$adm3)

# Use standard matplotlib colors
adm3_colors <- c("#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd",
                 "#8c564b", "#e377c2", "#7f7f7f", "#bcbd22", "#17becf")
names(adm3_colors) <- adm3_list[1:length(adm3_colors)]

# Prepare data for plotting
plot_data <- purrr::map_dfr(adm3_list, ~ {
  df |>
    dplyr::filter(adm3 == .x) |>
    dplyr::select(YM, !!paste0(indicator, '_RR')) |>
    dplyr::mutate(adm3 = .x, value = .data[[paste0(indicator, '_RR')]]) |>
    dplyr::select(YM, adm3, value)
})

# Get every other YM for x-axis labels
ym_levels <- unique(plot_data$YM)
every_other_ym <- ym_levels[seq(1, length(ym_levels), by = 8)]

# Create the plot
line_plot <- plot_data |>
  ggplot2::ggplot(ggplot2::aes(x = YM, y = value, color = adm3, group = adm3)) +
  ggplot2::geom_line(linewidth = 1) +
  ggplot2::scale_color_manual(values = adm3_colors) +
  ggplot2::scale_x_discrete(breaks = every_other_ym) +  # Add this line
  ggplot2::coord_cartesian(ylim = c(0, 1)) +
  ggplot2::labs(
    x = '',
    y = 'Reporting rate',
    title = paste('Monthly', indicator, 'reporting rate\n', adm2),
    color = 'Administrative Unit 3'
  ) +
  ggplot2::theme_minimal() +
  ggplot2::theme(
    plot.title = ggplot2::element_text(hjust = 0.5),
    legend.position = 'bottom',
    legend.title = ggplot2::element_blank()
  )

# Apply tight layout equivalent
line_plot <- line_plot + ggplot2::theme(plot.margin = ggplot2::unit(c(1, 1, 1, 1), "cm"))

# Save plot
ggplot2::ggsave(
  here::here('english', 'data_r', 'routine_cases', paste0('Monthly_', indicator, '_RR_adm2_', stringr::str_replace_all(adm2, ' ', '_'), '.png')),
  plot = line_plot,
  width = 6,
  height = 4,
  bg = 'white'
)
Output

To adapt the code:

Show the code
indicator =  'test'
level = 3
adm2 = 'Western Area Rural District Council'

fig, ax = plt.subplots(figsize = (6, 4))

df = compute_RR(dhis2_df, df_expected, level, indicator)
dftree = dhis2_df[['adm2', 'adm2_uid', 'adm3_uid']].drop_duplicates().reset_index(drop = True)
df = df.merge(dftree, on = 'adm3_uid', how = 'left', validate = 'm:1')
df = df[df['adm2'] == adm2].copy()
for adm3 in df['adm3'].unique():
    t = df[df['adm3'] == adm3]
    t.plot(ax = ax, x = 'YM', y = f'{indicator}_RR', label = adm3)

ax.set_ylim(0, 1.1)
ax.set_xlabel('')
ax.set_ylabel('Reporting rate')
ax.set_title(f'Monthly {indicator} reporting rate\n{adm2}')

fig.tight_layout()
plt.show()
Output

To adapt the code: Specify your indicator, level for reporting rate calculation and district (adm2) in lines 1, 2, and 3.

Summary

This page has covered how to calculate, visualize, and interpret reporting rates. By establishing clear denominators through expected facility counts and calculating indicator-specific reporting completeness, we can identify geographic and temporal patterns in data reporting. The visualization approaches help communicate patterns effectively to SNT teams. Validated reporting rates form the foundation for reliable burden estimation, stratification, and intervention targeting in subsequent SNT workflow steps.

Full code

Find the full code script for calculating reporting rates below.

  • R
  • Python
 

©2025 Applied Health Analytics for Delivery and Innovation. All rights reserved