Determining active and inactive status

Overview

In the SNT workflow, reporting rate calculations, which are essential to the estimation of other key indicators such as incidence, depend on the activity status of each health facility.

Objectives
  • Classify health facility activity status to define reporting rate denominator
  • Visualize the status of malaria reporting in the country

Key concepts in defining active facilities

Before calculating reporting rates, we first need to determine whether each health facility was active in a given month, that is, whether it was expected to report.

The method used to define facility activity status should be discussed with the SNT team, who will guide whether the country has an established or preferred method. In some cases, the NMP may already rely on a Health Facility Master List to identify active facilities. While this can be a useful starting point, it may not always reflect real-time service delivery or facility functionality, and its reliability should be carefully assessed.

If no trusted method exists, or if additional validation is needed, an alternative data-driven approach can be used. This approach infers activity status directly from routine surveillance data, based on whether a facility reported any valid values for key malaria indicators.

Monthly Activity Classification

For each health facility (HF) in a given month:

  • If the HF submitted valid (non-NA) data for any key indicator → it is classified as active reporting
  • If the HF did not report on any key indicators:
    • If it has reported in any prior month → active not reporting
    • If it has never reported → inactive
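The three-way rule above can be sketched in pandas as follows. This is a minimal illustration on made-up data, not the page's pipeline: the toy table, the two indicator columns, and the function name `classify_monthly_status` are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Toy data: one facility-month per row; NaN means nothing was submitted
toy = pd.DataFrame({
    "hf_uid": ["hf_a"] * 4 + ["hf_b"] * 4,
    "YM": ["2024-01", "2024-02", "2024-03", "2024-04"] * 2,
    "conf": [np.nan, 5, np.nan, 0, np.nan, np.nan, np.nan, np.nan],
    "test": [np.nan, 10, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
})
key_indicators = ["conf", "test"]  # illustrative subset of key indicators


def classify_monthly_status(df, indicators):
    """Label each facility-month as active reporting, active not reporting, or inactive."""
    out = df.sort_values(["hf_uid", "YM"]).copy()
    # A month counts as reported if any key indicator is non-NA (zeros count)
    out["reported"] = out[indicators].notna().any(axis=1)
    # 1 from the first reporting month onwards, within each facility
    ever_reported = (
        out["reported"].astype(int).groupby(out["hf_uid"]).cummax().astype(bool)
    )
    out["status"] = np.select(
        [out["reported"], ever_reported],
        ["Active reporting", "Active not reporting"],
        default="Inactive",
    )
    return out


result = classify_monthly_status(toy, key_indicators)
print(result[["hf_uid", "YM", "status"]])
```

Note that a submitted zero is treated as a valid report here, because the rule is stated in terms of non-NA values.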

This data-driven approach offers a flexible alternative when no reliable master list exists or when further validation is required. It uses observed reporting patterns to classify activity status, based on whether a facility submitted valid data for selected malaria indicators.

These key indicators (for example allout, test, susp, pres, conf, and maltreat) reflect core functions of malaria service delivery, including suspected case reporting, diagnostic testing, and treatment. If a facility reports on any of these indicators in a given month, it can reasonably be considered operational and engaged in the malaria surveillance system.


Best Practice for Active Status Classification

Ideally, analysts should receive a copy of the Master Facility List (MFL) which includes columns for active/inactive status of health facilities. This is typically the most accurate and up-to-date classification of facility active/inactive status. If provided, this information should be used to generate active status visualizations and reporting rate analysis. Review the Merging shapefiles with tabular data page to merge your MFL with DHIS2 data and proceed with the visualization steps on this page.

Consult the SNT Team

In the absence of health facility active status information in the MFL, active/inactive status may be determined through one of the three methods below based on what is designated as a key indicator.

The selection of key indicators (and the method used to define facility activity) should be discussed and validated with the SNT team. In some countries, a Health Facility Master List may be appropriate; in others, indicator-based definitions may be more reliable. The final approach should reflect how malaria services are delivered and reported within the national system.

Indicator-specific activity status

In most countries, a separate monthly activity status may be needed when calculating reporting rates for IPD or OPD-specific indicators. For example, inpatient indicators should only include facilities with inpatient capacity. The criteria for inclusion should be discussed with the program. While facility type (e.g. hospital or health center with wards) can help, it may not always be definitive.
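One way to derive an indicator-specific status is to repeat the classification on a subset of indicators and restrict it to facilities with the relevant capacity. The sketch below assumes a hypothetical boolean column `has_inpatient` agreed with the program; that column, the toy data, and the function name `expected_to_report` are illustrative only.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "hf_uid": ["hosp_1", "hosp_1", "chp_1", "chp_1"],
    "YM": ["2024-01", "2024-02", "2024-01", "2024-02"],
    "has_inpatient": [True, True, False, False],  # hypothetical capacity flag
    "allout": [120, 130, 40, 35],
    "maladm": [8, np.nan, np.nan, np.nan],
})


def expected_to_report(df, indicators, capacity_col=None):
    """Return a boolean Series: was the facility expected to report these indicators?"""
    ordered = df.sort_values(["hf_uid", "YM"])
    reported = ordered[indicators].notna().any(axis=1).astype(int)
    # Expected from the first month with any non-NA value for these indicators
    expected = reported.groupby(ordered["hf_uid"]).cummax().astype(bool)
    if capacity_col is not None:
        # Facilities without the relevant capacity never enter the denominator
        expected = expected & ordered[capacity_col]
    return expected.reindex(df.index)


toy["expected_opd"] = expected_to_report(toy, ["allout"])
toy["expected_ipd"] = expected_to_report(toy, ["maladm"], capacity_col="has_inpatient")
print(toy[["hf_uid", "YM", "expected_opd", "expected_ipd"]])
```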

Methods for determining active and inactive status of health facilities from reporting status

A health facility's active status in a given month can be determined using one of three methods, each with distinct criteria for classifying facilities as active or inactive:

Method 1: Permanent activation

Criteria: A facility is classified as active from its first reporting month onwards, and inactive before its first report.

Key principle: A facility is only included in the denominator (expected to report) starting from the month it first actually reported any malaria data. Before that first reporting month, the facility is considered “inactive” and not expected to report.

Rationale: This method recognizes that facilities may not exist, be operational, have DHIS2 access, or be participating in malaria surveillance from the beginning of the analysis period. It avoids underestimating reporting performance by evaluating facilities only from the point at which they have demonstrated the capacity to report.

Illustration:

Active vs Inactive – Method 1
  adm1 adm2 adm3 hf_uid date allout susp test conf maltreat report status
0 msk1 msk2 msk3 hf_0001 2024-01 nan nan nan nan nan No Inactive
1 msk1 msk2 msk3 hf_0001 2024-02 nan nan nan nan nan No Inactive
2 msk1 msk2 msk3 hf_0001 2024-03 20 15 10 5 5 Yes Active reporting
3 msk1 msk2 msk3 hf_0001 2024-04 30 15 10 8 5 Yes Active reporting
4 msk1 msk2 msk3 hf_0001 2024-05 60 15 10 5 nan Yes Active reporting
5 msk1 msk2 msk3 hf_0001 2024-06 nan nan nan nan nan No Active not reporting
6 msk1 msk2 msk3 hf_0001 2024-07 nan nan nan nan nan No Active not reporting
7 msk1 msk2 msk3 hf_0001 2024-08 nan nan nan nan nan No Active not reporting
8 msk1 msk2 msk3 hf_0001 2024-09 5 5 5 5 5 Yes Active reporting
9 msk1 msk2 msk3 hf_0001 2024-10 nan nan nan nan nan No Active not reporting
10 msk1 msk2 msk3 hf_0001 2024-11 nan nan nan nan nan No Active not reporting
11 msk1 msk2 msk3 hf_0001 2024-12 nan nan nan nan nan No Active not reporting

Method 2: Activate after first report, inactivate after last report

Criteria: A facility is classified as active once it starts reporting, and inactive after its last report. To avoid mis-attributing non-reporting as inactivity in the most recent months of the dataset, we can also require a minimum number of non-reports (for example, 6 months) after the facility’s last report.

Key principle: A facility is included in the denominator (expected to report) for a given month if it has ever reported, and excluded after it has stopped reporting.

Rationale: This method recognizes that facilities may shut down permanently, for example due to decreased local population, insecurity, or diminished resources for service provision. It avoids underestimating reporting performance by evaluating facilities only during the periods in which they have demonstrated the capacity to report.

Illustration:

Active vs Inactive – Method 2
  adm1 adm2 adm3 hf_uid date allout susp test conf maltreat report status
0 msk1 msk2 msk3 hf_0001 2024-01 nan nan nan nan nan No Inactive
1 msk1 msk2 msk3 hf_0001 2024-02 nan nan nan nan nan No Inactive
2 msk1 msk2 msk3 hf_0001 2024-03 20 15 10 5 5 Yes Active reporting
3 msk1 msk2 msk3 hf_0001 2024-04 30 15 10 8 5 Yes Active reporting
4 msk1 msk2 msk3 hf_0001 2024-05 60 15 10 5 nan Yes Active reporting
5 msk1 msk2 msk3 hf_0001 2024-06 nan nan nan nan nan No Active not reporting
6 msk1 msk2 msk3 hf_0001 2024-07 nan nan nan nan nan No Active not reporting
7 msk1 msk2 msk3 hf_0001 2024-08 nan nan nan nan nan No Active not reporting
8 msk1 msk2 msk3 hf_0001 2024-09 5 5 5 5 5 Yes Active reporting
9 msk1 msk2 msk3 hf_0001 2024-10 nan nan nan nan nan No Inactive
10 msk1 msk2 msk3 hf_0001 2024-11 nan nan nan nan nan No Inactive
11 msk1 msk2 msk3 hf_0001 2024-12 nan nan nan nan nan No Inactive
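The Method 2 rule, including the optional minimum number of trailing non-reports, can be sketched for a single facility as below. The function name `classify_method2` and the parameter `min_trailing_gap` are illustrative assumptions; the input is a chronological list of monthly reported/not reported flags.

```python
import numpy as np


def classify_method2(reported, min_trailing_gap=0):
    """Classify one facility's months from a chronological list of booleans."""
    reported = np.asarray(reported, dtype=bool)
    status = np.full(reported.shape, "Inactive", dtype=object)
    if not reported.any():
        return status.tolist()  # never reported: inactive throughout
    idx = np.flatnonzero(reported)
    first, last = idx[0], idx[-1]
    trailing_gap = len(reported) - 1 - last
    # Only close the facility if the silence after its last report is long enough
    end = last if trailing_gap >= min_trailing_gap else len(reported) - 1
    status[first:end + 1] = "Active not reporting"
    status[reported] = "Active reporting"
    return status.tolist()


# Reporting pattern from the Method 2 illustration (2024-01 to 2024-12)
pattern = [False, False, True, True, True, False,
           False, False, True, False, False, False]
print(classify_method2(pattern))
print(classify_method2(pattern, min_trailing_gap=6))
```

With `min_trailing_gap=0` the result matches the illustration table. With `min_trailing_gap=6`, the three silent months at the end stay "Active not reporting" because the gap is too short to conclude that the facility has closed.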

Method 3: Dynamic activation and inactivation

Criteria: A facility is classified as active once it starts reporting, and as inactive during any run of consecutive non-reporting months that reaches a specified minimum length.

Key principle: A facility is excluded from the denominator (expected to report) whenever there is a continuous window of N months of non-reporting (for example, 6 months). The window size (N) can be configured based on program requirements.

Rationale: This method recognizes that facilities may have temporary interruptions in functionality due to various operational factors such as staff shortages, equipment issues, inaccessibility from natural disasters or insecurity. The facility may regain activity in the future as those factors change, then become inactive if those factors reappear. It provides a dynamic assessment that balances operational reality with accountability, allowing facilities to maintain “active” status even with occasional reporting gaps as long as they demonstrate recent engagement. However, it is not normal for a facility to be frequently changing between active and inactive status, and if you are seeing this when using Method 3, you should consider lengthening your window size or switching to Method 2.
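To check how often facilities switch status, one option is to count status changes per facility while ignoring the expected first activation. The sketch below assumes a long table with a boolean `active` column; the toy data and the function name `count_status_flips` are illustrative.

```python
import pandas as pd

months = ["2024-01", "2024-02", "2024-03", "2024-04", "2024-05", "2024-06"]
toy = pd.DataFrame({
    "hf_uid": ["hf_a"] * 6 + ["hf_b"] * 6,
    "YM": months * 2,
    "active": [False, True, False, True, False, True,
               False, True, True, True, True, True],
})


def count_status_flips(df, status_col="active"):
    """Count active/inactive changes per facility, ignoring the first activation."""
    ordered = df.sort_values(["hf_uid", "YM"])

    def flips(series):
        vals = series.tolist()
        changes = sum(a != b for a, b in zip(vals, vals[1:]))
        # The initial inactive -> active switch is expected, so do not count it
        if vals and not vals[0] and any(vals):
            changes -= 1
        return changes

    return ordered.groupby("hf_uid")[status_col].apply(flips)


flip_counts = count_status_flips(toy)
print(flip_counts)
```

Facilities with many flips are candidates for review before settling on a window size.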

Illustration

Active vs Inactive – Method 3
  adm1 adm2 adm3 hf_uid date allout susp test conf maltreat report status
0 msk1 msk2 msk3 hf_0001 2024-01 20 15 5 5 5 Yes Active reporting
1 msk1 msk2 msk3 hf_0001 2024-02 nan nan nan nan nan No Inactive
2 msk1 msk2 msk3 hf_0001 2024-03 nan nan nan nan nan No Inactive
3 msk1 msk2 msk3 hf_0001 2024-04 nan nan nan nan nan No Inactive
4 msk1 msk2 msk3 hf_0001 2024-05 nan nan nan nan nan No Inactive
5 msk1 msk2 msk3 hf_0001 2024-06 nan nan nan nan nan No Inactive
6 msk1 msk2 msk3 hf_0001 2024-07 nan nan nan nan nan No Inactive
7 msk1 msk2 msk3 hf_0001 2024-08 nan nan nan nan nan No Inactive
8 msk1 msk2 msk3 hf_0001 2024-09 5 5 5 5 5 Yes Active reporting
9 msk1 msk2 msk3 hf_0001 2024-10 nan nan nan nan nan No Active not reporting
10 msk1 msk2 msk3 hf_0001 2024-11 nan nan nan nan nan No Active not reporting
11 msk1 msk2 msk3 hf_0001 2024-12 nan nan nan nan nan No Active not reporting
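The Method 3 rule can be sketched for a single facility as below. As in the illustration, this sketch assumes that once a gap reaches the window length, the whole gap (including its first months) is marked inactive. The function name `classify_method3` and the parameter `window` are illustrative.

```python
import numpy as np


def classify_method3(reported, window=6):
    """Classify one facility's months; gaps of `window` or more months are inactive."""
    reported = np.asarray(reported, dtype=bool)
    n = len(reported)
    status = np.full(n, "Inactive", dtype=object)
    if not reported.any():
        return status.tolist()  # never reported: inactive throughout
    first = int(np.flatnonzero(reported)[0])
    status[first:] = "Active not reporting"
    status[reported] = "Active reporting"
    # Walk through runs of consecutive non-reporting months after activation
    i = first
    while i < n:
        if reported[i]:
            i += 1
            continue
        j = i
        while j < n and not reported[j]:
            j += 1
        if j - i >= window:
            status[i:j] = "Inactive"  # long gap: drop from the denominator
        i = j
    return status.tolist()


# Reporting pattern from the Method 3 illustration (2024-01 to 2024-12)
pattern = [True] + [False] * 7 + [True] + [False] * 3
print(classify_method3(pattern, window=6))
```

The seven-month gap is classified as inactive, while the three-month gap at the end stays "Active not reporting" because it is shorter than the window.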

Method Summary

| Comparison Aspect | Method 1: Permanent Activation | Method 2: Activate/Inactivate with Last Report | Method 3: Dynamic Activation |
| Activation Criteria | First report received | First report received | First report received |
| Inactivation Criteria | Never (once active, always active) | After last report + grace period (e.g., 6 months) | After N consecutive months of non-reporting (e.g., 6 months) |
| Facility Status | Binary: inactive → permanent active | Binary: inactive → active → permanent inactive | Dynamic: can toggle between active/inactive multiple times |
| Handles Temporary Closures | ❌ No | ❌ No | ✅ Yes |
| Handles Permanent Closures | ❌ No | ✅ Yes | ✅ Yes |
| Data Requirements | Minimal historical data | Complete historical data preferred | Complete time series data |
| Best Use When | Analyzing new facilities or early program phases | Studying facility attrition/permanent closures | Monitoring ongoing operations with temporary disruptions |
| Advantages | Simple to implement; stable denominators | Accounts for permanent exits; avoids penalizing for closed facilities | Realistic for operational contexts; accommodates temporary issues |
| Limitations | Overestimates active facilities over time | May misclassify temporarily closed facilities as permanently closed | More complex; status can fluctuate; requires parameter tuning |
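Whichever method is selected, its monthly status labels define the reporting rate denominator. The sketch below assumes a long table with a `status` column using the three labels from the illustrations; the toy data and the function name `monthly_reporting_rate` are illustrative and not part of the page's pipeline.

```python
import pandas as pd

toy = pd.DataFrame({
    "YM": ["2024-01"] * 4 + ["2024-02"] * 4,
    "hf_uid": ["hf_a", "hf_b", "hf_c", "hf_d"] * 2,
    "status": [
        "Active reporting", "Active reporting", "Active not reporting", "Inactive",
        "Active reporting", "Active not reporting", "Active not reporting", "Inactive",
    ],
})


def monthly_reporting_rate(df):
    """Reporting rate per month = facilities reporting / facilities expected to report."""
    # Inactive facilities are excluded from the denominator
    active = df[df["status"] != "Inactive"].copy()
    active["is_reporting"] = active["status"] == "Active reporting"
    summary = active.groupby("YM")["is_reporting"].agg(expected="size", reported="sum")
    summary["reporting_rate"] = summary["reported"] / summary["expected"]
    return summary


rates = monthly_reporting_rate(toy)
print(rates)
```

Because inactive facility-months are dropped before counting, a method that keeps more facilities active produces a larger denominator and a lower reporting rate for the same set of reports.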

Step-by-step

This section walks through the implementation in code, step by step, using example DHIS2 data from Sierra Leone. We assume you are working with cleaned and preprocessed routine surveillance data.

To skip the step-by-step explanation, jump to the full code at the end of this page.

Step 1: Load packages and data

Step 1.1: Load required R packages

Load all necessary packages for data processing and visualization to determine health facility active status.

  • R
  • Python
# Install or load relevant packages
pacman::p_load(
  readxl,        # Read Excel files
  dplyr,         # Data manipulation
  tidyr,         # Data tidying
  lubridate,     # Date handling
  ggplot2,       # Data visualization
  RColorBrewer,  # Color palettes
  scales,        # Scale functions for ggplot2
  purrr,         # Functional programming
  DT,            # Interactive data tables
  writexl,       # Export to Excel
  reticulate,    # R-Python interoperability
  devtools,      # Package management
  here           # Reproducible file paths
)

# Install or update sntutils from GitHub, then load it
# (both branches of the original if/else ran the same call, so one call suffices;
# install_github skips the reinstall when the remote version has not changed)
devtools::install_github("ahadi-analytics/sntutils", quiet = TRUE, upgrade = "always")

library(sntutils)

To adapt the code:

  • Do not modify anything in the code above
import pandas as pd
from pyhere import here
import numpy as np
from matplotlib.colors import ListedColormap
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import seaborn as sns

To adapt the code:

  • Do not modify anything in the code above

Step 1.2: Import data

Load the preprocessed malaria routine data. This page continues the use of the preprocessed Sierra Leone DHIS2 data, obtained through following the steps on the DHIS2 preprocessing page.

  • R
  • Python
# Define file path using here package for reproducible paths
data_filepath <- here::here("english/data_r/routine_cases/clean_malaria_routine_data_final.rds")

# Load the preprocessed DHIS2 malaria surveillance data
df <- readRDS(data_filepath)

To adapt the code:

  • Line 2: Change the file path to match your folder structure
dhis2_df = pd.read_parquet(here('english/data_r/routine_cases', 'dhis2_processed_data_python.parquet'))

Step 2: Configure reporting indicators and function

Step 2.1: Define reporting indicators

In this step we define the main reporting indicators used to determine activity status. We also convert the date column from character strings to proper Date objects.

  • R
  • Python
report_cols <- c("allout", "test", "pres", "conf", "maltreat", "maladm")

# Keep original date format as a separate column
df$date_original <- df$date

# Convert "YYYY-MM" to proper Date objects using base R as.Date
df$date <- as.Date(paste0(df$date, "-01"))

To adapt the code:

  • Do not modify anything in the code above
key_indicators = ['allout', 'test', 'pres', 'conf', 'maltreat', 'maladm']

To adapt the code:

  • Do not modify anything in the code above
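The date conversion described in this step can also be done in pandas. The sketch below assumes a `date` column stored as "YYYY-MM" strings, mirroring the R code; the toy data frame is illustrative, and your Python dataset may already contain Year, Month, and YM columns.

```python
import pandas as pd

toy = pd.DataFrame({"date": ["2024-01", "2024-02", "2024-12"]})

# Keep the original string format as a separate column
toy["date_original"] = toy["date"]

# Parse "YYYY-MM" strings; each value becomes the first day of that month
toy["date"] = pd.to_datetime(toy["date"], format="%Y-%m")

# Derive the Year, Month, and YM helper columns used in later steps
toy["Year"] = toy["date"].dt.year
toy["Month"] = toy["date"].dt.month
toy["YM"] = toy["date"].dt.strftime("%Y-%m")
print(toy)
```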

Step 2.2: Reporting pattern identification function

We begin by identifying each health facility’s first reporting date to implement classification method 1 (permanent activation).

  • R
  • Python
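# Calculate monthly reporting status
df_selected <- dplyr::select(df, dplyr::all_of(report_cols))
row_sums <- base::rowSums(df_selected, na.rm = TRUE)
# Note: a month counts as reported only when the key indicators sum to more
# than zero, so a facility that submitted only zeros is treated as not reporting.
# To apply the non-NA rule described above instead, use:
#   df$reported <- base::ifelse(base::rowSums(!is.na(df_selected)) > 0, 1, 0)
df$reported <- base::ifelse(row_sums > 0, 1, 0)

# Add Year, Month, and YM columns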
df <- df |>
  dplyr::mutate(
    Year = lubridate::year(date),
    Month = lubridate::month(date),
    YM = format(date, "%Y-%m")
  )

# Create total reports per facility for proper ordering
df <- dplyr::group_by(df, hf_uid)
df <- dplyr::mutate(df, total_reports = base::sum(reported, na.rm = TRUE))
df <- dplyr::ungroup(df)

# Identify the first health facility reporting date using YM
first_reports <- df |>
  dplyr::filter(reported == 1) |>
  dplyr::group_by(hf_uid) |>
  dplyr::summarise(first_month_reported_YM = min(YM), .groups = "drop")

df <- df |>
  dplyr::left_join(first_reports, by = "hf_uid")

# Status classification (0, 0.5, 1)
df <- df |>
  dplyr::mutate(
    Facility_status = dplyr::case_when(
      reported == 1 ~ 1,
      reported == 0 & YM >= first_month_reported_YM ~ 0.5,
      TRUE ~ 0
    ),
    Facility_active = Facility_status > 0
  )

To adapt the code:

  • Do not modify anything in the code above
# make a copy of the data
dfr = dhis2_df.copy()

# add a column indicating whether the HF reported on any of the key indicators
dfr.insert(len(dfr.columns), 'key_variables', dfr[key_indicators].notna().any(axis = 1))
dfr.insert(len(dfr.columns), 'reported', np.where(dfr['key_variables'], 1, 0))

# keep only the columns needed for activity classification
cols = ['Year', 'Month', 'YM', 'adm0', 'adm0_uid', 'adm1', 'adm1_uid', 'adm2', 'adm2_uid', 'adm3', 'adm3_uid', 'hf', 'hf_uid', 'key_variables', 'reported']
dfr = dfr[cols]

# compute first month reported for each HF and add column in dfr
t = dfr[dfr['reported'] == 1].groupby('hf_uid')['YM'].min().to_frame(name = 'first_month_reported').reset_index()

# make sure to keep all HFs in case some don't have a valid first month (never reported on anything)
temp = pd.DataFrame(dfr['hf_uid'].unique(), columns = ['hf_uid'])
t = temp.merge(t, on = 'hf_uid', how = 'left', validate = '1:1')
dfr = dfr.merge(t, on = 'hf_uid', how = 'left', validate = 'm:1')

# add HF status column:
# 0: not active
# 0.5: HF didn't report when considered active
# 1: active and reporting
dfr.insert(len(dfr.columns),
          'Facility_status',
          np.where(dfr['reported'] == 1, 1, np.where((dfr['reported'] == 0) & (dfr['YM'] >= dfr['first_month_reported']), 0.5, 0)))

# add active HF column
dfr.insert(len(dfr.columns), 'Facility_active', np.where(dfr['Facility_status'] == 0, False, True))

# quick visual check
dfr.head(10).style
  Year Month YM adm0 adm0_uid adm1 adm1_uid adm2 adm2_uid adm3 adm3_uid hf hf_uid key_variables reported first_month_reported Facility_status Facility_active
0 2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Aethel CHP HF_00001 False 0 2019-01 0.000000 False
1 2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Agape Way CHP HF_00002 True 1 2015-01 1.000000 True
2 2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Anglican Diocese Clinic HF_00003 False 0 nan 0.000000 False
3 2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Batiama Layout MCHP HF_00004 False 0 2015-05 0.000000 False
4 2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Bo Government Hospital HF_00005 True 1 2015-01 1.000000 True
5 2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Bo School Bay CHP HF_00006 False 0 2022-01 0.000000 False
6 2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Breakthrough MCHP HF_00007 False 0 2023-10 0.000000 False
7 2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Brima Town CHP HF_00008 True 1 2015-01 1.000000 True
8 2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 EDC Unit CHP HF_00009 True 1 2015-01 1.000000 True
9 2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Favour MCHP HF_00010 True 1 2015-01 1.000000 True

To adapt the code:

  • Do not modify anything in the code above

Step 3: Method 1 - Permanent activity status identification

Step 3.1: Permanent activity status identification

Building off the previous step, this code classifies facilities as active if they reported or have reported before, otherwise inactive.

  • R
  • Python
df <- df |>
  dplyr::mutate(
    active_status_method1 = dplyr::case_when(
      Facility_status == 1 ~ "Active",
      Facility_status == 0.5 ~ "Active",
      Facility_status == 0 ~ "Inactive",
      TRUE ~ "Inactive"
    )
  )

cat("Method 1 (R) - Summary:\n")
cat("Total facilities:", length(unique(df$hf_uid)), "\n")
cat("Active facilities (ever reported):", length(unique(df$hf_uid[!is.na(df$first_month_reported_YM)])), "\n")
cat("Never reported facilities:", length(unique(df$hf_uid[is.na(df$first_month_reported_YM)])), "\n")
Output
Method 1 (R) - Summary:
Total facilities: 1771 
Active facilities (ever reported): 1534 
Never reported facilities: 237 
Alternative code option using sntutils package
df_method1 <- sntutils::classify_facility_activity(
  data = df,
  hf_col = "hf_uid",
  date_col = "date",
  key_indicators = report_cols,
  method = 1,
  binary_classification = TRUE,
  reporting_rule = "any_non_na"
)
# Method 1 is already implemented above as Facility_active
# This represents permanent activation after first report

print("Method 1 (Python) - Permanent Activation")
print(f"Total facilities: {len(dfr['hf_uid'].unique())}")
print(f"Active facilities (ever reported): {len(dfr[dfr['first_month_reported'].notna()]['hf_uid'].unique())}")
print(f"Never reported facilities: {len(dfr[dfr['first_month_reported'].isna()]['hf_uid'].unique())}")
Output
Method 1 (Python) - Permanent Activation
Total facilities: 1324
Active facilities (ever reported): 1127
Never reported facilities: 197

To adapt the code:

  • Do not modify anything in the code above

Step 3.2: Define activity status visualization function

To simplify plotting each active status method, we define a function that generates corresponding visualizations based on defined input parameters.

  • R
  • Python
plot_facility_activity <- function(
    data,
    method = c("method1", "method2", "method3"),
    level = c("national", "district"),
    facet_col = NULL,
    title = NULL,
    subtitle = NULL,
    plot_flips = FALSE
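) {
  # Resolve the default argument vectors to a single validated choice,
  # so the function also works when method or level is not supplied
  method <- match.arg(method)
  level <- match.arg(level)

  # Map method to column name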
  status_col <- switch(method,
    "method1" = "active_status_method1",
    "method2" = "active_status_method2",
    "method3" = "active_status_method3",
    stop("Method must be 'method1', 'method2', or 'method3'")
  )

  # Method labels for titles
  method_labels <- c(
    "method1" = "Permanent Activation",
    "method2" = "First-to-Last Report",
    "method3" = "Dynamic Activation"
  )

  # Handle status flips for Method 3
  if (plot_flips && method == "method3") {
    # Identify facilities with status changes
    flip_facilities <- data |>
      arrange(hf_uid, date) |>
      group_by(hf_uid) |>
      summarise(has_flip = length(unique(.data[[status_col]])) > 1) |>
      filter(has_flip) |>
      pull(hf_uid)

    data <- data |>
      filter(hf_uid %in% flip_facilities)

    flip_count <- length(flip_facilities)
    subtitle <- paste("Showing", flip_count, "facilities with status flips")
  }

  # Set default titles if not provided
  if (is.null(title)) {
    title <- paste("Method", gsub("method", "", method), ":", method_labels[method])
  }

  if (is.null(subtitle) && !plot_flips) {
    subtitle <- switch(method,
      "method1" = "Facilities remain active indefinitely after first report",
      "method2" = "Facilities are active between first and last report",
      "method3" = "Handles temporary closures (6-month non-reporting threshold)"
    )
  }

  # Create base plot with consistent colors
  p <- ggplot(data, aes(x = date, y = reorder(hf_uid, total_reports), fill = .data[[status_col]])) +
    geom_tile() +
    scale_fill_manual(values = c("Active" = "pink", "Inactive" = "#47B5FF"), name = "Activity Status") +
    scale_x_date(date_breaks = "6 months", date_labels = "%b %Y") +
    theme_minimal() +
    theme(
      axis.text.y = element_blank(),
      axis.ticks.y = element_blank(),
      axis.text.x = element_text(angle = 45, hjust = 1),
      legend.position = "bottom",
      plot.title = element_text(face = "bold", size = 14),
      plot.subtitle = element_text(size = 11, color = "gray40")
    ) +
    labs(
      x = "Date",
      y = "Health Facilities",
      title = title,
      subtitle = subtitle
    )

  # ADD FLIP MARKERS only for Method 3 flips - EXCLUDING ACTIVATION
  if (plot_flips && method == "method3") {
    # Find exact flip points but exclude the first activation (inactive → active)
    flip_points <- data |>
      arrange(hf_uid, date) |>
      group_by(hf_uid) |>
      mutate(
        status_change = .data[[status_col]] != lag(.data[[status_col]]),
        # Identify first activation to exclude it
        first_activation = min(which(.data[[status_col]] == "Active")),
        flip_point = ifelse(status_change & row_number() > first_activation, as.character(date), NA)
      ) |>
      filter(!is.na(flip_point)) |>
      ungroup()

    # Add points at flip locations only if there are any flips
    if (nrow(flip_points) > 0) {
      p <- p +
        geom_point(data = flip_points,
                   aes(x = date, y = hf_uid),
                   color = "black", size = 1, shape = 21, fill = "white", stroke = 1)
    }
  }

  # Add faceting for district level
  if (level == "district" || !is.null(facet_col)) {
    if (is.null(facet_col)) {
      facet_col <- "adm1"
    }
    p <- p +
      facet_wrap(as.formula(paste("~", facet_col)), scales = "free_y", ncol = 4) +
      theme(
        axis.text.x = element_text(angle = 90, hjust = 1, size = 6),
        strip.text = element_text(size = 8)
      )
  }

  return(p)
}

To adapt the code:

  • Do not modify anything in the code above
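For the Python workflow, a comparable heatmap needs the long table reshaped into a facility by month matrix. The sketch below covers only that preparation step and assumes the `Facility_active` and `reported` columns created in Step 2.2; the toy data and the function name `activity_matrix` are illustrative.

```python
import pandas as pd

toy = pd.DataFrame({
    "hf_uid": ["hf_a"] * 3 + ["hf_b"] * 3,
    "YM": ["2024-01", "2024-02", "2024-03"] * 2,
    "Facility_active": [False, True, True, True, True, True],
    "reported": [0, 1, 0, 1, 1, 1],
})


def activity_matrix(df, status_col="Facility_active"):
    """Pivot to a facility x month matrix of 0/1, rows ordered by total reports."""
    matrix = df.pivot(index="hf_uid", columns="YM", values=status_col).astype(int)
    # Order facilities by how often they reported, as the R plot does
    order = df.groupby("hf_uid")["reported"].sum().sort_values().index
    return matrix.loc[order]


mat = activity_matrix(toy)
print(mat)
```

The resulting matrix can then be drawn with matplotlib's `imshow` and a two-colour `ListedColormap`, both of which are imported in Step 1.1.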


Step 3.3: Permanent activity visualization

The active status visualization function defined in the previous step can then be applied to method 1 results.

  • R
  • Python
plot_facility_activity(df, method = "method1", level = "national")

plot_facility_activity(df, method = "method1", level = "district")
Output

Alternative code option using sntutils package
# Note: val_plot_path is not defined on this page; set it to your own
# plot output location before running this call
sntutils::facility_reporting_plot(
  data = df,
  hf_col = "hf_uid",
  date_col = "date",
  palette = "violet",
  key_indicators = report_cols,
  facet_col = "adm2",       # column used for faceting
  facet_ncol = 7,           # number of facet columns
  include_never_reported = TRUE,
  target_language = "fr",
  method = 1,
  year_breaks = 8,
  plot_path = val_plot_path,
  plot_width = 12,
  plot_height = 14,
  plot_scale = 0.6
)

To adapt the code:

  • Do not modify anything in the code above

Method 1 (permanent activation) implementation complete!

The classification of health facility active status based only on their first reporting date (i.e. permanent activation) is now complete.

The steps below build upon this to implement methods 2 and 3.

Step 4: Method 2 - First-to-last activity status identification

Step 4.1: First-to-last activity status identification

To begin method 2 classification, we identify each health facility’s last reporting date. This is used in tandem with the previously identified first reporting date (method 1) to determine active status using method 2.

  • R
  • Python
# Method 2: Identify last reports and create active period
last_reports <- df |>
  dplyr::filter(reported == 1) |>
  dplyr::group_by(hf_uid) |>
  dplyr::summarise(last_month_reported_YM = max(YM), .groups = "drop")

df <- df |>
  dplyr::left_join(last_reports, by = "hf_uid")

# Method 2: Active only between first and last report
df <- df |>
  dplyr::mutate(
    Facility_status_method2 = dplyr::case_when(
      is.na(first_month_reported_YM) ~ 0,  # Never reported
      YM >= first_month_reported_YM & YM <= last_month_reported_YM & reported == 1 ~ 1,  # Active and reporting
      YM >= first_month_reported_YM & YM <= last_month_reported_YM & reported == 0 ~ 0.5,  # Active but not reporting
      TRUE ~ 0  # Outside active period
    ),
    Facility_active_method2 = Facility_status_method2 > 0,
    active_status_method2 = dplyr::if_else(Facility_active_method2, "Active", "Inactive")
  )

# More informative summary
total_facilities <- length(unique(df$hf_uid))
facilities_with_activity_period <- length(unique(df$hf_uid[!is.na(df$last_month_reported_YM)]))
never_reported <- length(unique(df$hf_uid[is.na(df$first_month_reported_YM)]))
currently_active <- df |>
  dplyr::filter(YM == max(YM)) |>
  dplyr::summarise(active_count = sum(active_status_method2 == "Active")) |>
  dplyr::pull(active_count)

cat("Method 2 (R) - First-to-Last Report Activation\n")
cat("Facilities with defined activity period:", facilities_with_activity_period, "\n")
cat("Never reported facilities:", never_reported, "\n")
cat("Currently active facilities:", currently_active, "\n")
cat("Facilities permanently closed:", facilities_with_activity_period - currently_active, "\n")
Output
Method 2 (R) - First-to-Last Report Activation
Facilities with defined activity period: 1534 
Never reported facilities: 237 
Currently active facilities: 1434 
Facilities permanently closed: 100 
Alternative code option using sntutils package
df_method2 <- sntutils::classify_facility_activity(
  data = df,
  hf_col = "hf_uid",
  date_col = "date",
  key_indicators = report_cols,
  method = 2,
  binary_classification = TRUE,
  reporting_rule = "any_non_na"
)

To adapt the code:

  • Do not modify anything in the code above
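
To make the first-to-last rule concrete, the sketch below restates it in plain Python for a single facility. It follows the 1 / 0.5 / 0 coding used in the R code above; the function name and input format are assumptions made for illustration.

```python
def classify_method2(reported):
    """Return a status per month: 1 (active, reported),
    0.5 (active, did not report) or 0 (inactive).

    `reported` is a list of 0/1 flags for one facility, ordered by month
    (hypothetical input format).
    """
    months_with_report = [i for i, r in enumerate(reported) if r == 1]
    if not months_with_report:
        # Never reported: inactive for the whole period
        return [0] * len(reported)
    first, last = months_with_report[0], months_with_report[-1]
    status = []
    for i, r in enumerate(reported):
        if first <= i <= last:
            # Inside the first-to-last window the facility is active
            status.append(1 if r == 1 else 0.5)
        else:
            # Before the first or after the last report: inactive
            status.append(0)
    return status


print(classify_method2([0, 1, 0, 1, 0, 0]))
```

Compared with method 1, the only difference is that months after the last report are treated as inactive.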

Step 4.2: First-to-last activity status visualization

We can call the active status visualization function again here to visualize method 2 facility classification.

  • R
  • Python
plot_facility_activity(df, method = "method2", level = "national")

plot_facility_activity(df, method = "method2", level = "district")
Output

Alternative code option using sntutils package
sntutils::facility_reporting_plot(
  data = dhis2_hf,
  hf_col = "hf_uid",
  date_col = "date",
  palette = "violet",
  key_indicators = vars_of_interest,
  facet_col = "adm2",       # for the facetting
  facet_ncol = 7,           # the number of cols for the facetting
  include_never_reported = TRUE,
  target_language = "fr",
  method = 2,
  year_breaks = 8,
  plot_path = val_plot_path,
  plot_width = 12,
  plot_height = 14,
  plot_scale = 0.6
)

To adapt the code:

  • Do not modify anything in the code above

Method 2 (first-to-last activation) implementation complete!

The classification of health facility active status based on their first and last reporting date (i.e. first-to-last activation) is now complete.

The steps below build upon this to implement method 3 active status.

Step 5: Method 3 - Dynamic activity status identification

Step 5.1: Dynamic activity status identification

The code below marks a facility as inactive once it has gone six or more consecutive months without reporting, considering only months between the first and last reporting dates identified previously.

  • R
  • Python
# Method 3: Calculate consecutive non-reporting months
df <- df |>
  dplyr::arrange(hf_uid, YM) |>
  dplyr::group_by(hf_uid) |>
  dplyr::mutate(
    # Calculate consecutive non-reporting counter
    consecutive_non_report = {
      counter <- 0
      purrr::map_dbl(reported, ~{
        if (.x == 1) {
          counter <<- 0
        } else {
          counter <<- counter + 1
        }
        counter
      })
    }
  ) |>
  dplyr::ungroup()

# Method 3: Inactive after 6+ consecutive months of non-reporting BETWEEN first and last reporting dates
df <- df |>
  dplyr::mutate(
    Facility_status_method3 = dplyr::case_when(
      is.na(first_month_reported_YM) ~ 0,  # Never reported
      YM < first_month_reported_YM ~ 0,  # Before first report
      consecutive_non_report >= 6 & YM <= last_month_reported_YM ~ 0,  # 6+ months non-reporting WITHIN active period
      reported == 1 ~ 1,  # Active and reporting
      TRUE ~ 0.5  # Active but not reporting
    ),
    Facility_active_method3 = Facility_status_method3 > 0,
    active_status_method3 = dplyr::if_else(Facility_active_method3, "Active", "Inactive")
  )

# Count facilities that change status
status_flip_facilities <- df |>
  dplyr::group_by(hf_uid) |>
  dplyr::summarise(
    has_status_change = length(unique(active_status_method3)) > 1,
    .groups = "drop"
  ) |>
  dplyr::filter(has_status_change)

cat("Method 3 (R) - Summary:\n")
cat("Facilities that experienced 6+ months non-reporting:", length(unique(df$hf_uid[df$consecutive_non_report >= 6])), "\n")
cat("Facilities with status changes:", nrow(status_flip_facilities), "\n")
Output
Method 3 (R) - Summary:
Facilities that experienced 6+ months non-reporting: 541 
Facilities with status changes: 334 

To adapt the code:

  • Do not modify anything in the code above
Alternative code option using sntutils package
df_method3 <- sntutils::classify_facility_activity(
  data = df,
  hf_col = "hf_uid",
  date_col = "date",
  key_indicators = report_cols,
  method = 3,
  binary_classification = TRUE,
  reporting_rule = "any_non_na"
)

To adapt the code:

  • Do not modify anything in the code above
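
The dynamic rule can also be sketched in plain Python for a single facility. This sketch mirrors the order of the `case_when` conditions in the base R code of Step 5.1 as read from that code, so a month becomes inactive from the sixth consecutive non-reporting month onward, and months after the last report stay coded 0.5. The function name, the `window` argument, and the toy data are assumptions.

```python
def classify_method3(reported, window=6):
    """Return a status per month: 1 (active, reported),
    0.5 (active, did not report) or 0 (inactive).

    `reported` is a list of 0/1 flags for one facility, ordered by month.
    `window` is the number of consecutive non-reporting months after which
    the facility is treated as inactive (hypothetical parameter name).
    """
    months_with_report = [i for i, r in enumerate(reported) if r == 1]
    if not months_with_report:
        return [0] * len(reported)  # never reported
    first, last = months_with_report[0], months_with_report[-1]
    status, gap = [], 0
    for i, r in enumerate(reported):
        # Running count of consecutive non-reporting months
        gap = 0 if r == 1 else gap + 1
        if i < first:
            status.append(0)      # before first report
        elif gap >= window and i <= last:
            status.append(0)      # long gap within the reporting period
        elif r == 1:
            status.append(1)      # active and reporting
        else:
            status.append(0.5)    # active but not reporting
    return status


# A shorter window of 3 months keeps the toy example readable
print(classify_method3([1, 0, 0, 0, 0, 1, 0], window=3))
```

The complete code further down this page uses run-length encoding instead, which flags the whole non-reporting run as inactive rather than only the months from the threshold onward, so the two approaches can differ at the start of a gap.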

Step 5.2: Dynamic activity status visualization

Here we call the active status plotting function again to visualize method 3 results at both the national and district level.

  • R
  • Python
plot_facility_activity(df, method = "method3", level = "national")

plot_facility_activity(df, method = "method3", level = "district")
Output

Alternative code option using sntutils package
sntutils::facility_reporting_plot(
  data = df,
  hf_col = "hf_uid",
  date_col = "date",
  palette = "violet",
  key_indicators = vars_of_interest,
  facet_col = "adm2",       # for the facetting
  facet_ncol = 7,           # the number of cols for the facetting
  include_never_reported = TRUE,
  target_language = "fr",
  method = 3,
  nonreport_window = 6, # Needed for method 3
  year_breaks = 8,
  plot_path = val_plot_path,
  plot_width = 12,
  plot_height = 14,
  plot_scale = 0.6
)

To adapt the code:

  • Do not modify anything in the code above

Step 5.3: Visualize dynamic activation flips

An additional visualization relevant to method 3 is the number of “flips” in status, that is, the number of times a facility switches between active and inactive. The plotting function defined earlier can visualize flips too.

  • R
  • Python
# Show only facilities that flip status in Method 3
plot_facility_activity(
  df,
  method = "method3",
  level = "national",
  plot_flips = TRUE,
  title = "Method 3: Facilities with Dynamic Status Flips"
)
Output

To adapt the code:

  • Do not modify anything in the code above
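
For readers who want the flip counts as numbers rather than a plot, here is a minimal plain-Python sketch. How `plot_facility_activity` identifies flips internally is not shown in this section, so this example simply counts every change in active status between consecutive months; the facility IDs and data are hypothetical.

```python
def count_flips(active):
    """Count changes in active status between consecutive months.

    `active` is a list of booleans for one facility, ordered by month.
    """
    return sum(1 for prev, cur in zip(active, active[1:]) if prev != cur)


# Hypothetical monthly active flags for two facilities
facilities = {
    "hf_a": [False, True, True, False, False, True],
    "hf_b": [False, True, True, True, True, True],
}

flips = {hf: count_flips(status) for hf, status in facilities.items()}
# Facilities that changed status more than once (beyond initial activation)
flipping = [hf for hf, n in flips.items() if n > 1]
print(flips, flipping)
```

Note that the initial activation counts as one change here, so a threshold of more than one change is used to pick out facilities that later went inactive.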
Method 3 (dynamic activation) implementation complete!

The classification of health facility active status based on their first and last reporting date as well as extended non-reporting periods (i.e. dynamic activation) is now complete.

Step 6: Activity status method comparison

All three active status methods have now been applied. Visualizations allow us to compare these methods to better understand the nature of health facilities in the dataset and decide which method should be selected for further use.
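
One simple numeric comparison is the count of active facilities per month under each method. The plain-Python sketch below illustrates the idea with hypothetical flags (1 = active, 0 = inactive) for two facilities over four months; it is not how `sntutils::compare_methods_plot` works internally, which is not described here.

```python
# Hypothetical active flags per method and facility, one value per month
status = {
    "method1": {"hf_a": [1, 1, 1, 1], "hf_b": [0, 1, 1, 1]},
    "method2": {"hf_a": [1, 1, 1, 1], "hf_b": [0, 1, 1, 0]},
    "method3": {"hf_a": [1, 1, 0, 1], "hf_b": [0, 1, 1, 1]},
}


def monthly_active_counts(by_facility):
    """Sum active flags across facilities for each month."""
    return [sum(month) for month in zip(*by_facility.values())]


comparison = {m: monthly_active_counts(f) for m, f in status.items()}
for method, counts in comparison.items():
    print(method, counts)
```

Months where the counts diverge show where the choice of method changes the number of facilities expected to report.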

Step 6.2: Visualize method comparison

  • R
  • Python
sntutils::compare_methods_plot(
  data = df,
  hf_col = "hf_uid",
  date_col = "date",
  key_indicators = report_cols,
  language = "en"
)
Output

To adapt the code:

  • Do not modify anything in the code above

Step 7: Export results

Finally, we export the active status results along with df_expected, which contains the number of health facilities expected to report each month, as needed for reporting rate calculations.

  • R
  • Python
# Create dftree (admin hierarchy and facility identifiers, without admin-level UIDs)
cols <- c('adm0', 'adm1', 'adm2', 'adm3', 'hf', 'hf_uid')
dftree <- df |>
  dplyr::select(dplyr::all_of(cols)) |>
  dplyr::distinct() |>
  dplyr::arrange(dplyr::across(dplyr::all_of(cols)))

# Add Year and YM columns to main data
df_with_ym <- df |>
  dplyr::mutate(
    Year = lubridate::year(date),
    Month = lubridate::month(date),
    YM = format(date, "%Y-%m")
  )

# Method 1 ONLY - Create monthly denominator for number of HFs active in each adm3
df_expected_method1 <- df_with_ym |>
  dplyr::group_by(Year, YM, adm3) |>
  dplyr::summarise(
    denominator = sum(active_status_method1 == "Active", na.rm = TRUE),
    .groups = "drop"
  )

# Add parent admin units
admin_cols <- c('adm0', 'adm1', 'adm2', 'adm3')
t <- dftree[admin_cols] |> dplyr::distinct()

df_expected_method1 <- df_expected_method1 |>
  dplyr::left_join(t, by = "adm3")

# Reorder columns
final_cols <- c('Year', 'YM', 'adm0', 'adm1', 'adm2', 'adm3', 'denominator')
df_expected_method1 <- df_expected_method1[final_cols] |>
  dplyr::arrange(dplyr::across(dplyr::all_of(final_cols)))

# Save results - ONLY Method 1 for now
# write.csv(df_expected_method1, "expected_reports_method1.csv", row.names = FALSE)

# Python version: create dftree including admin-level UIDs
cols = ['adm0', 'adm0_uid', 'adm1', 'adm1_uid', 'adm2', 'adm2_uid', 'adm3', 'adm3_uid', 'hf', 'hf_uid']
dftree = dhis2_df[cols].drop_duplicates().reset_index(drop = True)

# create monthly denominator for number of HFs active in each adm3
df_expected = (dfr
     .groupby(['Year', 'YM', 'adm3_uid'])[['Facility_active']].sum(min_count = 1)
     .reset_index()
     .rename(columns = {'Facility_active': 'denominator'}))

# add parent admin units
cols = ['adm0', 'adm0_uid', 'adm1', 'adm1_uid', 'adm2', 'adm2_uid', 'adm3', 'adm3_uid']
t = dftree[cols].drop_duplicates().reset_index(drop = True)
df_expected = df_expected.merge(t, on = 'adm3_uid', how = 'left', validate = 'm:1')

# reorder columns
cols = ['Year', 'YM', 'adm0', 'adm0_uid', 'adm1', 'adm1_uid', 'adm2', 'adm2_uid', 'adm3', 'adm3_uid', 'denominator']
df_expected = df_expected[cols].sort_values(by = cols).reset_index(drop = True)

# save
# df_expected.to_csv(here('english/data_r/routine_cases', 'df_expected.csv'), index = None)

# Inspect results
df_expected.head(10).style
|   | Year | YM | adm0 | adm0_uid | adm1 | adm1_uid | adm2 | adm2_uid | adm3 | adm3_uid | denominator |
|---|------|----|------|----------|------|----------|------|----------|------|----------|-------------|
| 0 | 2015 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo City Council | adm2_00001 | Bo City | adm3_00001 | 21 |
| 1 | 2015 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo District Council | adm2_00002 | Badjia Chiefdom | adm3_00002 | 2 |
| 2 | 2015 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo District Council | adm2_00002 | Bagbwe Chiefdom | adm3_00003 | 6 |
| 3 | 2015 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo District Council | adm2_00002 | Baoma Chiefdom | adm3_00004 | 16 |
| 4 | 2015 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo District Council | adm2_00002 | Bargbo Chiefdom | adm3_00005 | 8 |
| 5 | 2015 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo District Council | adm2_00002 | Bongor Chiefdom | adm3_00006 | 4 |
| 6 | 2015 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo District Council | adm2_00002 | Bumpe Ngao Chiefdom | adm3_00007 | 13 |
| 7 | 2015 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo District Council | adm2_00002 | Gbo Chiefdom | adm3_00008 | 2 |
| 8 | 2015 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo District Council | adm2_00002 | Jaiama Chiefdom | adm3_00009 | 3 |
| 9 | 2015 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo District Council | adm2_00002 | Kakua Chiefdom | adm3_00010 | 8 |
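
To show how such a denominator might feed into a reporting rate, here is a small plain-Python sketch. It assumes the rate is the number of reports received divided by the number of active facilities in each month and admin unit; that definition, the facility IDs, and the toy rows are assumptions for illustration, and the actual calculation belongs to the reporting rate page.

```python
from collections import defaultdict

# Hypothetical rows: (YM, adm3, hf_uid, active, reported)
rows = [
    ("2015-01", "Bo City", "hf_1", True, 1),
    ("2015-01", "Bo City", "hf_2", True, 0),
    ("2015-01", "Bo City", "hf_3", False, 0),
    ("2015-02", "Bo City", "hf_1", True, 1),
    ("2015-02", "Bo City", "hf_2", True, 1),
    ("2015-02", "Bo City", "hf_3", False, 0),
]


def reporting_rates(rows):
    """Return reports received / facilities expected per (YM, adm3)."""
    expected = defaultdict(int)  # denominator: active facilities
    received = defaultdict(int)  # numerator: reports from active facilities
    for ym, adm3, _hf, active, reported in rows:
        if active:
            expected[(ym, adm3)] += 1
            received[(ym, adm3)] += reported
    # Groups with no active facility never enter `expected`, so no division by zero
    return {key: received[key] / expected[key] for key in expected}


print(reporting_rates(rows))
```

Because inactive facilities are excluded from the denominator, the choice of active status method directly changes the resulting rate.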

Full code

  • Method 1: Permanent activation
  • Method 2: Activate after first report, inactivate after last report
  • Method 3: Dynamic activation and inactivation
  • Python Val
  • R
# Method 1: Permanent Activation - Complete Code
# Load required R packages
pacman::p_load(
  readxl,        # Read Excel files
  dplyr,         # Data manipulation
  tidyr,         # Data tidying
  lubridate,     # Date handling
  ggplot2,       # Data visualization
  RColorBrewer,  # Color palettes
  scales,        # Scale functions for ggplot2
  purrr,         # Functional programming
  DT,            # Interactive data tables
  writexl,       # Export to Excel
  reticulate     # R interface to Python
)

# Import dataset
data_filepath <- here::here("english/data_r/routine_cases/clean_malaria_routine_data_final.rds")
df <- readRDS(data_filepath)

# Configure reporting indicators
report_cols <- c("allout", "susp", "test", "conf", "maltreat")

# Calculate monthly reporting status
df_selected <- dplyr::select(df, dplyr::all_of(report_cols))
row_sums <- base::rowSums(df_selected, na.rm = TRUE)
df$reported <- base::ifelse(row_sums > 0, 1, 0)

# Create total reports per facility for proper ordering
df <- dplyr::group_by(df, hf_uid)
df <- dplyr::mutate(df, total_reports = base::sum(reported, na.rm = TRUE))
df <- dplyr::ungroup(df)

# Identify the first hf reporting date
df <- dplyr::group_by(df, hf_uid)
df <- dplyr::mutate(df, first_month_reported = base::ifelse(base::any(reported == 1), base::min(date[reported == 1], na.rm = TRUE), NA))
df <- dplyr::ungroup(df)

# Identify the hf last reporting date
df <- dplyr::group_by(df, hf_uid)
df <- dplyr::mutate(df, last_month_reported = base::ifelse(base::any(reported == 1), base::max(date[reported == 1], na.rm = TRUE), NA))
df <- dplyr::ungroup(df)

# Method 1: Determine reporting expectations
df <- df |>
  dplyr::mutate(
    expected_to_report_method1 = base::ifelse(
      base::is.na(first_month_reported),
      "Never reported",
      base::ifelse(
        date >= first_month_reported,
        "Expected to report",
        "Not expected to report"
      )
    )
  )

# Generate final reporting status for method 1
df <- df |>
  dplyr::mutate(
    reporting_status_method1 = base::ifelse(
      expected_to_report_method1 == "Never reported",
      "Never reported",
      base::ifelse(
        expected_to_report_method1 == "Expected to report" & reported == 1,
        "Expected and reported",
        base::ifelse(
          expected_to_report_method1 == "Expected to report" & reported == 0,
          "Expected but didn't report",
          "Not expected to report"
        )
      )
    )
  )

# Create status codes for method 1
df <- df |>
  dplyr::mutate(
    status_code_method1 = dplyr::case_when(
      reporting_status_method1 == "Never reported" ~ 0,
      reporting_status_method1 == "Expected and reported" ~ 1,
      reporting_status_method1 == "Expected but didn't report" ~ 2,
      reporting_status_method1 == "Not expected to report" ~ 3
    )
  )

# Create active status categories for method 1
df$active_status1 <- dplyr::case_when(
  df$reporting_status_method1 == "Expected and reported" ~ "Active",
  df$reporting_status_method1 == "Expected but didn't report" ~ "Active",
  df$reporting_status_method1 == "Never reported" ~ "Inactive",
  df$reporting_status_method1 == "Not expected to report" ~ "Inactive"
)

# Create numeric codes for active status
df$active_status_code1 <- dplyr::case_when(
  df$active_status1 == "Active" ~ 1,
  df$active_status1 == "Inactive" ~ 0
)

# Save method 1 data to Excel
# writexl::write_xlsx(df, "active_status_method1.xlsx")

# Set figure size for console display
options(repr.plot.width = 15, repr.plot.height = 15)

# Define colors and labels for Method 1
colors <- c("0" = "gray", "1" = "blue", "2" = "red", "3" = "yellow")
labels <- c("0" = "Never reported",
                    "1" = "Expected and reported",
                    "2" = "Expected but didn't report",
                    "3" = "Not expected to report")

# Generate Overall Reporting Status Heatmap for Method 1
ggplot(df, aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(status_code_method1))) +
  geom_raster() +
  scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
  theme_minimal() +
  theme(
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank(),
    axis.text.x = element_text(size = 6, angle = 90, hjust = 1)
  ) +
  labs(x = "Date", y = "Health Facilities", title = "Reporting Status Heatmap - Method 1")

# Generate admin level reporting status heatmap for method 1
adm1_areas <- base::unique(df$adm1)
plots_list <- base::list()

for (area in adm1_areas) {
  # Keep only facilities in the current adm1 area
  df_filtered <- dplyr::filter(df, adm1 == area)

    p <- ggplot2::ggplot(df_filtered, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(status_code_method1))) +
        ggplot2::geom_raster() +
        ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
        ggplot2::theme_minimal() +
        ggplot2::theme(
            axis.text.y = ggplot2::element_blank(),
            axis.ticks.y = ggplot2::element_blank(),
            axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
        ) +
        ggplot2::labs(x = "Date", y = "Health Facilities", title = base::paste("Reporting Status Heatmap Method 1 -", area))

    plots_list[[area]] <- p
    base::print(p)
}

# Create reporting status heatmap for a single admin unit (here, Bo District)
base::options(repr.plot.width = 15, repr.plot.height = 15)
df_filtered <- dplyr::filter(df, adm1 == "Bo District")

ggplot2::ggplot(df_filtered, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(status_code_method1))) +
  ggplot2::geom_raster() +
  ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
  ggplot2::theme_minimal() +
  ggplot2::theme(
    axis.text.y = ggplot2::element_blank(),
    axis.ticks.y = ggplot2::element_blank(),
    axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
  ) +
  ggplot2::labs(x = "Date", y = "Health Facilities", title = "Reporting Status Heatmap Method 1  - Bo District")

# Generate Overall Active Status Heatmap for Method 1
base::options(repr.plot.width = 15, repr.plot.height = 15)
colors <- c("Active" = "#47B5FF", "Inactive" = "pink")
labels <- c("Active" = "Active", "Inactive" = "Inactive")

df$active_status_method1 <- dplyr::case_when(
  df$reporting_status_method1 == "Expected and reported" ~ "Active",
  df$reporting_status_method1 == "Expected but didn't report" ~ "Active",
  df$reporting_status_method1 == "Never reported" ~ "Inactive",
  df$reporting_status_method1 == "Not expected to report" ~ "Inactive"
)

ggplot2::ggplot(df, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = active_status_method1)) +
  ggplot2::geom_raster() +
  ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Active Status", na.value = "white") +
  ggplot2::theme_minimal() +
  ggplot2::theme(
    axis.text.y = ggplot2::element_blank(),
    axis.ticks.y = ggplot2::element_blank(),
    axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
  ) +
  ggplot2::labs(x = "Date", y = "Health Facilities", title = "Active Status Heatmap Method 1")

# Generate admin level active status heatmap for method 1
base::options(repr.plot.width = 15, repr.plot.height = 15)
adm1_areas <- base::unique(df$adm1)
plots_list <- base::list()

for (area in adm1_areas) {
  # Keep only facilities in the current adm1 area
  df_filtered <- dplyr::filter(df, adm1 == area)

    p <- ggplot2::ggplot(df_filtered, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(active_status_method1))) +
        ggplot2::geom_raster() +
        ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
        ggplot2::theme_minimal() +
        ggplot2::theme(
            axis.text.y = ggplot2::element_blank(),
            axis.ticks.y = ggplot2::element_blank(),
            axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
        ) +
        ggplot2::labs(x = "Date", y = "Health Facilities", title = paste("Active Status Heatmap Method 1 -", area))

    plots_list[[area]] <- p
    base::print(p)
}
  • R
# Method 2: Activate after first report, inactivate after last report - Complete Code
# Load required R packages
pacman::p_load(
  readxl,        # Read Excel files
  dplyr,         # Data manipulation
  tidyr,         # Data tidying
  lubridate,     # Date handling
  ggplot2,       # Data visualization
  RColorBrewer,  # Color palettes
  scales,        # Scale functions for ggplot2
  purrr,         # Functional programming
  DT,            # Interactive data tables
  writexl,       # Export to Excel
  reticulate     # R interface to Python
)

# Import dataset
data_filepath <- here::here("english/data_r/routine_cases/clean_malaria_routine_data_final.rds")
df <- readRDS(data_filepath)

# Configure reporting indicators
report_cols <- c("allout", "susp", "test", "conf", "maltreat")

# Calculate monthly reporting status
df_selected <- dplyr::select(df, dplyr::all_of(report_cols))
row_sums <- base::rowSums(df_selected, na.rm = TRUE)
df$reported <- base::ifelse(row_sums > 0, 1, 0)

# Create total reports per facility for proper ordering
df <- dplyr::group_by(df, hf_uid)
df <- dplyr::mutate(df, total_reports = base::sum(reported, na.rm = TRUE))
df <- dplyr::ungroup(df)

# Identify the first hf reporting date
df <- dplyr::group_by(df, hf_uid)
df <- dplyr::mutate(df, first_month_reported = base::ifelse(base::any(reported == 1), base::min(date[reported == 1], na.rm = TRUE), NA))
df <- dplyr::ungroup(df)

# Identify the hf last reporting date
df <- dplyr::group_by(df, hf_uid)
df <- dplyr::mutate(df, last_month_reported = base::ifelse(base::any(reported == 1), base::max(date[reported == 1], na.rm = TRUE), NA))
df <- dplyr::ungroup(df)

# Method 2: Determine reporting expectations
df <- df |>
  dplyr::mutate(
    expected_to_report_method2 = ifelse(
      is.na(first_month_reported),
      "Never reported",
      ifelse(
        date >= first_month_reported & date <= last_month_reported,
        "Expected to report",
        "Not expected to report"
      )
    )
  )

# Determine final reporting status method 2
df <- df |>
  dplyr::mutate(
    reporting_status_method2 = ifelse(
      expected_to_report_method2 == "Never reported",
      "Never reported",
      ifelse(
        expected_to_report_method2 == "Expected to report" & reported == 1,
        "Expected and reported",
        ifelse(
          expected_to_report_method2 == "Expected to report" & reported == 0,
          "Expected but didn't report",
          "Not expected to report"
        )
      )
    )
  )

# Create status codes for method 2
df <- df |>
  dplyr::mutate(
    status_code_method2 = dplyr::case_when(
      reporting_status_method2 == "Never reported" ~ 0,
      reporting_status_method2 == "Expected and reported" ~ 1,
      reporting_status_method2 == "Expected but didn't report" ~ 2,
      reporting_status_method2 == "Not expected to report" ~ 3
    )
  )

# Create active status categories for method 2
df$active_status2 <- dplyr::case_when(
  df$reporting_status_method2 == "Expected and reported" ~ "Active",
  df$reporting_status_method2 == "Expected but didn't report" ~ "Active",
  df$reporting_status_method2 == "Never reported" ~ "Inactive",
  df$reporting_status_method2 == "Not expected to report" ~ "Inactive"
)

# Create numeric codes for active status
df$active_status_code2 <- dplyr::case_when(
  df$active_status2 == "Active" ~ 1,
  df$active_status2 == "Inactive" ~ 0
)

# Save method 2 data to Excel
#writexl::write_xlsx(df, "active_status_method2.xlsx")

# Set figure size for console display
options(repr.plot.width = 15, repr.plot.height = 15)

# Define colors and labels for Method 2
colors <- c("0" = "gray", "1" = "blue", "2" = "red", "3" = "yellow")
labels <- c("0" = "Never reported",
                    "1" = "Expected and reported",
                    "2" = "Expected but didn't report",
                    "3" = "Not expected to report")

# Generate Overall Reporting Status Heatmap for Method 2
base::options(repr.plot.width = 15, repr.plot.height = 15)
ggplot(df, aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(status_code_method2))) +
  geom_raster() +
  scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
  theme_minimal() +
  theme(
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank(),
    axis.text.x = element_text(size = 6, angle = 90, hjust = 1)
  ) +
  labs(x = "Date", y = "Health Facilities", title = "Reporting Status Heatmap - Method 2")

# Generate admin level reporting status heatmap for method 2
base::options(repr.plot.width = 15, repr.plot.height = 15)
adm1_areas <- base::unique(df$adm1)
plots_list <- base::list()

for (area in adm1_areas) {
  # Keep only facilities in the current adm1 area
  df_filtered <- dplyr::filter(df, adm1 == area)

    p <- ggplot2::ggplot(df_filtered, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(status_code_method2))) +
        ggplot2::geom_raster() +
        ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
        ggplot2::theme_minimal() +
        ggplot2::theme(
            axis.text.y = ggplot2::element_blank(),
            axis.ticks.y = ggplot2::element_blank(),
            axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
        ) +
        ggplot2::labs(x = "Date", y = "Health Facilities", title = base::paste("Reporting Status Heatmap Method 2 -", area))

    plots_list[[area]] <- p
    base::print(p)
}

# Create reporting status heatmap for a single admin unit (here, Bo District)
base::options(repr.plot.width = 15, repr.plot.height = 15)
df_filtered <- dplyr::filter(df, adm1 == "Bo District")

ggplot2::ggplot(df_filtered, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(status_code_method2))) +
  ggplot2::geom_raster() +
  ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
  ggplot2::theme_minimal() +
  ggplot2::theme(
    axis.text.y = ggplot2::element_blank(),
    axis.ticks.y = ggplot2::element_blank(),
    axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
  ) +
  ggplot2::labs(x = "Date", y = "Health Facilities", title = "Reporting Status Heatmap Method 2 - Bo District")

# Generate Overall Active Status Heatmap for Method 2
base::options(repr.plot.width = 15, repr.plot.height = 15)
colors <- c("Active" = "#47B5FF", "Inactive" = "pink")
labels <- c("Active" = "Active", "Inactive" = "Inactive")

df$active_status_method2 <- dplyr::case_when(
  df$reporting_status_method2 == "Expected and reported" ~ "Active",
  df$reporting_status_method2 == "Expected but didn't report" ~ "Active",
  df$reporting_status_method2 == "Never reported" ~ "Inactive",
  df$reporting_status_method2 == "Not expected to report" ~ "Inactive"
)

ggplot2::ggplot(df, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = active_status_method2)) +
  ggplot2::geom_raster() +
  ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Active Status", na.value = "white") +
  ggplot2::theme_minimal() +
  ggplot2::theme(
    axis.text.y = ggplot2::element_blank(),
    axis.ticks.y = ggplot2::element_blank(),
    axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
  ) +
  ggplot2::labs(x = "Date", y = "Health Facilities", title = "Active Status Heatmap Method 2")

# Generate admin level active status heatmap for method 2
base::options(repr.plot.width = 15, repr.plot.height = 15)
adm1_areas <- base::unique(df$adm1)
plots_list <- base::list()

for (area in adm1_areas) {
  # Keep only facilities in the current adm1 area
  df_filtered <- dplyr::filter(df, adm1 == area)

    p <- ggplot2::ggplot(df_filtered, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(active_status_method2))) +
        ggplot2::geom_raster() +
        ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
        ggplot2::theme_minimal() +
        ggplot2::theme(
            axis.text.y = ggplot2::element_blank(),
            axis.ticks.y = ggplot2::element_blank(),
            axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
        ) +
        ggplot2::labs(x = "Date", y = "Health Facilities", title = paste("Active Status Heatmap Method 2 -", area))

    plots_list[[area]] <- p
    base::print(p)
}
  • R
# Method 3: Dynamic activation and inactivation - Complete Code
# Load required R packages
pacman::p_load(
  readxl,        # Read Excel files
  dplyr,         # Data manipulation
  tidyr,         # Data tidying
  lubridate,     # Date handling
  ggplot2,       # Data visualization
  RColorBrewer,  # Color palettes
  scales,        # Scale functions for ggplot2
  purrr,         # Functional programming
  DT,            # Interactive data tables
  writexl,       # Export to Excel
  reticulate     # R interface to Python
)

# Import dataset
data_filepath <- here::here("english/data_r/routine_cases/clean_malaria_routine_data_final.rds")
df <- readRDS(data_filepath)

# Configure reporting indicators
report_cols <- c("allout", "susp", "test", "conf", "maltreat")

# Calculate monthly reporting status
df_selected <- dplyr::select(df, dplyr::all_of(report_cols))
row_sums <- base::rowSums(df_selected, na.rm = TRUE)
df$reported <- base::ifelse(row_sums > 0, 1, 0)

# Create total reports per facility for proper ordering
df <- dplyr::group_by(df, hf_uid)
df <- dplyr::mutate(df, total_reports = base::sum(reported, na.rm = TRUE))
df <- dplyr::ungroup(df)

# Identify the first hf reporting date
df <- dplyr::group_by(df, hf_uid)
df <- dplyr::mutate(df, first_month_reported = base::ifelse(base::any(reported == 1), base::min(date[reported == 1], na.rm = TRUE), NA))
df <- dplyr::ungroup(df)

# Identify the hf last reporting date
df <- dplyr::group_by(df, hf_uid)
df <- dplyr::mutate(df, last_month_reported = base::ifelse(base::any(reported == 1), base::max(date[reported == 1], na.rm = TRUE), NA))
df <- dplyr::ungroup(df)

# Method 3: Determine active and inactive hf
df <- df |>
  dplyr::arrange(hf_uid, date) |>
  dplyr::group_by(hf_uid) |>
  dplyr::mutate(
    # Create a logical vector indicating runs of zeros >= 6
    zero_run = {
      # rle() computes lengths and values of consecutive identical elements
      r <- base::rle(reported == 0)
      # Identify which runs are zeros AND have length >= 6
      run_flag <- r$values & r$lengths >= 6
      # Repeat the TRUE/FALSE flags for all months in the run
      base::rep(run_flag, r$lengths)
    },
    # Assign status based on zero_run
    status_method3 = base::ifelse(zero_run, "Inactive", "Active")
  ) |>
  dplyr::ungroup()

# Determine reporting status method 3
df <- df |>
  dplyr::mutate(
    expected_to_report_method3 = ifelse(
      is.na(first_month_reported),
      "Never reported",
      ifelse(
        status_method3 == "Active",
        "Expected to report",
        "Not expected to report"
      )
    )
  )

# Determine final reporting status method 3
df <- df |>
  dplyr::mutate(
    reporting_status_method3 = ifelse(
      expected_to_report_method3 == "Never reported",
      "Never reported",
      ifelse(
        expected_to_report_method3 == "Expected to report" & reported == 1,
        "Expected and reported",
        ifelse(
          expected_to_report_method3 == "Expected to report" & reported == 0,
          "Expected but didn't report",
          "Not expected to report"
        )
      )
    )
  )

# Create status codes for method 3
df <- df |>
  dplyr::mutate(
    status_code_method3 = dplyr::case_when(
      reporting_status_method3 == "Never reported" ~ 0,
      reporting_status_method3 == "Expected and reported" ~ 1,
      reporting_status_method3 == "Expected but didn't report" ~ 2,
      reporting_status_method3 == "Not expected to report" ~ 3
    )
  )

# Create active status categories for method 3
df$active_status3 <- dplyr::case_when(
  df$reporting_status_method3 == "Expected and reported" ~ "Active",
  df$reporting_status_method3 == "Expected but didn't report" ~ "Active",
  df$reporting_status_method3 == "Never reported" ~ "Inactive",
  df$reporting_status_method3 == "Not expected to report" ~ "Inactive"
)

# Create numeric codes for active status
df$active_status_code3 <- dplyr::case_when(
  df$active_status3 == "Active" ~ 1,
  df$active_status3 == "Inactive" ~ 0
)

# Save method 3 data to Excel
#writexl::write_xlsx(df, "active_status_method3.xlsx")

# Set figure size for console display
base::options(repr.plot.width = 15, repr.plot.height = 15)

# Define colors and labels for Method 3
colors <- c("0" = "gray", "1" = "blue", "2" = "red", "3" = "yellow")
labels <- c(
  "0" = "Never reported",
  "1" = "Expected and reported",
  "2" = "Expected but didn't report",
  "3" = "Not expected to report"
)

# Generate Overall Reporting Status Heatmap for Method 3
ggplot(df, aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(status_code_method3))) +
  geom_raster() +
  scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
  theme_minimal() +
  theme(
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank(),
    axis.text.x = element_text(size = 6, angle = 90, hjust = 1)
  ) +
  labs(x = "Date", y = "Health Facilities", title = "Reporting Status Heatmap - Method 3")

# Generate admin level reporting status heatmap for method 3
base::options(repr.plot.width = 15, repr.plot.height = 15)
adm1_areas <- base::unique(df$adm1)
plots_list <- base::list()

for (area in adm1_areas) {
  df_filtered <- dplyr::filter(df, adm1 == area)

    p <- ggplot2::ggplot(df_filtered, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(status_code_method3))) +
        ggplot2::geom_raster() +
        ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
        ggplot2::theme_minimal() +
        ggplot2::theme(
            axis.text.y = ggplot2::element_blank(),
            axis.ticks.y = ggplot2::element_blank(),
            axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
        ) +
        ggplot2::labs(x = "Date", y = "Health Facilities", title = base::paste("Reporting Status Heatmap Method 3 -", area))

    plots_list[[area]] <- p
    base::print(p)
}

# Create reporting status heatmap for specific sub category in admin unit
base::options(repr.plot.width = 15, repr.plot.height = 15)
df_filtered <- dplyr::filter(df, adm1 == "Bo District")

ggplot2::ggplot(df_filtered, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(status_code_method3))) +
  ggplot2::geom_raster() +
  ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
  ggplot2::theme_minimal() +
  ggplot2::theme(
    axis.text.y = ggplot2::element_blank(),
    axis.ticks.y = ggplot2::element_blank(),
    axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
  ) +
  ggplot2::labs(x = "Date", y = "Health Facilities", title = "Reporting Status Heatmap Method 3 - Bo District")

# Generate Overall Active Status Heatmap for Method 3
base::options(repr.plot.width = 15, repr.plot.height = 15)
colors <- c("Active" = "#47B5FF", "Inactive" = "pink")
labels <- c("Active" = "Active", "Inactive" = "Inactive")

df$active_status_method3 <- dplyr::case_when(
  df$reporting_status_method3 == "Expected and reported" ~ "Active",
  df$reporting_status_method3 == "Expected but didn't report" ~ "Active",
  df$reporting_status_method3 == "Never reported" ~ "Inactive",
  df$reporting_status_method3 == "Not expected to report" ~ "Inactive"
)

ggplot2::ggplot(df, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = active_status_method3)) +
  ggplot2::geom_raster() +
  ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Active Status", na.value = "white") +
  ggplot2::theme_minimal() +
  ggplot2::theme(
    axis.text.y = ggplot2::element_blank(),
    axis.ticks.y = ggplot2::element_blank(),
    axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
  ) +
  ggplot2::labs(x = "Date", y = "Health Facilities", title = "Active Status Heatmap Method 3")

# Generate admin level active status heatmap for method 3
base::options(repr.plot.width = 15, repr.plot.height = 15)
adm1_areas <- base::unique(df$adm1)
plots_list <- base::list()

for (area in adm1_areas) {
  df_filtered <- dplyr::filter(df, adm1 == area)

    p <- ggplot2::ggplot(df_filtered, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(active_status_method3))) +
        ggplot2::geom_raster() +
        ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
        ggplot2::theme_minimal() +
        ggplot2::theme(
            axis.text.y = ggplot2::element_blank(),
            axis.ticks.y = ggplot2::element_blank(),
            axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
        ) +
        ggplot2::labs(x = "Date", y = "Health Facilities", title = paste("Active Status Heatmap Method 3 -", area))

    plots_list[[area]] <- p
    base::print(p)
}
  • Python
Show the code
import pandas as pd
from pyhere import here
import numpy as np
from matplotlib.colors import ListedColormap
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import seaborn as sns
Output

Step 1.2: Load and prepare data

Now we import the DHIS2 dataset that was initially processed in the DHIS2 Data Preprocessing section of this code library.

  • Python Val
Show the code
dhis2_df = pd.read_parquet(here('english/data_r/routine_cases', 'dhis2_processed_data_python.parquet'))
Output

Step 1.3: Determine reporting status

Here we create an intermediate dataframe storing the monthly reporting status of each health facility.

  • Python Val
  • Summary
Show the code
key_indicators = ['allout', 'test', 'pres', 'conf', 'maltreat', 'maladm']

# make a copy of the data
dfr = dhis2_df.copy()

# add a column indicating whether the HF reported on any of the key indicators
dfr.insert(len(dfr.columns), 'key_variables', dfr[key_indicators].notna().any(axis = 1))
dfr.insert(len(dfr.columns), 'reported', np.where(dfr['key_variables'], 1, 0))

# drop unnecessary columns (TODO after team discussion: Val to add normalised adm-name functions and dftree to streamline these operations)
cols = ['Year', 'Month', 'YM', 'adm0', 'adm0_uid', 'adm1', 'adm1_uid', 'adm2', 'adm2_uid', 'adm3', 'adm3_uid', 'hf', 'hf_uid', 'key_variables', 'reported']
dfr = dfr[cols]

# compute first month reported for each HF and add column in dfr
t = dfr[dfr['reported'] == 1].groupby('hf_uid')['YM'].min().to_frame(name = 'first_month_reported').reset_index()

# make sure to keep all HFs in case some don't have a valid first month (never reported on anything)
temp = pd.DataFrame(dfr['hf_uid'].unique(), columns = ['hf_uid'])
t = temp.merge(t, on = 'hf_uid', how = 'left', validate = '1:1')
dfr = dfr.merge(t, on = 'hf_uid', how = 'left', validate = 'm:1')

# add HF status column:
# 0: not active
# 0.5: HF didn't report when considered active
# 1: active and reporting
dfr.insert(len(dfr.columns),
          'Facility_status',
          np.where(dfr['reported'] == 1, 1, np.where((dfr['reported'] == 0) & (dfr['YM'] >= dfr['first_month_reported']), 0.5, 0)))

# add active HF column
dfr.insert(len(dfr.columns), 'Facility_active', np.where(dfr['Facility_status'] == 0, False, True))

# quick visual check
dfr.head(10).style
Output
  Year Month YM adm0 adm0_uid adm1 adm1_uid adm2 adm2_uid adm3 adm3_uid hf hf_uid key_variables reported first_month_reported Facility_status Facility_active
0 2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Aethel CHP HF_00001 False 0 2019-01 0.000000 False
1 2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Agape Way CHP HF_00002 True 1 2015-01 1.000000 True
2 2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Anglican Diocese Clinic HF_00003 False 0 nan 0.000000 False
3 2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Batiama Layout MCHP HF_00004 False 0 2015-05 0.000000 False
4 2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Bo Government Hospital HF_00005 True 1 2015-01 1.000000 True
5 2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Bo School Bay CHP HF_00006 False 0 2022-01 0.000000 False
6 2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Breakthrough MCHP HF_00007 False 0 2023-10 0.000000 False
7 2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Brima Town CHP HF_00008 True 1 2015-01 1.000000 True
8 2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 EDC Unit CHP HF_00009 True 1 2015-01 1.000000 True
9 2015 1 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 Favour MCHP HF_00010 True 1 2015-01 1.000000 True

Step 1.3.a Visualize reporting rate

  • Python Val
Show the code
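# SS: added to resolve render error
# status_params_dict is not defined in this section; the values below are placeholder assumptions
status_params_dict = {
    0: {'colour': 'lightgrey', 'label': 'Not active'},
    0.5: {'colour': 'red', 'label': "Active, didn't report"},
    1: {'colour': 'blue', 'label': 'Active and reporting'},
}
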
# Plot a visual of monthly HF-level reporting status for the country
df = (dfr.pivot(index = ['hf_uid', 'first_month_reported'], columns = 'YM', values = 'Facility_status')
     .sort_values(by = 'first_month_reported'))

df = df.reset_index().drop(['first_month_reported'], axis = 1).set_index(['hf_uid'])

# Prep colours and labels for cmap and legend
colours = [status_params_dict[i]['colour'] for i in sorted(status_params_dict.keys())]
labels = [status_params_dict[i]['label'] for i in sorted(status_params_dict.keys())]

cmap = ListedColormap(colours)

# Make figure
fs = 15
fig, ax = plt.subplots(figsize = (15, 10))
sns.heatmap(ax = ax, data = df, cmap = cmap, cbar = False)
ax.set_xlabel('')
ax.set_xticks(ax.get_xticks())
ax.set_xticklabels([l.get_text()[0:7] for l in ax.get_xticklabels()], rotation = 45, ha = 'right')
ax.set_yticks([])
ax.set_ylabel('HEALTH FACILITY', size = fs)

# Make legend
handles = [mpatches.Patch(color = c, label = l) for c, l in zip(colours, labels)]

ax.legend(handles = handles,
          fontsize = fs,
          ncols = 3,
          bbox_to_anchor = (0.5, -0.1),
          loc = 'upper center')

fig.tight_layout()

# Save
# discuss with team

Output

To adapt the code:

Step 2: Determine active and inactive status

Method 1: First Report Activation

Note: it is not yet clear what the remaining code for this method does. The UID and YYYY-MM steps were removed because they should be handled in the data preprocessing; the dataset loaded in Step 1.2 should already include them.

Step 2 Method 2:

ADD

Step 2 Method 3:

ADD
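
Pending the final Python version of this step, the rule used in the R tab (a facility is treated as inactive during any run of six or more consecutive non-reporting months) could be sketched in pandas as follows. This is a minimal sketch, not the team's implementation: the helper name `flag_zero_runs`, the `min_run` parameter, and the toy data are assumptions introduced for illustration, and the column names `hf_uid`, `YM`, and `reported` follow the dataframe built in Step 1.3.

```python
import numpy as np
import pandas as pd


def flag_zero_runs(df: pd.DataFrame, min_run: int = 6) -> pd.Series:
    """True for months inside a run of >= min_run consecutive non-reporting months.

    Hypothetical helper; assumes df is sorted by hf_uid and then by month.
    """
    is_zero = df["reported"].eq(0)
    # A new run starts when the value changes or when the facility changes
    new_run = is_zero.ne(is_zero.shift()) | df["hf_uid"].ne(df["hf_uid"].shift())
    run_id = new_run.cumsum()
    # Length of the run that each month belongs to
    run_len = is_zero.groupby(run_id).transform("size")
    return is_zero & run_len.ge(min_run)


# Toy data (made up): three facilities observed over eight months
months = [f"2020-{m:02d}" for m in range(1, 9)]
toy = pd.DataFrame({
    "hf_uid": ["HF_A"] * 8 + ["HF_B"] * 8 + ["HF_C"] * 8,
    "YM": months * 3,
    "reported": [1, 1, 1, 1, 0, 0, 0, 0,   # 4 trailing zeros: stays active
                 0, 0, 0, 1, 1, 1, 1, 1,   # 3 leading zeros: stays active
                 1, 0, 0, 0, 0, 0, 0, 1],  # 6 consecutive zeros: inactive
})

toy = toy.sort_values(["hf_uid", "YM"]).reset_index(drop=True)
toy["zero_run"] = flag_zero_runs(toy)
toy["status_method3"] = np.where(toy["zero_run"], "Inactive", "Active")
```

Including the facility change in the run boundary keeps zeros at the end of one facility from merging with zeros at the start of the next, which is what the grouped `rle()` call achieves in the R version.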

Step 2 Method Val:

Prepare your expected reports dataframe - df_expected

VT: I would suggest separating outpatient and inpatient indicators here, as I normally do. I don't want to modify the structure too much before discussing with the team.

Here we build a dataframe storing the number of active Health Facilities for each month of the period studied. This dataframe will be useful in subsequent sections (link to RR and Incidence adjustment sections).

  • Python
Show the code
# create dftree
cols = ['adm0', 'adm0_uid', 'adm1', 'adm1_uid', 'adm2', 'adm2_uid', 'adm3', 'adm3_uid', 'hf', 'hf_uid']
dftree= dhis2_df[cols].drop_duplicates().reset_index(drop = True)

# create monthly denominator for number of HFs active in each adm3
df_expected = (dfr
     .groupby(['Year', 'YM', 'adm3_uid'])[['Facility_active']].sum(min_count = 1)
     .reset_index()
     .rename(columns = {'Facility_active': 'denominator'}))

# add parent admin units
cols = ['adm0', 'adm0_uid', 'adm1', 'adm1_uid', 'adm2', 'adm2_uid', 'adm3', 'adm3_uid']
t = dftree[cols].drop_duplicates().reset_index(drop = True)
df_expected = df_expected.merge(t, on = 'adm3_uid', how = 'left', validate = 'm:1')

# reorder columns
cols = ['Year', 'YM', 'adm0', 'adm0_uid', 'adm1', 'adm1_uid', 'adm2', 'adm2_uid', 'adm3', 'adm3_uid', 'denominator']
df_expected = df_expected[cols].sort_values(by = cols).reset_index(drop = True)

# save
df_expected.to_csv(here('english/data_r/routine_cases', 'df_expected.csv'), index = False)

# Inspect results
df_expected.head(10).style
Output
  Year YM adm0 adm0_uid adm1 adm1_uid adm2 adm2_uid adm3 adm3_uid denominator
0 2015 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo City Council adm2_00001 Bo City adm3_00001 21
1 2015 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo District Council adm2_00002 Badjia Chiefdom adm3_00002 2
2 2015 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo District Council adm2_00002 Bagbwe Chiefdom adm3_00003 6
3 2015 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo District Council adm2_00002 Baoma Chiefdom adm3_00004 16
4 2015 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo District Council adm2_00002 Bargbo Chiefdom adm3_00005 8
5 2015 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo District Council adm2_00002 Bongor Chiefdom adm3_00006 4
6 2015 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo District Council adm2_00002 Bumpe Ngao Chiefdom adm3_00007 13
7 2015 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo District Council adm2_00002 Gbo Chiefdom adm3_00008 2
8 2015 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo District Council adm2_00002 Jaiama Chiefdom adm3_00009 3
9 2015 2015-01 Sierra Leone adm0_00001 Bo District adm1_00001 Bo District Council adm2_00002 Kakua Chiefdom adm3_00010 8
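
As an illustration of how this denominator might be used downstream, the sketch below computes a monthly reporting rate per adm3 as the number of facilities that reported divided by the number of active facilities. The definition of the rate, the toy values, and the names `toy` and `rates` are assumptions made for this example; the reporting rate section of this library may define the indicator differently.

```python
import pandas as pd

# Toy facility-month table (made-up values) using the same column names as dfr
toy = pd.DataFrame({
    "YM": ["2015-01"] * 3 + ["2015-02"] * 3,
    "adm3_uid": ["adm3_X"] * 6,
    "hf_uid": ["HF_1", "HF_2", "HF_3"] * 2,
    # HF_2 first reports in 2015-02; HF_3 reports in 2015-01 only
    "reported": [1, 0, 1, 1, 1, 0],
    "Facility_active": [True, False, True, True, True, True],
})

# Numerator: facilities that reported; denominator: facilities considered active
rates = (
    toy.groupby(["YM", "adm3_uid"])
    .agg(numerator=("reported", "sum"), denominator=("Facility_active", "sum"))
    .reset_index()
)

# Leave the rate missing where no facility is active, rather than dividing by zero
safe_denominator = rates["denominator"].where(rates["denominator"] > 0)
rates["reporting_rate"] = rates["numerator"] / safe_denominator
```

In this toy example the denominator grows from 2 to 3 once HF_2 starts reporting, so the facility does not count against the reporting rate before it became active.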

To adapt the code:

Step 3: Assign expected and observed reporting status accounting for active/inactive

Step 3.1: Create summary statistics

Step 3.2: Create detailed reporting status

Step 3.3: Assign final status with priority
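
The R tab assigns each facility-month one of four statuses, checking "Never reported" first and falling back to "Not expected to report". A minimal pandas sketch of that priority logic is shown below, using `numpy.select`, which returns the choice for the first condition that is true. The toy data and the column name `status` (standing in for the active/inactive flag from Step 2) are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Toy facility-month rows (made-up values)
toy = pd.DataFrame({
    "hf_uid": ["HF_1", "HF_1", "HF_2", "HF_3"],
    "reported": [1, 0, 0, 0],
    "first_month_reported": ["2015-01", "2015-01", None, "2015-03"],
    "status": ["Active", "Active", "Active", "Inactive"],
})

never_reported = toy["first_month_reported"].isna()
is_active = toy["status"].eq("Active")

# Conditions are evaluated in order, so earlier entries take priority
conditions = [
    never_reported,
    is_active & toy["reported"].eq(1),
    is_active & toy["reported"].eq(0),
]
choices = [
    "Never reported",
    "Expected and reported",
    "Expected but didn't report",
]
toy["reporting_status"] = np.select(
    conditions, choices, default="Not expected to report"
)

# Numeric codes matching those used in the R tab
code_map = {
    "Never reported": 0,
    "Expected and reported": 1,
    "Expected but didn't report": 2,
    "Not expected to report": 3,
}
toy["status_code"] = toy["reporting_status"].map(code_map)
```

Note that HF_2 is labelled "Never reported" even though its `status` is "Active", because that condition is listed first.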

Step 3.4: Sort and prepare data for visualization

Step 4: Visualise processed data

Step 4.1: Set up data

Step 4.2: Make heatmap

Step 4.3: Make number by time

  • Python Val
Show the code
# quick visual of df_expected: number of active HFs per month
df = df_expected.groupby('YM')['denominator'].sum(min_count = 1).reset_index()

fig, ax = plt.subplots()
df.plot(ax = ax, x = 'YM', y = 'denominator', label = 'Number of active HFs', color = status_params_dict[1]['colour'])
ax.set_xlabel('')
fig.tight_layout()
Output

Step 5: Save data

ADD

ADD

Full code

Find the full code scripts for determining active and inactive status of health facilities below.

  • R
  • Python
 

©2025 Applied Health Analytics for Delivery and Innovation. All rights reserved