Determining active and inactive status

Overview

In the SNT workflow, reporting rate calculations, which are essential to the estimation of other key indicators such as incidence, depend on the activity status of each health facility.

Objectives

Classify health facility activity status to define reporting rate denominator
Visualize the status of malaria reporting in the country

Key concepts in defining active facilities

Before calculating reporting rates, we first need to determine whether each health facility was active in a given month, that is, whether it was expected to report.

The method used to define facility activity status should be discussed with the SNT team, who will guide whether the country has an established or preferred method. In some cases, the NMP may already rely on a Health Facility Master List to identify active facilities. While this can be a useful starting point, it may not always reflect real-time service delivery or facility functionality, and its reliability should be carefully assessed.

If no trusted method exists, or if additional validation is needed, an alternative data-driven approach can be used. This approach infers activity status directly from routine surveillance data, based on whether a facility reported any valid values for key malaria indicators.

Monthly Activity Classification

For each health facility (HF) in a given month:

If the HF submitted valid (non-NA) data for any key indicator → it is classified as active reporting
If the HF did not report on any key indicators:
- If it has reported in any prior month → active not reporting
- If it has never reported → inactive

This data-driven approach offers a flexible alternative when no reliable master list exists or when further validation is required. It uses observed reporting patterns to classify activity status, based on whether a facility submitted valid data for selected malaria indicators.

These key indicators (for example allout, test, susp, pres, conf, and treat) reflect core functions of malaria service delivery, including suspected case reporting, diagnostic testing, and treatment. If a facility reports on any of these indicators in a given month, it can reasonably be considered operational and engaged in the malaria surveillance system.
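The "any valid value" rule can be sketched as follows. This is a minimal illustration, assuming each facility-month is held as a dictionary of indicator values with `None` or NaN for missing entries; the record layout and the helper names `is_missing` and `has_reported` are hypothetical and not part of the workflow code. Under this rule a zero is a valid value, so a facility that reports zero cases still counts as reporting.

```python
import math

# Key indicators as listed in the Step 2.1 code on this page; agree on the list with the SNT team.
KEY_INDICATORS = ["allout", "test", "pres", "conf", "maltreat", "maladm"]


def is_missing(value):
    """Treat None and float NaN as missing; everything else, including 0, is valid."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def has_reported(record, indicators=KEY_INDICATORS):
    """Return True if at least one key indicator holds a non-missing value."""
    return any(not is_missing(record.get(name)) for name in indicators)


# Hypothetical facility-months
full_report = {"allout": 20, "test": 10, "conf": 5, "maltreat": 5}
zero_report = {"allout": 0, "test": 0, "conf": 0}
no_report = {"allout": None, "test": float("nan")}

print(has_reported(full_report))  # True
print(has_reported(zero_report))  # True: zeros are valid values
print(has_reported(no_report))    # False
```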

Best Practice for Active Status Classification

Ideally, analysts should receive a copy of the Master Facility List (MFL) which includes columns for active/inactive status of health facilities. This is typically the most accurate and up-to-date classification of facility active/inactive status. If provided, this information should be used to generate active status visualizations and reporting rate analysis. Review the Merging shapefiles with tabular data page to merge your MFL with DHIS2 data and proceed with the visualization steps on this page.

Consult the SNT Team

In the absence of health facility active status information in the MFL, active/inactive status may be determined through one of the three methods below based on what is designated as a key indicator.

The selection of key indicators (and the method used to define facility activity) should be discussed and validated with the SNT team. In some countries, a Health Facility Master List may be appropriate; in others, indicator-based definitions may be more reliable. The final approach should reflect how malaria services are delivered and reported within the national system.

Indicator-specific activity status

In most countries, a separate monthly activity status may be needed when calculating reporting rates for IPD or OPD-specific indicators. For example, inpatient indicators should only include facilities with inpatient capacity. The criteria for inclusion should be discussed with the program. While facility type (e.g. hospital or health center with wards) can help, it may not always be definitive.

Methods for determining active and inactive status of health facilities from reporting status

A health facility can be classified as active or inactive for a given month using one of three methods, each with distinct criteria. The three methods are described below:

Method 1: Permanent activation

Criteria: A facility is classified as active from its first reporting month onwards, and inactive before its first report.

Key principle: A facility is only included in the denominator (expected to report) starting from the month it first actually reported any malaria data. Before that first reporting month, the facility is considered “inactive” and not expected to report.

Rationale: This method recognizes that facilities may not exist, be operational, have DHIS2 access, or be participating in malaria surveillance from the beginning of the analysis period. It avoids underestimating reporting performance by evaluating facilities only from the point at which they have demonstrated the capacity to report.

Illustration:

	adm1	adm2	adm3	hf_uid	date	allout	susp	test	conf	maltreat	report	status
0	msk1	msk2	msk3	hf_0001	2024-01	nan	nan	nan	nan	nan	No	Inactive
1	msk1	msk2	msk3	hf_0001	2024-02	nan	nan	nan	nan	nan	No	Inactive
2	msk1	msk2	msk3	hf_0001	2024-03	20	15	10	5	5	Yes	Active reporting
3	msk1	msk2	msk3	hf_0001	2024-04	30	15	10	8	5	Yes	Active reporting
4	msk1	msk2	msk3	hf_0001	2024-05	60	15	10	5	nan	Yes	Active reporting
5	msk1	msk2	msk3	hf_0001	2024-06	nan	nan	nan	nan	nan	No	Active not reporting
6	msk1	msk2	msk3	hf_0001	2024-07	nan	nan	nan	nan	nan	No	Active not reporting
7	msk1	msk2	msk3	hf_0001	2024-08	nan	nan	nan	nan	nan	No	Active not reporting
8	msk1	msk2	msk3	hf_0001	2024-09	5	5	5	5	5	Yes	Active reporting
9	msk1	msk2	msk3	hf_0001	2024-10	nan	nan	nan	nan	nan	No	Active not reporting
10	msk1	msk2	msk3	hf_0001	2024-11	nan	nan	nan	nan	nan	No	Active not reporting
11	msk1	msk2	msk3	hf_0001	2024-12	nan	nan	nan	nan	nan	No	Active not reporting
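As a minimal sketch of Method 1, the function below reproduces the status column of the illustration from the monthly reporting flags alone. It assumes the flags for one facility are supplied in chronological order; the function name `classify_method1` is hypothetical.

```python
def classify_method1(reported):
    """Permanent activation: a facility is active from its first report onwards."""
    statuses = []
    has_ever_reported = False
    for flag in reported:
        if flag:
            has_ever_reported = True
            statuses.append("Active reporting")
        elif has_ever_reported:
            statuses.append("Active not reporting")
        else:
            statuses.append("Inactive")
    return statuses


# Reporting flags for hf_0001, January to December 2024, as in the illustration
reported = [False, False, True, True, True, False,
            False, False, True, False, False, False]

for month, status in enumerate(classify_method1(reported), start=1):
    print(f"2024-{month:02d}", status)
```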

Method 2: Activate after first report, inactivate after last report

Criteria: A facility is classified as active once it starts reporting, and inactive after its last report. To avoid mis-attributing non-reporting as inactivity in the most recent months of the dataset, we can also require a minimum number of non-reports (for example, 6 months) after the facility’s last report.

Key principle: A facility is included in the denominator (expected to report) for a given month if it has ever reported, and excluded after it has stopped reporting.

Rationale: This method recognizes that facilities may shut down permanently, for example due to decreased local population, insecurity, or diminished resources for service provision. It avoids underestimating reporting performance by evaluating facilities only during periods in which they have demonstrated the capacity to report.

Illustration:

	adm1	adm2	adm3	hf_uid	date	allout	susp	test	conf	maltreat	report	status
0	msk1	msk2	msk3	hf_0001	2024-01	nan	nan	nan	nan	nan	No	Inactive
1	msk1	msk2	msk3	hf_0001	2024-02	nan	nan	nan	nan	nan	No	Inactive
2	msk1	msk2	msk3	hf_0001	2024-03	20	15	10	5	5	Yes	Active reporting
3	msk1	msk2	msk3	hf_0001	2024-04	30	15	10	8	5	Yes	Active reporting
4	msk1	msk2	msk3	hf_0001	2024-05	60	15	10	5	nan	Yes	Active reporting
5	msk1	msk2	msk3	hf_0001	2024-06	nan	nan	nan	nan	nan	No	Active not reporting
6	msk1	msk2	msk3	hf_0001	2024-07	nan	nan	nan	nan	nan	No	Active not reporting
7	msk1	msk2	msk3	hf_0001	2024-08	nan	nan	nan	nan	nan	No	Active not reporting
8	msk1	msk2	msk3	hf_0001	2024-09	5	5	5	5	5	Yes	Active reporting
9	msk1	msk2	msk3	hf_0001	2024-10	nan	nan	nan	nan	nan	No	Inactive
10	msk1	msk2	msk3	hf_0001	2024-11	nan	nan	nan	nan	nan	No	Inactive
11	msk1	msk2	msk3	hf_0001	2024-12	nan	nan	nan	nan	nan	No	Inactive
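A minimal sketch of Method 2 follows, again working on one facility's chronological reporting flags. The function name `classify_method2` and the parameter `min_trailing_gap` are hypothetical; the parameter stands in for the optional minimum number of non-reporting months after the last report. A value of 0 reproduces the illustration, where the facility becomes inactive immediately after its last report.

```python
def classify_method2(reported, min_trailing_gap=0):
    """First-to-last activation for one facility's chronological reporting flags.

    min_trailing_gap is the number of non-reporting months required after the
    last report before the facility is treated as closed.
    """
    if not any(reported):
        return ["Inactive"] * len(reported)

    first = reported.index(True)
    last = len(reported) - 1 - reported[::-1].index(True)
    trailing_gap = len(reported) - 1 - last
    closed = trailing_gap > 0 and trailing_gap >= min_trailing_gap

    statuses = []
    for i, flag in enumerate(reported):
        if flag:
            statuses.append("Active reporting")
        elif i < first or (i > last and closed):
            statuses.append("Inactive")
        else:
            statuses.append("Active not reporting")
    return statuses


# Reporting flags for hf_0001, January to December 2024, as in the illustration
reported = [False, False, True, True, True, False,
            False, False, True, False, False, False]

print(classify_method2(reported))                      # last three months: Inactive
print(classify_method2(reported, min_trailing_gap=6))  # last three months: Active not reporting
```

With `min_trailing_gap=6`, the three silent months at the end of 2024 are too few to conclude that the facility has closed, so it stays in the denominator.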

Method 3: Dynamic activation and inactivation

Criteria: A facility is classified as active once it starts reporting, and as inactive during any run of consecutive non-reporting months that reaches a specified minimum length.

Key principle: A facility is excluded from the denominator (expected to report) whenever there is a continuous window of N months of non-reporting (for example, 6 months). The window size (N) can be configured based on program requirements.

Rationale: This method recognizes that facilities may have temporary interruptions in functionality due to various operational factors such as staff shortages, equipment issues, inaccessibility from natural disasters or insecurity. The facility may regain activity in the future as those factors change, then become inactive if those factors reappear. It provides a dynamic assessment that balances operational reality with accountability, allowing facilities to maintain “active” status even with occasional reporting gaps as long as they demonstrate recent engagement. However, it is not normal for a facility to be frequently changing between active and inactive status, and if you are seeing this when using Method 3, you should consider lengthening your window size or switching to Method 2.

Illustration:

	adm1	adm2	adm3	hf_uid	date	allout	susp	test	conf	maltreat	report	status
0	msk1	msk2	msk3	hf_0001	2024-01	20	15	5	5	5	Yes	Active reporting
1	msk1	msk2	msk3	hf_0001	2024-02	nan	nan	nan	nan	nan	No	Inactive
2	msk1	msk2	msk3	hf_0001	2024-03	nan	nan	nan	nan	nan	No	Inactive
3	msk1	msk2	msk3	hf_0001	2024-04	nan	nan	nan	nan	nan	No	Inactive
4	msk1	msk2	msk3	hf_0001	2024-05	nan	nan	nan	nan	nan	No	Inactive
5	msk1	msk2	msk3	hf_0001	2024-06	nan	nan	nan	nan	nan	No	Inactive
6	msk1	msk2	msk3	hf_0001	2024-07	nan	nan	nan	nan	nan	No	Inactive
7	msk1	msk2	msk3	hf_0001	2024-08	nan	nan	nan	nan	nan	No	Inactive
8	msk1	msk2	msk3	hf_0001	2024-09	5	5	5	5	5	Yes	Active reporting
9	msk1	msk2	msk3	hf_0001	2024-10	nan	nan	nan	nan	nan	No	Active not reporting
10	msk1	msk2	msk3	hf_0001	2024-11	nan	nan	nan	nan	nan	No	Active not reporting
11	msk1	msk2	msk3	hf_0001	2024-12	nan	nan	nan	nan	nan	No	Active not reporting
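A minimal sketch of Method 3 is given below. It assumes that every month in a run of at least N consecutive non-reporting months is marked inactive, which is what the illustration shows; other conventions (for example, marking the facility inactive only from the Nth month of the run onward) are possible and should be agreed with the SNT team. The function name `classify_method3` and the parameter name `window` are hypothetical.

```python
def classify_method3(reported, window=6):
    """Dynamic activation for one facility's chronological reporting flags."""
    statuses = []
    n = len(reported)
    has_ever_reported = False
    i = 0
    while i < n:
        if reported[i]:
            has_ever_reported = True
            statuses.append("Active reporting")
            i += 1
            continue
        # Measure the run of consecutive non-reporting months starting at i
        j = i
        while j < n and not reported[j]:
            j += 1
        run_length = j - i
        if not has_ever_reported or run_length >= window:
            label = "Inactive"
        else:
            label = "Active not reporting"
        statuses.extend([label] * run_length)
        i = j
    return statuses


# Reporting flags for hf_0001, January to December 2024, as in the illustration
reported = [True] + [False] * 7 + [True] + [False] * 3

print(classify_method3(reported, window=6))  # 7-month gap is inactive
print(classify_method3(reported, window=8))  # 7-month gap stays active not reporting
```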

Method Summary

Comparison Aspect	Method 1: Permanent Activation	Method 2: Activate/Inactivate with Last Report	Method 3: Dynamic Activation
Activation Criteria	First report received	First report received	First report received
Inactivation Criteria	Never (once active, always active)	After last report + grace period (e.g., 6 months)	After N consecutive months of non-reporting (e.g., 6 months)
Facility Status	Binary: inactive → permanent active	Binary: inactive → active → permanent inactive	Dynamic: can toggle between active/inactive multiple times
Handles Temporary Closures	❌ No	❌ No	✅ Yes
Handles Permanent Closures	❌ No	✅ Yes	✅ Yes
Data Requirements	Minimal historical data	Complete historical data preferred	Complete time series data
Best Use When	Analyzing new facilities or early program phases	Studying facility attrition/permanent closures	Monitoring ongoing operations with temporary disruptions
Advantages	Simple to implement; stable denominators	Accounts for permanent exits; avoids penalizing for closed facilities	Realistic for operational contexts; accommodates temporary issues
Limitations	Overestimates active facilities over time	May misclassify temporarily closed facilities as permanently closed	More complex; status can fluctuate; requires parameter tuning
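Whichever method is chosen, the resulting status determines the reporting rate denominator. The sketch below shows this for a single month, assuming the three status labels used in the illustrations above; the function name `reporting_rate` and the example facilities are hypothetical.

```python
def reporting_rate(statuses):
    """Share of active facilities that reported, or None if no facility is active."""
    reporting = sum(status == "Active reporting" for status in statuses)
    active = sum(status != "Inactive" for status in statuses)
    return reporting / active if active else None


# Hypothetical month with five facilities
month_statuses = [
    "Active reporting",
    "Active reporting",
    "Active not reporting",
    "Inactive",
    "Inactive",
]

# Two of the three active facilities reported
print(reporting_rate(month_statuses))

# If the two inactive facilities were instead treated as active, the rate would drop
all_active = [s if s != "Inactive" else "Active not reporting" for s in month_statuses]
print(reporting_rate(all_active))
```

The same month yields a rate of two in three or two in five depending only on how the two silent facilities are classified, which is why the choice of method should be validated with the SNT team.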

Step-by-step

This section walks through the step-by-step process of identifying active facilities in code, using example DHIS2 data from Sierra Leone. We assume you are working with cleaned and preprocessed routine surveillance data.

To skip the step-by-step explanation, jump to the full code at the end of this page.

Step 1: Load packages and data

Step 1.1: Load required packages

Load all necessary packages for data processing and visualization to determine health facility active status.

R
Python

# Install or load relevant packages
pacman::p_load(
  readxl,        # Read Excel files
  dplyr,         # Data manipulation
  tidyr,         # Data tidying
  lubridate,     # Date handling
  ggplot2,       # Data visualization
  RColorBrewer,  # Color palettes
  scales,        # Scale functions for ggplot2
  purrr,         # Functional programming
  DT,            # Interactive data tables
  writexl,       # Export to Excel
  reticulate,    # R-Python interoperability
  devtools,      # Package management
  here           # Project-relative file paths (used in Step 1.2)
)

# Install/update sntutils from GitHub (this runs on every execution, so it needs internet access)
devtools::install_github("ahadi-analytics/sntutils", quiet = TRUE, upgrade = "always")

library(sntutils)

To adapt the code:

Do not modify anything in the code above

import pandas as pd
from pyhere import here
import numpy as np
from matplotlib.colors import ListedColormap
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import seaborn as sns

To adapt the code:

Do not modify anything in the code above

Step 1.2: Import data

Load the preprocessed malaria routine data. This page continues the use of the preprocessed Sierra Leone DHIS2 data, obtained through following the steps on the DHIS2 preprocessing page.

R
Python

# Define file path using here package for reproducible paths
data_filepath <- here::here("english/data_r/routine_cases/clean_malaria_routine_data_final.rds")

# Load the preprocessed DHIS2 malaria surveillance data
df <- readRDS(data_filepath)

To adapt the code:

Change the file path stored in data_filepath to match your folder structure

dhis2_df = pd.read_parquet(here('english/data_r/routine_cases', 'dhis2_processed_data_python.parquet'))

Step 2: Configure reporting indicators and function

Step 2.1: Define reporting indicators

In this step we define the main reporting indicators used to determine activity status. We also convert the date column from character strings to proper Date objects.

R
Python

report_cols <- c("allout", "test", "pres", "conf", "maltreat", "maladm")

# Keep original date format as a separate column
df$date_original <- df$date

# Convert "YYYY-MM" to proper Date objects using base R as.Date
df$date <- as.Date(paste0(df$date, "-01"))

To adapt the code:

Do not modify anything in the code above

key_indicators = ['allout', 'test', 'pres', 'conf', 'maltreat', 'maladm']

To adapt the code:

Do not modify anything in the code above

Step 2.2: Reporting pattern identification function

We begin by identifying each health facility’s first reporting date to implement classification method 1 (permanent activation).

R
Python

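# Calculate monthly reporting status
# Note: a month is flagged as reported only when the key indicators sum to more than 0,
# so a month containing only zero values is treated as not reported. To apply the
# non-NA rule described above instead, use: base::rowSums(!is.na(df_selected)) > 0
df_selected <- dplyr::select(df, dplyr::all_of(report_cols))
row_sums <- base::rowSums(df_selected, na.rm = TRUE)
df$reported <- base::ifelse(row_sums > 0, 1, 0)

# Add Year, Month, and YM columns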
df <- df |>
  dplyr::mutate(
    Year = lubridate::year(date),
    Month = lubridate::month(date),
    YM = format(date, "%Y-%m")
  )

# Create total reports per facility for proper ordering
df <- dplyr::group_by(df, hf_uid)
df <- dplyr::mutate(df, total_reports = base::sum(reported, na.rm = TRUE))
df <- dplyr::ungroup(df)

# Identify the first health facility reporting date using YM
first_reports <- df |>
  dplyr::filter(reported == 1) |>
  dplyr::group_by(hf_uid) |>
  dplyr::summarise(first_month_reported_YM = min(YM), .groups = "drop")

df <- df |>
  dplyr::left_join(first_reports, by = "hf_uid")

# Status classification (0, 0.5, 1)
df <- df |>
  dplyr::mutate(
    Facility_status = dplyr::case_when(
      reported == 1 ~ 1,
      reported == 0 & YM >= first_month_reported_YM ~ 0.5,
      TRUE ~ 0
    ),
    Facility_active = Facility_status > 0
  )

To adapt the code:

Do not modify anything in the code above

# make a copy of the data
dfr = dhis2_df.copy()

# add a column indicating whether the HF reported on any of the key indicators
dfr.insert(len(dfr.columns), 'key_variables', dfr[key_indicators].notna().any(axis = 1))
dfr.insert(len(dfr.columns), 'reported', np.where(dfr['key_variables'], 1, 0))

# keep only the columns needed for the activity classification
cols = ['Year', 'Month', 'YM', 'adm0', 'adm0_uid', 'adm1', 'adm1_uid', 'adm2', 'adm2_uid', 'adm3', 'adm3_uid', 'hf', 'hf_uid', 'key_variables', 'reported']
dfr = dfr[cols]

# compute first month reported for each HF and add column in dfr
t = dfr[dfr['reported'] == 1].groupby('hf_uid')['YM'].min().to_frame(name = 'first_month_reported').reset_index()

# make sure to keep all HFs in case some don't have a valid first month (never reported on anything)
temp = pd.DataFrame(dfr['hf_uid'].unique(), columns = ['hf_uid'])
t = temp.merge(t, on = 'hf_uid', how = 'left', validate = '1:1')
dfr = dfr.merge(t, on = 'hf_uid', how = 'left', validate = 'm:1')

# add HF status column:
# 0: not active
# 0.5: HF didn't report when considered active
# 1: active and reporting
dfr.insert(len(dfr.columns),
          'Facility_status',
          np.where(dfr['reported'] == 1, 1, np.where((dfr['reported'] == 0) & (dfr['YM'] >= dfr['first_month_reported']), 0.5, 0)))

# add active HF column
dfr.insert(len(dfr.columns), 'Facility_active', np.where(dfr['Facility_status'] == 0, False, True))

# quick visual check
dfr.head(10).style

	Year	Month	YM	adm0	adm0_uid	adm1	adm1_uid	adm2	adm2_uid	adm3	adm3_uid	hf	hf_uid	key_variables	reported	first_month_reported	Facility_status	Facility_active
0	2015	1	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo City Council	adm2_00001	Bo City	adm3_00001	Aethel CHP	HF_00001	False	0	2019-01	0.000000	False
1	2015	1	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo City Council	adm2_00001	Bo City	adm3_00001	Agape Way CHP	HF_00002	True	1	2015-01	1.000000	True
2	2015	1	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo City Council	adm2_00001	Bo City	adm3_00001	Anglican Diocese Clinic	HF_00003	False	0	nan	0.000000	False
3	2015	1	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo City Council	adm2_00001	Bo City	adm3_00001	Batiama Layout MCHP	HF_00004	False	0	2015-05	0.000000	False
4	2015	1	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo City Council	adm2_00001	Bo City	adm3_00001	Bo Government Hospital	HF_00005	True	1	2015-01	1.000000	True
5	2015	1	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo City Council	adm2_00001	Bo City	adm3_00001	Bo School Bay CHP	HF_00006	False	0	2022-01	0.000000	False
6	2015	1	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo City Council	adm2_00001	Bo City	adm3_00001	Breakthrough MCHP	HF_00007	False	0	2023-10	0.000000	False
7	2015	1	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo City Council	adm2_00001	Bo City	adm3_00001	Brima Town CHP	HF_00008	True	1	2015-01	1.000000	True
8	2015	1	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo City Council	adm2_00001	Bo City	adm3_00001	EDC Unit CHP	HF_00009	True	1	2015-01	1.000000	True
9	2015	1	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo City Council	adm2_00001	Bo City	adm3_00001	Favour MCHP	HF_00010	True	1	2015-01	1.000000	True

To adapt the code:

Do not modify anything in the code above

Step 3: Method 1 - Permanent activity status identification

Step 3.1: Permanent activity status identification

Building off the previous step, this code classifies facilities as active if they reported or have reported before, otherwise inactive.

R
Python

df <- df |>
  dplyr::mutate(
    active_status_method1 = dplyr::case_when(
      Facility_status == 1 ~ "Active",
      Facility_status == 0.5 ~ "Active",
      Facility_status == 0 ~ "Inactive",
      TRUE ~ "Inactive"
    )
  )

cat("Method 1 (R) - Summary:\n")
cat("Total facilities:", length(unique(df$hf_uid)), "\n")
cat("Active facilities (ever reported):", length(unique(df$hf_uid[!is.na(df$first_month_reported_YM)])), "\n")
cat("Never reported facilities:", length(unique(df$hf_uid[is.na(df$first_month_reported_YM)])), "\n")

Output

Method 1 (R) - Summary:

Total facilities: 1771

Active facilities (ever reported): 1534

Never reported facilities: 237

Alternative code option using sntutils package

df_method1 <- sntutils::classify_facility_activity(
  data = df,
  hf_col = "hf_uid",
  date_col = "date",
  key_indicators = report_cols,
  method = 1,
  binary_classification = TRUE,
  reporting_rule = "any_non_na"
)

# Method 1 is already implemented above as Facility_active
# This represents permanent activation after first report

print("Method 1 (Python) - Permanent Activation")

Method 1 (Python) - Permanent Activation

print(f"Total facilities: {len(dfr['hf_uid'].unique())}")

Total facilities: 1324

print(f"Active facilities (ever reported): {len(dfr[dfr['first_month_reported'].notna()]['hf_uid'].unique())}")

Active facilities (ever reported): 1127

print(f"Never reported facilities: {len(dfr[dfr['first_month_reported'].isna()]['hf_uid'].unique())}")

Never reported facilities: 197

To adapt the code:

Do not modify anything in the code above

Step 3.2: Define activity status visualization function

To simplify plotting each active status method, we define a function that generates corresponding visualizations based on defined input parameters.

R
Python

plot_facility_activity <- function(
    data,
    method = c("method1", "method2", "method3"),
    level = c("national", "district"),
    facet_col = NULL,
    title = NULL,
    subtitle = NULL,
    plot_flips = FALSE
) {

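  # Resolve the default argument vectors to a single validated choice,
  # so that calling the function without method or level does not fail
  method <- match.arg(method)
  level <- match.arg(level)

  # Map method to column name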
  status_col <- switch(method,
    "method1" = "active_status_method1",
    "method2" = "active_status_method2",
    "method3" = "active_status_method3",
    stop("Method must be 'method1', 'method2', or 'method3'")
  )

  # Method labels for titles
  method_labels <- c(
    "method1" = "Permanent Activation",
    "method2" = "First-to-Last Report",
    "method3" = "Dynamic Activation"
  )

  # Handle status flips for Method 3
  if (plot_flips && method == "method3") {
    # Identify facilities with status changes
    flip_facilities <- data |>
      arrange(hf_uid, date) |>
      group_by(hf_uid) |>
      summarise(has_flip = length(unique(.data[[status_col]])) > 1) |>
      filter(has_flip) |>
      pull(hf_uid)

    data <- data |>
      filter(hf_uid %in% flip_facilities)

    flip_count <- length(flip_facilities)
    subtitle <- paste("Showing", flip_count, "facilities with status flips")
  }

  # Set default titles if not provided
  if (is.null(title)) {
    title <- paste("Method", gsub("method", "", method), ":", method_labels[method])
  }

  if (is.null(subtitle) && !plot_flips) {
    subtitle <- switch(method,
      "method1" = "Facilities remain active indefinitely after first report",
      "method2" = "Facilities are active between first and last report",
      "method3" = "Handles temporary closures (6-month non-reporting threshold)"
    )
  }

  # Create base plot with consistent colors
  p <- ggplot(data, aes(x = date, y = reorder(hf_uid, total_reports), fill = .data[[status_col]])) +
    geom_tile() +
    scale_fill_manual(values = c("Active" = "pink", "Inactive" = "#47B5FF"), name = "Activity Status") +
    scale_x_date(date_breaks = "6 months", date_labels = "%b %Y") +
    theme_minimal() +
    theme(
      axis.text.y = element_blank(),
      axis.ticks.y = element_blank(),
      axis.text.x = element_text(angle = 45, hjust = 1),
      legend.position = "bottom",
      plot.title = element_text(face = "bold", size = 14),
      plot.subtitle = element_text(size = 11, color = "gray40")
    ) +
    labs(
      x = "Date",
      y = "Health Facilities",
      title = title,
      subtitle = subtitle
    )

  # ADD FLIP MARKERS only for Method 3 flips - EXCLUDING ACTIVATION
  if (plot_flips && method == "method3") {
    # Find exact flip points but exclude the first activation (inactive → active)
    flip_points <- data |>
      arrange(hf_uid, date) |>
      group_by(hf_uid) |>
      mutate(
        status_change = .data[[status_col]] != lag(.data[[status_col]]),
        # Identify first activation to exclude it
        first_activation = min(which(.data[[status_col]] == "Active")),
        flip_point = ifelse(status_change & row_number() > first_activation, as.character(date), NA)
      ) |>
      filter(!is.na(flip_point)) |>
      ungroup()

    # Add points at flip locations only if there are any flips
    if (nrow(flip_points) > 0) {
      p <- p +
        geom_point(data = flip_points,
                   aes(x = date, y = hf_uid),
                   color = "black", size = 1, shape = 21, fill = "white", stroke = 1)
    }
  }

  # Add faceting for district level
  if (level == "district" || !is.null(facet_col)) {
    if (is.null(facet_col)) {
      facet_col <- "adm1"
    }
    p <- p +
      facet_wrap(as.formula(paste("~", facet_col)), scales = "free_y", ncol = 4) +
      theme(
        axis.text.x = element_text(angle = 90, hjust = 1, size = 6),
        strip.text = element_text(size = 8)
      )
  }

  return(p)
}

To adapt the code:

Do not modify anything in the code above

Step 3.3: Permanent activity visualization

The active status visualization function defined in the previous step can then be applied to method 1 results.

R
Python

plot_facility_activity(df, method = "method1", level = "national")

plot_facility_activity(df, method = "method1", level = "district")

Output

Alternative code option using sntutils package

sntutils::facility_reporting_plot(
  data = df,                 # data frame loaded in Step 1.2
  hf_col = "hf_uid",
  date_col = "date",
  palette = "violet",
  key_indicators = report_cols,  # indicators defined in Step 2.1
  facet_col = "adm2",        # column used for faceting
  facet_ncol = 7,            # number of facet columns
  include_never_reported = TRUE,
  target_language = "fr",
  method = 1,
  year_breaks = 8,
  plot_path = val_plot_path, # val_plot_path is not created on this page; define it before running
  plot_width = 12,
  plot_height = 14,
  plot_scale = 0.6
)

To adapt the code:

Do not modify anything in the code above

Method 1 (permanent activation) implementation complete!

The classification of health facility active status based only on their first reporting date (i.e. permanent activation) is now complete.

The steps below build upon this to implement methods 2 and 3.

Step 4: Method 2 - First-to-last activity status identification

Step 4.1: First-to-last activity status identification

To begin method 2 classification, we identify each health facility’s last reporting date. This is used in tandem with the previously identified first reporting date (method 1) to determine active status using method 2.

R
Python

# Method 2: Identify last reports and create active period
last_reports <- df |>
  dplyr::filter(reported == 1) |>
  dplyr::group_by(hf_uid) |>
  dplyr::summarise(last_month_reported_YM = max(YM), .groups = "drop")

df <- df |>
  dplyr::left_join(last_reports, by = "hf_uid")

# Method 2: Active only between first and last report
df <- df |>
  dplyr::mutate(
    Facility_status_method2 = dplyr::case_when(
      is.na(first_month_reported_YM) ~ 0,  # Never reported
      YM >= first_month_reported_YM & YM <= last_month_reported_YM & reported == 1 ~ 1,  # Active and reporting
      YM >= first_month_reported_YM & YM <= last_month_reported_YM & reported == 0 ~ 0.5,  # Active but not reporting
      TRUE ~ 0  # Outside active period
    ),
    Facility_active_method2 = Facility_status_method2 > 0,
    active_status_method2 = dplyr::if_else(Facility_active_method2, "Active", "Inactive")
  )

# More informative summary
total_facilities <- length(unique(df$hf_uid))
facilities_with_activity_period <- length(unique(df$hf_uid[!is.na(df$last_month_reported_YM)]))
never_reported <- length(unique(df$hf_uid[is.na(df$first_month_reported_YM)]))
currently_active <- df |>
  dplyr::filter(YM == max(YM)) |>
  dplyr::summarise(active_count = sum(active_status_method2 == "Active")) |>
  dplyr::pull(active_count)

cat("Method 2 (R) - First-to-Last Report Activation\n")
cat("Facilities with defined activity period:", facilities_with_activity_period, "\n")
cat("Never reported facilities:", never_reported, "\n")
cat("Currently active facilities:", currently_active, "\n")
cat("Facilities permanently closed:", facilities_with_activity_period - currently_active, "\n")

Output

Method 2 (R) - First-to-Last Report Activation

Facilities with defined activity period: 1534

Never reported facilities: 237

Currently active facilities: 1434

Facilities permanently closed: 100

Alternative code option using sntutils package

df_method2 <- sntutils::classify_facility_activity(
  data = df,
  hf_col = "hf_uid",
  date_col = "date",
  key_indicators = report_cols,
  method = 2,
  binary_classification = TRUE,
  reporting_rule = "any_non_na"
)
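A possible Python counterpart of the R logic above is sketched here on a toy data frame. It assumes the column names used in the Python code of Step 2.2 (`hf_uid`, `YM`, `reported`, `first_month_reported`); the `last_month_reported` column, the two example facilities, and the helper variables are hypothetical. As in the R code, no minimum gap after the last report is applied.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the dfr data frame built in Step 2.2 (two hypothetical facilities)
months = [f"2024-{m:02d}" for m in range(1, 7)]
dfr = pd.DataFrame({
    "hf_uid": ["HF_A"] * 6 + ["HF_B"] * 6,
    "YM": months * 2,
    "reported": [0, 1, 0, 1, 0, 0] + [0] * 6,
})

# First and last reporting month per facility; facilities that never reported get NaN
span = (
    dfr[dfr["reported"] == 1]
    .groupby("hf_uid")["YM"]
    .agg(first_month_reported="min", last_month_reported="max")
    .reset_index()
)
dfr = dfr.merge(span, on="hf_uid", how="left", validate="m:1")

# A month is inside the active period if it lies between the first and last report
ever_reported = dfr["first_month_reported"].notna()
first = dfr["first_month_reported"].fillna("")
last = dfr["last_month_reported"].fillna("")
in_active_period = ever_reported & (dfr["YM"] >= first) & (dfr["YM"] <= last)

# 1: active and reporting, 0.5: active but not reporting, 0: inactive
dfr["Facility_status_method2"] = np.where(
    in_active_period & (dfr["reported"] == 1), 1.0,
    np.where(in_active_period, 0.5, 0.0),
)
dfr["Facility_active_method2"] = dfr["Facility_status_method2"] > 0

print(dfr[["hf_uid", "YM", "reported", "Facility_status_method2"]])
```

Comparing the "YYYY-MM" strings directly works because that format sorts chronologically.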

To adapt the code:

Do not modify anything in the code above

Step 4.2: First-to-last activity status visualization

We can call the active status visualization function again here to visualize method 2 facility classification.

R
Python

plot_facility_activity(df, method = "method2", level = "national")

plot_facility_activity(df, method = "method2", level = "district")

Output

Alternative code option using sntutils package

sntutils::facility_reporting_plot(
  data = df,                 # data frame loaded in Step 1.2
  hf_col = "hf_uid",
  date_col = "date",
  palette = "violet",
  key_indicators = report_cols,  # indicators defined in Step 2.1
  facet_col = "adm2",        # column used for faceting
  facet_ncol = 7,            # number of facet columns
  include_never_reported = TRUE,
  target_language = "fr",
  method = 2,
  year_breaks = 8,
  plot_path = val_plot_path, # val_plot_path is not created on this page; define it before running
  plot_width = 12,
  plot_height = 14,
  plot_scale = 0.6
)

To adapt the code:

Do not modify anything in the code above

Method 2 (first-to-last activation) implementation complete!

The classification of health facility active status based on their first and last reporting date (i.e. first-to-last activation) is now complete.

The steps below build upon this to implement method 3 active status.

Step 5: Method 3 - Dynamic activity status identification

Step 5.1: Dynamic activity status identification

The code below classifies a facility as inactive once it has gone 6 or more consecutive months without reporting, within the period between its first and last reporting dates identified previously.

R
Python

# Method 3: Calculate consecutive non-reporting months
df <- df |>
  dplyr::arrange(hf_uid, YM) |>
  dplyr::group_by(hf_uid) |>
  dplyr::mutate(
    # Calculate consecutive non-reporting counter
    consecutive_non_report = {
      counter <- 0
      purrr::map_dbl(reported, ~{
        if (.x == 1) {
          counter <<- 0
        } else {
          counter <<- counter + 1
        }
        counter
      })
    }
  ) |>
  dplyr::ungroup()

# Method 3: Inactive after 6+ consecutive months of non-reporting BETWEEN first and last reporting dates
df <- df |>
  dplyr::mutate(
    Facility_status_method3 = dplyr::case_when(
      is.na(first_month_reported_YM) ~ 0,  # Never reported
      YM < first_month_reported_YM ~ 0,  # Before first report
      consecutive_non_report >= 6 & YM <= last_month_reported_YM ~ 0,  # 6+ months non-reporting WITHIN active period
      reported == 1 ~ 1,  # Active and reporting
      TRUE ~ 0.5  # Active but not reporting
    ),
    Facility_active_method3 = Facility_status_method3 > 0,
    active_status_method3 = dplyr::if_else(Facility_active_method3, "Active", "Inactive")
  )

# Count facilities that change status
status_flip_facilities <- df |>
  dplyr::group_by(hf_uid) |>
  dplyr::summarise(
    has_status_change = length(unique(active_status_method3)) > 1,
    .groups = "drop"
  ) |>
  dplyr::filter(has_status_change)

cat("Method 3 (R) - Summary:\n")
cat("Facilities that experienced 6+ months non-reporting:", length(unique(df$hf_uid[df$consecutive_non_report >= 6])), "\n")
cat("Facilities with status changes:", nrow(status_flip_facilities), "\n")

Output

Method 3 (R) - Summary:

Facilities that experienced 6+ months non-reporting: 541

Facilities with status changes: 334

To adapt the code:

Do not modify anything in the code above
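
The Python tab for this step is not shown here, so the following is a minimal pure-Python sketch of the same logic for a single facility, not the library's implementation. It mirrors the R `case_when` rules above under the assumption that `reported` is a list of 0/1 values ordered by month. The function names `consecutive_non_report` and `method3_status` and the `window` parameter are hypothetical and chosen for illustration only.

```python
from typing import List


def consecutive_non_report(reported: List[int]) -> List[int]:
    """Running count of consecutive non-reporting months; resets to 0 on a report."""
    counts = []
    counter = 0
    for r in reported:
        counter = 0 if r == 1 else counter + 1
        counts.append(counter)
    return counts


def method3_status(reported: List[int], window: int = 6) -> List[float]:
    """Return 1 (active, reporting), 0.5 (active, not reporting) or 0 (inactive)."""
    months_reported = [i for i, r in enumerate(reported) if r == 1]
    if not months_reported:
        # Never reported: inactive throughout
        return [0.0] * len(reported)
    first, last = months_reported[0], months_reported[-1]
    gaps = consecutive_non_report(reported)
    status = []
    for i, (r, gap) in enumerate(zip(reported, gaps)):
        if i < first:
            status.append(0.0)   # before first report
        elif gap >= window and i <= last:
            status.append(0.0)   # long gap within the reporting period
        elif r == 1:
            status.append(1.0)   # active and reporting
        else:
            status.append(0.5)   # active but not reporting
    return status


# Toy series (hypothetical): one report, a 7-month gap, then a second report
example = [0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
print(consecutive_non_report(example))
print(method3_status(example))
```

In this toy series the facility only becomes inactive from the sixth consecutive missing month onward; the first five months of the gap remain "active but not reporting". In a pandas workflow the same function could be applied per `hf_uid` after sorting by month.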

Alternative code option using sntutils package

df_method3 <- sntutils::classify_facility_activity(
  data = df,
  hf_col = "hf_uid",
  date_col = "date",
  key_indicators = report_cols,
  method = 3,
  binary_classification = TRUE,
  reporting_rule = "any_non_na"
)

To adapt the code:

Do not modify anything in the code above

Step 5.2: Dynamic activity status visualization

Here we call the active status plotting function again to visualize method 3 results at both the national and district levels.

R
Python

plot_facility_activity(df, method = "method3", level = "national")

plot_facility_activity(df, method = "method3", level = "district")

Output

Alternative code option using sntutils package

sntutils::facility_reporting_plot(
  data = df,
  hf_col = "hf_uid",
  date_col = "date",
  palette = "violet",
  key_indicators = vars_of_interest,
  facet_col = "adm2",       # for the facetting
  facet_ncol = 7,           # the number of cols for the facetting
  include_never_reported = TRUE,
  target_language = "fr",
  method = 3,
  nonreport_window = 6, # Needed for method 3
  year_breaks = 8,
  plot_path = val_plot_path,
  plot_width = 12,
  plot_height = 14,
  plot_scale = 0.6
)

To adapt the code:

Do not modify anything in the code above

Step 5.3: Visualize dynamic activation flips

An additional visualization relevant to method 3 is the number of “flips” in status, that is, the number of times a facility switches from active to inactive, back to active, and so on. The plotting function defined earlier can also display these flips through its plot_flips argument.

R
Python

# Show only facilities that flip status in Method 3
plot_facility_activity(
  df,
  method = "method3",
  level = "national",
  plot_flips = TRUE,
  title = "Method 3: Facilities with Dynamic Status Flips"
)

Output

To adapt the code:

Do not modify anything in the code above
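
To make the notion of a "flip" concrete, here is a small illustrative sketch, assuming a facility's monthly active status is available as an ordered list of booleans. The helper name `count_flips` is hypothetical and is not part of the plotting function used above.

```python
from typing import List


def count_flips(active: List[bool]) -> int:
    """Count how many times the status changes between consecutive months."""
    return sum(1 for prev, cur in zip(active, active[1:]) if prev != cur)


# Toy series (hypothetical): inactive -> active -> inactive -> active
status = [False, True, True, False, False, True]
print(count_flips(status))  # 3
```

A facility with zero flips has a constant status over the whole period; facilities with many flips are the ones worth inspecting when deciding whether the 6-month window suits the data.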

Method 3 (dynamic activation) implementation complete!

The classification of health facility active status based on their first and last reporting dates as well as extended non-reporting periods (i.e. dynamic activation) is now complete.

Step 6: Activity status method comparison

All three active status methods have now been applied. Visualizations allow us to compare these methods to better understand the nature of health facilities in the dataset and decide which method should be selected for further use.

sntutils::compare_methods_plot(
  data = df,
  hf_col = "hf_uid",
  date_col = "date",
  key_indicators = report_cols,
  language = "en"
)

Output

To adapt the code:

Do not modify anything in the code above
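
As a rough aid for reading the comparison, the sketch below applies the three rules to one toy reporting series and counts the active months under each. It is an illustration under stated assumptions, not the code behind the comparison plot: method 1 is taken as active from the first report onward, method 2 as active between the first and last reports, and method 3 follows the step-by-step R rules from Step 5.1. The function name `active_by_method` and the `window` parameter are hypothetical.

```python
from typing import Dict, List


def active_by_method(reported: List[int], window: int = 6) -> Dict[str, List[bool]]:
    """Monthly active flags for one facility under the three methods."""
    n = len(reported)
    idx = [i for i, r in enumerate(reported) if r == 1]
    if not idx:
        # Never reported: inactive under every method
        return {m: [False] * n for m in ("method1", "method2", "method3")}
    first, last = idx[0], idx[-1]

    method1 = [i >= first for i in range(n)]           # permanent activation
    method2 = [first <= i <= last for i in range(n)]   # first-to-last activation

    method3 = []                                       # dynamic activation
    gap = 0
    for i, r in enumerate(reported):
        gap = 0 if r == 1 else gap + 1
        long_gap = gap >= window and i <= last
        method3.append(i >= first and not long_gap)

    return {"method1": method1, "method2": method2, "method3": method3}


# Toy series (hypothetical): report, 6-month gap, report, then silence
reported = [0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0]
flags = active_by_method(reported)
for name, values in flags.items():
    print(name, sum(values), "active months")
```

Because the number of active months is the reporting rate denominator, a method that keeps facilities active for longer yields a larger denominator and therefore a lower reporting rate for the same set of submitted reports.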

Step 7: Export results

Finally, we export the active status results together with df_expected, which contains the number of health facilities expected to report in each month and administrative unit. This count is the denominator needed for reporting rate calculations.

R
Python

Show the code

# Create dftree: lookup of admin units and facilities (no admin-level UID columns)
cols <- c('adm0', 'adm1', 'adm2', 'adm3', 'hf', 'hf_uid')
dftree <- df |>
  dplyr::select(dplyr::all_of(cols)) |>
  dplyr::distinct() |>
  dplyr::arrange(dplyr::across(dplyr::all_of(cols)))

# Add Year and YM columns to main data
df_with_ym <- df |>
  dplyr::mutate(
    Year = lubridate::year(date),
    Month = lubridate::month(date),
    YM = format(date, "%Y-%m")
  )

# Method 1 ONLY - Create monthly denominator for number of HFs active in each adm3
df_expected_method1 <- df_with_ym |>
  dplyr::group_by(Year, YM, adm3) |>
  dplyr::summarise(
    denominator = sum(active_status_method1 == "Active", na.rm = TRUE),
    .groups = "drop"
  )

# Add parent admin units
admin_cols <- c('adm0', 'adm1', 'adm2', 'adm3')
t <- dftree[admin_cols] |> dplyr::distinct()

df_expected_method1 <- df_expected_method1 |>
  dplyr::left_join(t, by = "adm3")

# Reorder columns
final_cols <- c('Year', 'YM', 'adm0', 'adm1', 'adm2', 'adm3', 'denominator')
df_expected_method1 <- df_expected_method1[final_cols] |>
  dplyr::arrange(dplyr::across(dplyr::all_of(final_cols)))

# Save results - ONLY Method 1 for now
# write.csv(df_expected_method1, "expected_reports_method1.csv", row.names = FALSE)

cols = ['adm0', 'adm0_uid', 'adm1', 'adm1_uid', 'adm2', 'adm2_uid', 'adm3', 'adm3_uid', 'hf', 'hf_uid']
dftree= dhis2_df[cols].drop_duplicates().reset_index(drop = True)

# create monthly denominator for number of HFs active in each adm3
df_expected = (dfr
     .groupby(['Year', 'YM', 'adm3_uid'])[['Facility_active']].sum(min_count = 1)
     .reset_index()
     .rename(columns = {'Facility_active': 'denominator'}))

# add parent admin units
cols = ['adm0', 'adm0_uid', 'adm1', 'adm1_uid', 'adm2', 'adm2_uid', 'adm3', 'adm3_uid']
t = dftree[cols].drop_duplicates().reset_index(drop = True)
df_expected = df_expected.merge(t, on = 'adm3_uid', how = 'left', validate = 'm:1')

# reorder columns
cols = ['Year', 'YM', 'adm0', 'adm0_uid', 'adm1', 'adm1_uid', 'adm2', 'adm2_uid', 'adm3', 'adm3_uid', 'denominator']
df_expected = df_expected[cols].sort_values(by = cols).reset_index(drop = True)

# save
# df_expected.to_csv(here('english/data_r/routine_cases', 'df_expected.csv'), index = None)

# Inspect results
df_expected.head(10).style

	Year	YM	adm0	adm0_uid	adm1	adm1_uid	adm2	adm2_uid	adm3	adm3_uid	denominator
0	2015	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo City Council	adm2_00001	Bo City	adm3_00001	21
1	2015	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo District Council	adm2_00002	Badjia Chiefdom	adm3_00002	2
2	2015	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo District Council	adm2_00002	Bagbwe Chiefdom	adm3_00003	6
3	2015	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo District Council	adm2_00002	Baoma Chiefdom	adm3_00004	16
4	2015	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo District Council	adm2_00002	Bargbo Chiefdom	adm3_00005	8
5	2015	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo District Council	adm2_00002	Bongor Chiefdom	adm3_00006	4
6	2015	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo District Council	adm2_00002	Bumpe Ngao Chiefdom	adm3_00007	13
7	2015	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo District Council	adm2_00002	Gbo Chiefdom	adm3_00008	2
8	2015	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo District Council	adm2_00002	Jaiama Chiefdom	adm3_00009	3
9	2015	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo District Council	adm2_00002	Kakua Chiefdom	adm3_00010	8
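
The denominator shown above is simply the number of active facilities per month and adm3. The following pure-Python sketch illustrates that aggregation on a few toy records. The records, the facility identifiers and the function name `expected_reports` are hypothetical and are not taken from the dataset.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Toy facility-month records (hypothetical values, for illustration only)
records = [
    {"YM": "2015-01", "adm3": "Bo City", "hf_uid": "hf_a", "active": True},
    {"YM": "2015-01", "adm3": "Bo City", "hf_uid": "hf_b", "active": False},
    {"YM": "2015-01", "adm3": "Badjia Chiefdom", "hf_uid": "hf_c", "active": True},
    {"YM": "2015-02", "adm3": "Bo City", "hf_uid": "hf_a", "active": True},
    {"YM": "2015-02", "adm3": "Bo City", "hf_uid": "hf_b", "active": True},
    {"YM": "2015-02", "adm3": "Badjia Chiefdom", "hf_uid": "hf_c", "active": False},
]


def expected_reports(rows: List[dict]) -> Dict[Tuple[str, str], int]:
    """Count active facilities per (month, adm3); units with none appear as 0."""
    counts: Counter = Counter()
    for row in rows:
        counts[(row["YM"], row["adm3"])] += int(row["active"])
    return dict(sorted(counts.items()))


denominators = expected_reports(records)
for (ym, adm3), denominator in denominators.items():
    print(ym, adm3, denominator)
```

Unlike the pandas version with `sum(min_count = 1)`, this sketch assumes the active flag is never missing, so a unit with no active facilities gets a denominator of 0 rather than a missing value.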

Full code

Show the code

# Method 1: Permanent Activation - Complete Code
# Load required R packages
pacman::p_load(
  readxl,        # Read Excel files
  dplyr,         # Data manipulation
  tidyr,         # Data tidying
  lubridate,     # Date handling
  ggplot2,       # Data visualization
  RColorBrewer,  # Color palettes
  scales,        # Scale functions for ggplot2
  purrr,         # Functional programming
  DT,            # Interactive data tables
  writexl,       # Export to Excel
  reticulate     # Interface to Python
)

# Import dataset
data_filepath <- here::here("english/data_r/routine_cases/clean_malaria_routine_data_final.rds")
df <- readRDS(data_filepath)

# Configure reporting indicators
report_cols <- c("allout", "susp", "test", "conf", "maltreat")

# Calculate monthly reporting status
df_selected <- dplyr::select(df, dplyr::all_of(report_cols))
row_sums <- base::rowSums(df_selected, na.rm = TRUE)
df$reported <- base::ifelse(row_sums > 0, 1, 0)

# Create total reports per facility for proper ordering
df <- dplyr::group_by(df, hf_uid)
df <- dplyr::mutate(df, total_reports = base::sum(reported, na.rm = TRUE))
df <- dplyr::ungroup(df)

# Identify the first hf reporting date
df <- dplyr::group_by(df, hf_uid)
df <- dplyr::mutate(df, first_month_reported = base::ifelse(base::any(reported == 1), base::min(date[reported == 1], na.rm = TRUE), NA))
df <- dplyr::ungroup(df)

# Identify the hf last reporting date
df <- dplyr::group_by(df, hf_uid)
df <- dplyr::mutate(df, last_month_reported = base::ifelse(base::any(reported == 1), base::max(date[reported == 1], na.rm = TRUE), NA))
df <- dplyr::ungroup(df)

# Method 1: Determine reporting expectations
df <- dplyr::mutate(df, expected_to_report_method1 = base::ifelse(base::is.na(first_month_reported), "Never reported", base::ifelse(date >= first_month_reported, "Expected to report", "Not expected to report")))

# Generate final reporting status for method 1
df <- dplyr::mutate(df, reporting_status_method1 = base::ifelse(expected_to_report_method1 == "Never reported", "Never reported", base::ifelse(expected_to_report_method1 == "Expected to report" & reported == 1, "Expected and reported", base::ifelse(expected_to_report_method1 == "Expected to report" & reported == 0, "Expected but didn't report", "Not expected to report"))))

# Create status codes for method 1
df <- dplyr::mutate(df, status_code_method1 = dplyr::case_when(reporting_status_method1 == "Never reported" ~ 0, reporting_status_method1 == "Expected and reported" ~ 1, reporting_status_method1 == "Expected but didn't report" ~ 2, reporting_status_method1 == "Not expected to report" ~ 3))

# Create active status categories for method 1
df$active_status1 <- dplyr::case_when(
  df$reporting_status_method1 == "Expected and reported" ~ "Active",
  df$reporting_status_method1 == "Expected but didn't report" ~ "Active",
  df$reporting_status_method1 == "Never reported" ~ "Inactive",
  df$reporting_status_method1 == "Not expected to report" ~ "Inactive"
)

# Create numeric codes for active status
df$active_status_code1 <- dplyr::case_when(
  df$active_status1 == "Active" ~ 1,
  df$active_status1 == "Inactive" ~ 0
)

# Save method 1 data to Excel
#writexl::write_xlsx(df, "active_status_method1.xlsx") #

# Set figure size for console display
options(repr.plot.width = 15, repr.plot.height = 15)

# Define colors and labels for Method 1
colors <- c("0" = "gray", "1" = "blue", "2" = "red", "3" = "yellow")
labels <- c("0" = "Never reported",
                    "1" = "Expected and reported",
                    "2" = "Expected but didn't report",
                    "3" = "Not expected to report")

# Generate Overall Reporting Status Heatmap for Method 1
ggplot(df, aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(status_code_method1))) +
  geom_raster() +
  scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
  theme_minimal() +
  theme(
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank(),
    axis.text.x = element_text(size = 6, angle = 90, hjust = 1)
  ) +
  labs(x = "Date", y = "Health Facilities", title = "Reporting Status Heatmap - Method 1")

# Generate admin level reporting status heatmap for method 1
adm1_areas <- base::unique(df$adm1)
plots_list <- base::list()

for (area in adm1_areas) {
    df_filtered <- dplyr::filter(df, adm1 == area)

    p <- ggplot2::ggplot(df_filtered, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(status_code_method1))) +
        ggplot2::geom_raster() +
        ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
        ggplot2::theme_minimal() +
        ggplot2::theme(
            axis.text.y = ggplot2::element_blank(),
            axis.ticks.y = ggplot2::element_blank(),
            axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
        ) +
        ggplot2::labs(x = "Date", y = "Health Facilities", title = base::paste("Reporting Status Heatmap Method 1 -", area))

    plots_list[[area]] <- p
    base::print(p)
}

# Create reporting status heatmap for specific sub category in admin unit
base::options(repr.plot.width = 15, repr.plot.height = 15)
df_filtered <- dplyr::filter(df, adm1 == "Bo District")

ggplot2::ggplot(df_filtered, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(status_code_method1))) +
  ggplot2::geom_raster() +
  ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
  ggplot2::theme_minimal() +
  ggplot2::theme(
    axis.text.y = ggplot2::element_blank(),
    axis.ticks.y = ggplot2::element_blank(),
    axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
  ) +
  ggplot2::labs(x = "Date", y = "Health Facilities", title = "Reporting Status Heatmap Method 1  - Bo District")

# Generate Overall Active Status Heatmap for Method 1
base::options(repr.plot.width = 15, repr.plot.height = 15)
colors <- c("Active" = "#47B5FF", "Inactive" = "pink")
labels <- c("Active" = "Active", "Inactive" = "Inactive")

df$active_status_method1 <- dplyr::case_when(
  df$reporting_status_method1 == "Expected and reported" ~ "Active",
  df$reporting_status_method1 == "Expected but didn't report" ~ "Active",
  df$reporting_status_method1 == "Never reported" ~ "Inactive",
  df$reporting_status_method1 == "Not expected to report" ~ "Inactive"
)

ggplot2::ggplot(df, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = active_status_method1)) +
  ggplot2::geom_raster() +
  ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Active Status", na.value = "white") +
  ggplot2::theme_minimal() +
  ggplot2::theme(
    axis.text.y = ggplot2::element_blank(),
    axis.ticks.y = ggplot2::element_blank(),
    axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
  ) +
  ggplot2::labs(x = "Date", y = "Health Facilities", title = "Active Status Heatmap - Method 1")

# Generate admin level active status heatmap for method 1
base::options(repr.plot.width = 15, repr.plot.height = 15)
adm1_areas <- base::unique(df$adm1)
plots_list <- base::list()

for (area in adm1_areas) {
    df_filtered <- dplyr::filter(df, adm1 == area)

    p <- ggplot2::ggplot(df_filtered, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(active_status_method1))) +
        ggplot2::geom_raster() +
        ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
        ggplot2::theme_minimal() +
        ggplot2::theme(
            axis.text.y = ggplot2::element_blank(),
            axis.ticks.y = ggplot2::element_blank(),
            axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
        ) +
        ggplot2::labs(x = "Date", y = "Health Facilities", title = paste("Active Status Heatmap Method 1 -", area))

    plots_list[[area]] <- p
    base::print(p)
}

Show the code

# Method 2: Activate after first report, inactivate after last report - Complete Code
# Load required R packages
pacman::p_load(
  readxl,        # Read Excel files
  dplyr,         # Data manipulation
  tidyr,         # Data tidying
  lubridate,     # Date handling
  ggplot2,       # Data visualization
  RColorBrewer,  # Color palettes
  scales,        # Scale functions for ggplot2
  purrr,         # Functional programming
  DT,            # Interactive data tables
  writexl,       # Export to Excel
  reticulate     # Interface to Python
)

# Import dataset
data_filepath <- here::here("english/data_r/routine_cases/clean_malaria_routine_data_final.rds")
df <- readRDS(data_filepath)

# Configure reporting indicators
report_cols <- c("allout", "susp", "test", "conf", "maltreat")

# Calculate monthly reporting status
df_selected <- dplyr::select(df, dplyr::all_of(report_cols))
row_sums <- base::rowSums(df_selected, na.rm = TRUE)
df$reported <- base::ifelse(row_sums > 0, 1, 0)

# Create total reports per facility for proper ordering
df <- dplyr::group_by(df, hf_uid)
df <- dplyr::mutate(df, total_reports = base::sum(reported, na.rm = TRUE))
df <- dplyr::ungroup(df)

# Identify the first hf reporting date
df <- dplyr::group_by(df, hf_uid)
df <- dplyr::mutate(df, first_month_reported = base::ifelse(base::any(reported == 1), base::min(date[reported == 1], na.rm = TRUE), NA))
df <- dplyr::ungroup(df)

# Identify the hf last reporting date
df <- dplyr::group_by(df, hf_uid)
df <- dplyr::mutate(df, last_month_reported = base::ifelse(base::any(reported == 1), base::max(date[reported == 1], na.rm = TRUE), NA))
df <- dplyr::ungroup(df)

# Method 2: Determine reporting expectations
df <- df |>
  dplyr::mutate(
    expected_to_report_method2 = ifelse(
      is.na(first_month_reported),
      "Never reported",
      ifelse(
        date >= first_month_reported & date <= last_month_reported,
        "Expected to report",
        "Not expected to report"
      )
    )
  )

# Determine final reporting status method 2
df <- df |>
  dplyr::mutate(
    reporting_status_method2 = ifelse(
      expected_to_report_method2 == "Never reported",
      "Never reported",
      ifelse(
        expected_to_report_method2 == "Expected to report" & reported == 1,
        "Expected and reported",
        ifelse(
          expected_to_report_method2 == "Expected to report" & reported == 0,
          "Expected but didn't report",
          "Not expected to report"
        )
      )
    )
  )

# Create status codes for method 2
df <- dplyr::mutate(df, status_code_method2 = dplyr::case_when(reporting_status_method2 == "Never reported" ~ 0, reporting_status_method2 == "Expected and reported" ~ 1, reporting_status_method2 == "Expected but didn't report" ~ 2, reporting_status_method2 == "Not expected to report" ~ 3))

# Create active status categories for method 2
df$active_status2 <- dplyr::case_when(
  df$reporting_status_method2 == "Expected and reported" ~ "Active",
  df$reporting_status_method2 == "Expected but didn't report" ~ "Active",
  df$reporting_status_method2 == "Never reported" ~ "Inactive",
  df$reporting_status_method2 == "Not expected to report" ~ "Inactive"
)

# Create numeric codes for active status
df$active_status_code2 <- dplyr::case_when(
  df$active_status2 == "Active" ~ 1,
  df$active_status2 == "Inactive" ~ 0
)

# Save method 2 data to Excel
#writexl::write_xlsx(df, "active_status_method2.xlsx")

# Set figure size for console display
options(repr.plot.width = 15, repr.plot.height = 15)

# Define colors and labels for Method 2
colors <- c("0" = "gray", "1" = "blue", "2" = "red", "3" = "yellow")
labels <- c("0" = "Never reported",
                    "1" = "Expected and reported",
                    "2" = "Expected but didn't report",
                    "3" = "Not expected to report")

# Generate Overall Reporting Status Heatmap for Method 2
base::options(repr.plot.width = 15, repr.plot.height = 15)
ggplot(df, aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(status_code_method2))) +
  geom_raster() +
  scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
  theme_minimal() +
  theme(
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank(),
    axis.text.x = element_text(size = 6, angle = 90, hjust = 1)
  ) +
  labs(x = "Date", y = "Health Facilities", title = "Reporting Status Heatmap - Method 2")

# Generate admin level reporting status heatmap for method 2
base::options(repr.plot.width = 15, repr.plot.height = 15)
adm1_areas <- base::unique(df$adm1)
plots_list <- base::list()

for (area in adm1_areas) {
    df_filtered <- dplyr::filter(df, adm1 == area)

    p <- ggplot2::ggplot(df_filtered, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(status_code_method2))) +
        ggplot2::geom_raster() +
        ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
        ggplot2::theme_minimal() +
        ggplot2::theme(
            axis.text.y = ggplot2::element_blank(),
            axis.ticks.y = ggplot2::element_blank(),
            axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
        ) +
        ggplot2::labs(x = "Date", y = "Health Facilities", title = base::paste("Reporting Status Heatmap Method 2 -", area))

    plots_list[[area]] <- p
    base::print(p)
}

# Create reporting status heatmap for specific sub category in admin unit
base::options(repr.plot.width = 15, repr.plot.height = 15)
df_filtered <- dplyr::filter(df, adm1 == "Bo District")

ggplot2::ggplot(df_filtered, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(status_code_method2))) +
  ggplot2::geom_raster() +
  ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
  ggplot2::theme_minimal() +
  ggplot2::theme(
    axis.text.y = ggplot2::element_blank(),
    axis.ticks.y = ggplot2::element_blank(),
    axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
  ) +
  ggplot2::labs(x = "Date", y = "Health Facilities", title = "Reporting Status Heatmap Method 2 - Bo District")

# Generate Overall Active Status Heatmap for Method 2
base::options(repr.plot.width = 15, repr.plot.height = 15)
colors <- c("Active" = "#47B5FF", "Inactive" = "pink")
labels <- c("Active" = "Active", "Inactive" = "Inactive")

df$active_status_method2 <- dplyr::case_when(
  df$reporting_status_method2 == "Expected and reported" ~ "Active",
  df$reporting_status_method2 == "Expected but didn't report" ~ "Active",
  df$reporting_status_method2 == "Never reported" ~ "Inactive",
  df$reporting_status_method2 == "Not expected to report" ~ "Inactive"
)

ggplot2::ggplot(df, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = active_status_method2)) +
  ggplot2::geom_raster() +
  ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Active Status", na.value = "white") +
  ggplot2::theme_minimal() +
  ggplot2::theme(
    axis.text.y = ggplot2::element_blank(),
    axis.ticks.y = ggplot2::element_blank(),
    axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
  ) +
  ggplot2::labs(x = "Date", y = "Health Facilities", title = "Active Status Heatmap - Method 2")

# Generate admin level active status heatmap for method 2
base::options(repr.plot.width = 15, repr.plot.height = 15)
adm1_areas <- base::unique(df$adm1)
plots_list <- base::list()

for (area in adm1_areas) {
    df_filtered <- dplyr::filter(df, adm1 == area)

    p <- ggplot2::ggplot(df_filtered, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(active_status_method2))) +
        ggplot2::geom_raster() +
        ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
        ggplot2::theme_minimal() +
        ggplot2::theme(
            axis.text.y = ggplot2::element_blank(),
            axis.ticks.y = ggplot2::element_blank(),
            axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
        ) +
        ggplot2::labs(x = "Date", y = "Health Facilities", title = paste("Active Status Heatmap Method 2 -", area))

    plots_list[[area]] <- p
    base::print(p)
}

# Method 3: Dynamic activation and inactivation - Complete Code
# Load required R packages
pacman::p_load(
  readxl,        # Read Excel files
  dplyr,         # Data manipulation
  tidyr,         # Data tidying
  lubridate,     # Date handling
  ggplot2,       # Data visualization
  RColorBrewer,  # Color palettes
  scales,        # Scale functions for ggplot2
  purrr,         # Functional programming
  DT,            # Interactive data tables
  writexl,       # Export to Excel
  reticulate     # Interface to Python
)

# Import dataset
data_filepath <- here::here("english/data_r/routine_cases/clean_malaria_routine_data_final.rds")
df <- readRDS(data_filepath)

# Configure reporting indicators
report_cols <- c("allout", "susp", "test", "conf", "maltreat")

# Calculate monthly reporting status
df_selected <- dplyr::select(df, dplyr::all_of(report_cols))
row_sums <- base::rowSums(df_selected, na.rm = TRUE)
df$reported <- base::ifelse(row_sums > 0, 1, 0)

# Create total reports per facility for proper ordering
df <- dplyr::group_by(df, hf_uid)
df <- dplyr::mutate(df, total_reports = base::sum(reported, na.rm = TRUE))
df <- dplyr::ungroup(df)

# Identify the first hf reporting date
df <- dplyr::group_by(df, hf_uid)
df <- dplyr::mutate(df, first_month_reported = base::ifelse(base::any(reported == 1), base::min(date[reported == 1], na.rm = TRUE), NA))
df <- dplyr::ungroup(df)

# Identify the hf last reporting date
df <- dplyr::group_by(df, hf_uid)
df <- dplyr::mutate(df, last_month_reported = base::ifelse(base::any(reported == 1), base::max(date[reported == 1], na.rm = TRUE), NA))
df <- dplyr::ungroup(df)

# Method 3: Determine active and inactive hf
df <- df |>
  dplyr::arrange(hf_uid, date) |>
  dplyr::group_by(hf_uid) |>
  dplyr::mutate(
    # Create a logical vector indicating runs of zeros >= 6
    zero_run = {
      # rle() computes lengths and values of consecutive identical elements
      r <- base::rle(reported == 0)
      # Identify which runs are zeros AND have length >= 6
      run_flag <- r$values & r$lengths >= 6
      # Repeat the TRUE/FALSE flags for all months in the run
      base::rep(run_flag, r$lengths)
    },
    # Assign status based on zero_run
    status_method3 = base::ifelse(zero_run, "Inactive", "Active")
  ) |>
  dplyr::ungroup()

# Determine reporting status method 3
df <- df |>
  dplyr::mutate(
    expected_to_report_method3 = ifelse(
      is.na(first_month_reported),
      "Never reported",
      ifelse(
        status_method3 == "Active",
        "Expected to report",
        "Not expected to report"
      )
    )
  )

# Determine final reporting status method 3
df <- df |>
  dplyr::mutate(
    reporting_status_method3 = ifelse(
      expected_to_report_method3 == "Never reported",
      "Never reported",
      ifelse(
        expected_to_report_method3 == "Expected to report" & reported == 1,
        "Expected and reported",
        ifelse(
          expected_to_report_method3 == "Expected to report" & reported == 0,
          "Expected but didn't report",
          "Not expected to report"
        )
      )
    )
  )

# Create status codes for method 3
df <- dplyr::mutate(df, status_code_method3 = dplyr::case_when(reporting_status_method3 == "Never reported" ~ 0, reporting_status_method3 == "Expected and reported" ~ 1, reporting_status_method3 == "Expected but didn't report" ~ 2, reporting_status_method3 == "Not expected to report" ~ 3))

# Create active status categories for method 3
df$active_status3 <- dplyr::case_when(
  df$reporting_status_method3 == "Expected and reported" ~ "Active",
  df$reporting_status_method3 == "Expected but didn't report" ~ "Active",
  df$reporting_status_method3 == "Never reported" ~ "Inactive",
  df$reporting_status_method3 == "Not expected to report" ~ "Inactive"
)

# Create numeric codes for active status
df$active_status_code3 <- dplyr::case_when(
  df$active_status3 == "Active" ~ 1,
  df$active_status3 == "Inactive" ~ 0
)

# Save method 3 data to Excel
#writexl::write_xlsx(df, "active_status_method3.xlsx")

# Set figure size for console display
options(repr.plot.width = 15, repr.plot.height = 15)

# Define colors and labels for Method 3
colors <- c("0" = "gray", "1" = "blue", "2" = "red", "3" = "yellow")
labels <- c("0" = "Never reported",
                    "1" = "Expected and reported",
                    "2" = "Expected but didn't report",
                    "3" = "Not expected to report")

# Generate Overall Reporting Status Heatmap for Method 3
base::options(repr.plot.width = 15, repr.plot.height = 15)
ggplot(df, aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(status_code_method3))) +
  geom_raster() +
  scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
  theme_minimal() +
  theme(
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank(),
    axis.text.x = element_text(size = 6, angle = 90, hjust = 1)
  ) +
  labs(x = "Date", y = "Health Facilities", title = "Reporting Status Heatmap - Method 3")

# Generate admin level reporting status heatmap for method 3
base::options(repr.plot.width = 15, repr.plot.height = 15)
adm1_areas <- base::unique(df$adm1)
plots_list <- base::list()

for (area in adm1_areas) {
    df_filtered <- dplyr::filter(df, adm1 == area)

    p <- ggplot2::ggplot(df_filtered, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(status_code_method3))) +
        ggplot2::geom_raster() +
        ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
        ggplot2::theme_minimal() +
        ggplot2::theme(
            axis.text.y = ggplot2::element_blank(),
            axis.ticks.y = ggplot2::element_blank(),
            axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
        ) +
        ggplot2::labs(x = "Date", y = "Health Facilities", title = base::paste("Reporting Status Heatmap Method 3 -", area))

    plots_list[[area]] <- p
    base::print(p)
}

# Create reporting status heatmap for specific sub category in admin unit
base::options(repr.plot.width = 15, repr.plot.height = 15)
df_filtered <- dplyr::filter(df, adm1 == "Bo District")

ggplot2::ggplot(df_filtered, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(status_code_method3))) +
  ggplot2::geom_raster() +
  ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
  ggplot2::theme_minimal() +
  ggplot2::theme(
    axis.text.y = ggplot2::element_blank(),
    axis.ticks.y = ggplot2::element_blank(),
    axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
  ) +
  ggplot2::labs(x = "Date", y = "Health Facilities", title = "Reporting Status Heatmap Method 3 - Bo District")

# Generate Overall Active Status Heatmap for Method 3
base::options(repr.plot.width = 15, repr.plot.height = 15)
colors <- c("Active" = "#47B5FF", "Inactive" = "pink")
labels <- c("Active" = "Active", "Inactive" = "Inactive")

df$active_status_method3 <- dplyr::case_when(
  df$reporting_status_method3 == "Expected and reported" ~ "Active",
  df$reporting_status_method3 == "Expected but didn't report" ~ "Active",
  df$reporting_status_method3 == "Never reported" ~ "Inactive",
  df$reporting_status_method3 == "Not expected to report" ~ "Inactive"
)

ggplot2::ggplot(df, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = active_status_method3)) +
  ggplot2::geom_raster() +
  ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Active Status", na.value = "white") +
  ggplot2::theme_minimal() +
  ggplot2::theme(
    axis.text.y = ggplot2::element_blank(),
    axis.ticks.y = ggplot2::element_blank(),
    axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
  ) +
  ggplot2::labs(x = "Date", y = "Health Facilities", title = "Active Status Heatmap - Method 3")

# Generate admin level active status heatmap for method 3
base::options(repr.plot.width = 15, repr.plot.height = 15)
adm1_areas <- base::unique(df$adm1)
plots_list <- base::list()

for (area in adm1_areas) {
    df_filtered <- dplyr::filter(df, adm1 == area)

    p <- ggplot2::ggplot(df_filtered, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(active_status_method3))) +
        ggplot2::geom_raster() +
        ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
        ggplot2::theme_minimal() +
        ggplot2::theme(
            axis.text.y = ggplot2::element_blank(),
            axis.ticks.y = ggplot2::element_blank(),
            axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
        ) +
        ggplot2::labs(x = "Date", y = "Health Facilities", title = paste("Active Status Heatmap Method 3 -", area))

    plots_list[[area]] <- p
    base::print(p)
}

Python

Show the code

import pandas as pd
from pyhere import here
import numpy as np
from matplotlib.colors import ListedColormap
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import seaborn as sns

Output

Step 1.2: Load and prepare data

Now we import the DHIS2 dataset that was initially processed in the DHIS2 Data Preprocessing section of this code library.

Python Val

Show the code

dhis2_df = pd.read_parquet(here('english/data_r/routine_cases', 'dhis2_processed_data_python.parquet'))

Output
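
Before moving on, it can help to confirm that the loaded dataset contains the columns used in the following steps. The snippet below is a minimal sketch: the helper name missing_columns and the small example dataframe are hypothetical and only stand in for dhis2_df.

```python
import pandas as pd

def missing_columns(df, required):
    """Return the required column names that are absent from df, in order."""
    return [c for c in required if c not in df.columns]

# Illustrative stand-in for the loaded dataset (not real data)
example = pd.DataFrame({'hf_uid': ['HF_00001'], 'YM': ['2015-01'], 'conf': [3]})
required = ['hf_uid', 'YM', 'adm3_uid', 'conf']

missing = missing_columns(example, required)
print(missing)  # ['adm3_uid']
```

In practice the same call could be made on dhis2_df with the list of columns used in Step 1.3; an empty list means all of them are present.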

Step 1.3: Determine reporting status

ADD

Here we create an intermediate dataframe storing the monthly reporting status of each health facility.

Python Val
Summary

Show the code

key_indicators = ['allout', 'test', 'pres', 'conf', 'maltreat', 'maladm']

# make a copy of the data
dfr = dhis2_df.copy()

# add a column indicating whether the HF reported on any of the key indicators
dfr.insert(len(dfr.columns), 'key_variables', dfr[key_indicators].notna().any(axis = 1))
dfr.insert(len(dfr.columns), 'reported', np.where(dfr['key_variables'], 1, 0))

# drop unnecessary columns (to discuss with team: Val to add normalised adm name functions and dftree to streamline these operations)
cols = ['Year', 'Month', 'YM', 'adm0', 'adm0_uid', 'adm1', 'adm1_uid', 'adm2', 'adm2_uid', 'adm3', 'adm3_uid', 'hf', 'hf_uid', 'key_variables', 'reported']
dfr = dfr[cols]

# compute first month reported for each HF and add column in dfr
t = dfr[dfr['reported'] == 1].groupby('hf_uid')['YM'].min().to_frame(name = 'first_month_reported').reset_index()

# make sure to keep all HFs in case some don't have a valid first month (never reported on anything)
temp = pd.DataFrame(dfr['hf_uid'].unique(), columns = ['hf_uid'])
t = temp.merge(t, on = 'hf_uid', how = 'left', validate = '1:1')
dfr = dfr.merge(t, on = 'hf_uid', how = 'left', validate = 'm:1')

# add HF status column:
# 0: not active
# 0.5: HF didn't report when considered active
# 1: active and reporting
dfr.insert(len(dfr.columns),
          'Facility_status',
          np.where(dfr['reported'] == 1, 1, np.where((dfr['reported'] == 0) & (dfr['YM'] >= dfr['first_month_reported']), 0.5, 0)))

# add active HF column
dfr.insert(len(dfr.columns), 'Facility_active', dfr['Facility_status'] != 0)

# quick visual check
dfr.head(10).style

Output

	Year	Month	YM	adm0	adm0_uid	adm1	adm1_uid	adm2	adm2_uid	adm3	adm3_uid	hf	hf_uid	key_variables	reported	first_month_reported	Facility_status	Facility_active
0	2015	1	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo City Council	adm2_00001	Bo City	adm3_00001	Aethel CHP	HF_00001	False	0	2019-01	0.000000	False
1	2015	1	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo City Council	adm2_00001	Bo City	adm3_00001	Agape Way CHP	HF_00002	True	1	2015-01	1.000000	True
2	2015	1	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo City Council	adm2_00001	Bo City	adm3_00001	Anglican Diocese Clinic	HF_00003	False	0	nan	0.000000	False
3	2015	1	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo City Council	adm2_00001	Bo City	adm3_00001	Batiama Layout MCHP	HF_00004	False	0	2015-05	0.000000	False
4	2015	1	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo City Council	adm2_00001	Bo City	adm3_00001	Bo Government Hospital	HF_00005	True	1	2015-01	1.000000	True
5	2015	1	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo City Council	adm2_00001	Bo City	adm3_00001	Bo School Bay CHP	HF_00006	False	0	2022-01	0.000000	False
6	2015	1	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo City Council	adm2_00001	Bo City	adm3_00001	Breakthrough MCHP	HF_00007	False	0	2023-10	0.000000	False
7	2015	1	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo City Council	adm2_00001	Bo City	adm3_00001	Brima Town CHP	HF_00008	True	1	2015-01	1.000000	True
8	2015	1	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo City Council	adm2_00001	Bo City	adm3_00001	EDC Unit CHP	HF_00009	True	1	2015-01	1.000000	True
9	2015	1	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo City Council	adm2_00001	Bo City	adm3_00001	Favour MCHP	HF_00010	True	1	2015-01	1.000000	True
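
The classification above can be checked on a small toy dataset. The sketch below is an equivalent formulation rather than the code used on this page: instead of comparing each month with first_month_reported, it uses a cumulative maximum of the reported flag within each facility. The toy values, the function name classify_status, and the ever_reported column are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Toy data: two facilities over three months (illustrative values only)
toy = pd.DataFrame({
    'hf_uid': ['HF_A'] * 3 + ['HF_B'] * 3,
    'YM': ['2015-01', '2015-02', '2015-03'] * 2,
    'conf': [np.nan, 4, np.nan, np.nan, np.nan, np.nan],
    'test': [np.nan, 10, np.nan, np.nan, np.nan, np.nan],
})

def classify_status(df, indicators):
    """Add reported, ever_reported and Facility_status columns to a sorted copy of df."""
    out = df.sort_values(['hf_uid', 'YM']).reset_index(drop=True)
    # 1 if any key indicator has a non-missing value this month
    out['reported'] = out[indicators].notna().any(axis=1).astype(int)
    # 1 from the first reporting month onwards, 0 before it
    out['ever_reported'] = out.groupby('hf_uid')['reported'].cummax()
    # 1: active and reporting, 0.5: active but not reporting, 0: inactive
    out['Facility_status'] = np.where(
        out['reported'] == 1, 1.0,
        np.where(out['ever_reported'] == 1, 0.5, 0.0))
    return out

result = classify_status(toy, ['conf', 'test'])
print(result[['hf_uid', 'YM', 'Facility_status']])
```

Here HF_A is inactive in 2015-01, active and reporting in 2015-02, and active but not reporting in 2015-03, while HF_B never reports and stays inactive throughout.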

Step 1.3.a: Visualize reporting status

Python Val

Show the code

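# SS: added to resolve render error
# status_params_dict is not defined elsewhere on this page; it is keyed by Facility_status value.
# The colours below are placeholder assumptions and can be changed freely.
status_params_dict = {
    0: {'colour': 'lightgrey', 'label': 'Inactive'},
    0.5: {'colour': 'pink', 'label': 'Active, not reporting'},
    1: {'colour': '#47B5FF', 'label': 'Active and reporting'},
}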

# Plot a visual of monthly HF-level reporting status for the country
df = (dfr.pivot(index = ['hf_uid', 'first_month_reported'], columns = 'YM', values = 'Facility_status')
     .sort_values(by = 'first_month_reported'))

df = df.reset_index().drop(['first_month_reported'], axis = 1).set_index(['hf_uid'])

# Prep colours and labels for cmap and legend
colours = [status_params_dict[i]['colour'] for i in sorted(status_params_dict.keys())]
labels = [status_params_dict[i]['label'] for i in sorted(status_params_dict.keys())]

cmap = ListedColormap(colours)

# Make figure
fs = 15
fig, ax = plt.subplots(figsize = (15, 10))
# fix vmin/vmax so that 0, 0.5 and 1 always map to the three colours in cmap
sns.heatmap(ax = ax, data = df, cmap = cmap, cbar = False, vmin = 0, vmax = 1)
ax.set_xlabel('')
ax.set_xticks(ax.get_xticks())
ax.set_xticklabels([l.get_text()[0:7] for l in ax.get_xticklabels()], rotation = 45, ha = 'right')
ax.set_yticks([])

ax.set_ylabel('HEALTH FACILITY', size = fs)

# Make legend
handles = [mpatches.Patch(color = c, label = l) for c, l in zip(colours, labels)]

ax.legend(handles = handles,
          fontsize = fs,
          ncols = 3,
          bbox_to_anchor = (0.5, -0.1),
          loc = 'upper center')

fig.tight_layout()

# Save
# discuss with team

Output

To adapt the code:

Step 2: Determine active and inactive status

Method 1: First Report Activation

Note: it is not clear what the remaining code here does. The UID and YYYY-MM steps were removed, since these should be handled in the data preprocessing; the dataset loaded in Step 1.2 should already include those columns.

Step 2 Method 2:

ADD

Step 2 Method 3:

ADD

Step 2 Method Val:

Prepare your expected reports dataframe - df_expected

VT: I would suggest separating outpatient and inpatient indicators here, as I normally do. I don’t want to modify the structure too much before discussing with the team.

Here we build a dataframe storing the number of active Health Facilities for each month of the period studied. This dataframe will be useful in subsequent sections (link to RR and Incidence adjustment sections).

Python

Show the code

# create dftree
cols = ['adm0', 'adm0_uid', 'adm1', 'adm1_uid', 'adm2', 'adm2_uid', 'adm3', 'adm3_uid', 'hf', 'hf_uid']
dftree= dhis2_df[cols].drop_duplicates().reset_index(drop = True)

# create monthly denominator for number of HFs active in each adm3
df_expected = (dfr
     .groupby(['Year', 'YM', 'adm3_uid'])[['Facility_active']].sum(min_count = 1)
     .reset_index()
     .rename(columns = {'Facility_active': 'denominator'}))

# add parent admin units
cols = ['adm0', 'adm0_uid', 'adm1', 'adm1_uid', 'adm2', 'adm2_uid', 'adm3', 'adm3_uid']
t = dftree[cols].drop_duplicates().reset_index(drop = True)
df_expected = df_expected.merge(t, on = 'adm3_uid', how = 'left', validate = 'm:1')

# reorder columns
cols = ['Year', 'YM', 'adm0', 'adm0_uid', 'adm1', 'adm1_uid', 'adm2', 'adm2_uid', 'adm3', 'adm3_uid', 'denominator']
df_expected = df_expected[cols].sort_values(by = cols).reset_index(drop = True)

# save
df_expected.to_csv(here('english/data_r/routine_cases', 'df_expected.csv'), index = False)

# Inspect results
df_expected.head(10).style

Output

	Year	YM	adm0	adm0_uid	adm1	adm1_uid	adm2	adm2_uid	adm3	adm3_uid	denominator
0	2015	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo City Council	adm2_00001	Bo City	adm3_00001	21
1	2015	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo District Council	adm2_00002	Badjia Chiefdom	adm3_00002	2
2	2015	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo District Council	adm2_00002	Bagbwe Chiefdom	adm3_00003	6
3	2015	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo District Council	adm2_00002	Baoma Chiefdom	adm3_00004	16
4	2015	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo District Council	adm2_00002	Bargbo Chiefdom	adm3_00005	8
5	2015	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo District Council	adm2_00002	Bongor Chiefdom	adm3_00006	4
6	2015	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo District Council	adm2_00002	Bumpe Ngao Chiefdom	adm3_00007	13
7	2015	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo District Council	adm2_00002	Gbo Chiefdom	adm3_00008	2
8	2015	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo District Council	adm2_00002	Jaiama Chiefdom	adm3_00009	3
9	2015	2015-01	Sierra Leone	adm0_00001	Bo District	adm1_00001	Bo District Council	adm2_00002	Kakua Chiefdom	adm3_00010	8
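
As an illustration of how this denominator could be used downstream, the sketch below computes a monthly reporting rate per adm3 on toy data. It assumes the reporting rate is the number of reporting facilities divided by the number of active facilities; the toy values and the names rr, numerator and reporting_rate are hypothetical and not taken from the reporting rate section.

```python
import pandas as pd

# Toy facility-month table (illustrative values only)
toy = pd.DataFrame({
    'YM': ['2015-01'] * 5 + ['2015-02'] * 5,
    'adm3_uid': ['adm3_A', 'adm3_A', 'adm3_B', 'adm3_B', 'adm3_B'] * 2,
    'hf_uid': ['HF_1', 'HF_2', 'HF_3', 'HF_4', 'HF_5'] * 2,
    'reported': [1, 0, 1, 0, 1,
                 1, 1, 0, 0, 1],
    'Facility_active': [True, False, True, False, True,
                        True, True, True, False, True],
})

# Numerator: facilities that reported; denominator: facilities considered active
rr = (toy.groupby(['YM', 'adm3_uid'])
         .agg(numerator=('reported', 'sum'),
              denominator=('Facility_active', 'sum'))
         .reset_index())

# Leave the rate undefined (NaN) where no facility is active
rr['reporting_rate'] = rr['numerator'] / rr['denominator'].where(rr['denominator'] > 0)
print(rr)
```

In this toy example, adm3_B has two active facilities in 2015-02 but only one report, giving a reporting rate of 0.5; HF_4 never reports and is excluded from the denominator in both months.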

To adapt the code:

Step 3: Assign expected and observed reporting status accounting for active/inactive

Step 3.1: Create summary statistics

Step 3.2: Create detailed reporting status

Step 3.3: Assign final status with priority

Step 3.4: Sort and prepare data for visualization

Step 4: Visualise processed data

Step 4.1: Set up data

Step 4.2: Make heatmap

Step 4.3: Make number by time

Python Val

Show the code

# quick df_expected visual: total number of active HFs per month
df = df_expected.groupby('YM')['denominator'].sum(min_count = 1).reset_index()

fig, ax = plt.subplots()
df.plot(ax = ax, x = 'YM', y = 'denominator', label = 'Number of active HFs', color = status_params_dict[1]['colour'])
ax.set_xlabel('')
fig.tight_layout()

Output

Step 5: Save data

ADD

Full code

Find the full code scripts for determining active and inactive status of health facilities below.

R
Python