|  | adm1 | adm2 | adm3 | hf_uid | date | allout | susp | test | conf | maltreat | report | status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | msk1 | msk2 | msk3 | hf_0001 | 2024-01 | nan | nan | nan | nan | nan | No | Inactive |
| 1 | msk1 | msk2 | msk3 | hf_0001 | 2024-02 | nan | nan | nan | nan | nan | No | Inactive |
| 2 | msk1 | msk2 | msk3 | hf_0001 | 2024-03 | 20 | 15 | 10 | 5 | 5 | Yes | Active reporting |
| 3 | msk1 | msk2 | msk3 | hf_0001 | 2024-04 | 30 | 15 | 10 | 8 | 5 | Yes | Active reporting |
| 4 | msk1 | msk2 | msk3 | hf_0001 | 2024-05 | 60 | 15 | 10 | 5 | nan | Yes | Active reporting |
| 5 | msk1 | msk2 | msk3 | hf_0001 | 2024-06 | nan | nan | nan | nan | nan | No | Active not reporting |
| 6 | msk1 | msk2 | msk3 | hf_0001 | 2024-07 | nan | nan | nan | nan | nan | No | Active not reporting |
| 7 | msk1 | msk2 | msk3 | hf_0001 | 2024-08 | nan | nan | nan | nan | nan | No | Active not reporting |
| 8 | msk1 | msk2 | msk3 | hf_0001 | 2024-09 | 5 | 5 | 5 | 5 | 5 | Yes | Active reporting |
| 9 | msk1 | msk2 | msk3 | hf_0001 | 2024-10 | nan | nan | nan | nan | nan | No | Active not reporting |
| 10 | msk1 | msk2 | msk3 | hf_0001 | 2024-11 | nan | nan | nan | nan | nan | No | Active not reporting |
| 11 | msk1 | msk2 | msk3 | hf_0001 | 2024-12 | nan | nan | nan | nan | nan | No | Active not reporting |
Determining active and inactive status
Overview
In the SNT workflow, reporting rate calculations, which are essential to the estimation of other key indicators such as incidence, depend on the activity status of each health facility.
- Classify health facility activity status to define reporting rate denominator
- Visualize the status of malaria reporting in the country
Key concepts in defining active facilities
Before calculating reporting rates, we first need to determine whether each health facility was active in a given month, that is, whether it was expected to report.
The method used to define facility activity status should be discussed with the SNT team, who will guide whether the country has an established or preferred method. In some cases, the NMP may already rely on a Health Facility Master List to identify active facilities. While this can be a useful starting point, it may not always reflect real-time service delivery or facility functionality, and its reliability should be carefully assessed.
If no trusted method exists, or if additional validation is needed, an alternative data-driven approach can be used. This approach infers activity status directly from routine surveillance data, based on whether a facility reported any valid values for key malaria indicators.
For each health facility (HF) in a given month:
- If the HF submitted valid (non-NA) data for any key indicator → it is classified as active reporting
- If the HF did not report on any key indicator:
  - If it has reported in any prior month → active not reporting
  - If it has never reported → inactive
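The decision rule above can be sketched in Python for one facility's months, sorted by date. This is a minimal illustration under stated assumptions, not the implementation used later on this page: the indicator names are taken from the example tables, and a month counts as reported when any of them holds a non-missing value (zeros included).

```python
import math
from typing import List, Sequence

# Illustrative key indicators, taken from the example tables on this page
KEY_INDICATORS = ("allout", "susp", "test", "conf", "maltreat")


def reported_any(row: dict) -> bool:
    """True if at least one key indicator holds a valid (non-missing) value."""
    for name in KEY_INDICATORS:
        value = row.get(name)
        if value is None:
            continue
        if isinstance(value, float) and math.isnan(value):
            continue
        return True  # any non-missing value counts, including 0
    return False


def classify_months(rows: Sequence[dict]) -> List[str]:
    """Classify each month (rows sorted by date) for a single facility."""
    statuses = []
    has_reported = False
    for row in rows:
        if reported_any(row):
            has_reported = True
            statuses.append("Active reporting")
        elif has_reported:
            statuses.append("Active not reporting")
        else:
            statuses.append("Inactive")
    return statuses


months = [{"allout": float("nan")}, {"allout": 20, "susp": 15}, {}, {"conf": 0}]
print(classify_months(months))
```

Treating a reported zero as a valid report matches the "non-NA" wording above; if a programme considers all-zero rows as non-reporting, the check in reported_any would need to change.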
This data-driven approach offers a flexible alternative when no reliable master list exists or when further validation is required. It uses observed reporting patterns to classify activity status, based on whether a facility submitted valid data for selected malaria indicators.
Key indicators such as allout, test, susp, pres, conf, and treat reflect core functions of malaria service delivery, including suspected case reporting, diagnostic testing, and treatment. If a facility reports on any of these indicators in a given month, it can reasonably be considered operational and engaged in the malaria surveillance system.
Ideally, analysts should receive a copy of the Master Facility List (MFL) which includes columns for active/inactive status of health facilities. This is typically the most accurate and up-to-date classification of facility active/inactive status. If provided, this information should be used to generate active status visualizations and reporting rate analysis. Review the Merging shapefiles with tabular data page to merge your MFL with DHIS2 data and proceed with the visualization steps on this page.
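Before relying on MFL status, it can help to check which DHIS2 facilities actually match the list. The pandas sketch below assumes a hypothetical MFL with one row per facility and a single static activity flag (mfl_active); a real MFL may store status differently, for example by period.

```python
import pandas as pd

# Hypothetical MFL: one row per facility with a static activity flag
mfl = pd.DataFrame({
    "hf_uid": ["HF_00001", "HF_00002"],
    "mfl_active": [True, False],
})

# Hypothetical DHIS2 extract: one row per facility-month
dhis2 = pd.DataFrame({
    "hf_uid": ["HF_00001", "HF_00001", "HF_00002", "HF_00003"],
    "YM": ["2024-01", "2024-02", "2024-01", "2024-01"],
})

# validate="m:1" raises an error if a facility appears more than once in the MFL;
# indicator=True adds a _merge column showing which rows found a match
merged = dhis2.merge(mfl, on="hf_uid", how="left", validate="m:1", indicator=True)

# Facilities reporting in DHIS2 but absent from the MFL need review before use
unmatched = merged.loc[merged["_merge"] == "left_only", "hf_uid"].unique().tolist()
print(unmatched)
```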
In the absence of health facility active status information in the MFL, active/inactive status can be determined using one of the three methods below, based on the indicators designated as key indicators.
The selection of key indicators (and the method used to define facility activity) should be discussed and validated with the SNT team. In some countries, a Health Facility Master List may be appropriate; in others, indicator-based definitions may be more reliable. The final approach should reflect how malaria services are delivered and reported within the national system.
In most countries, a separate monthly activity status may be needed when calculating reporting rates for IPD or OPD-specific indicators. For example, inpatient indicators should only include facilities with inpatient capacity. The criteria for inclusion should be discussed with the program. While facility type (e.g. hospital or health center with wards) can help, it may not always be definitive.
Methods for determining active and inactive status of health facilities from reporting status
Whether a health facility is considered “active” in a given month can be determined using three methods, each with distinct criteria for classifying facilities as active or inactive:
Method 1: Permanent activation
Criteria: A facility is classified as active from its first reporting month onwards, and inactive before its first report.
Key principle: A facility is only included in the denominator (expected to report) starting from the month it first actually reported any malaria data. Before that first reporting month, the facility is considered “inactive” and not expected to report.
Rationale: This method recognizes that facilities may not exist, be operational, have DHIS2 access, or participate in malaria surveillance from the beginning of the analysis period. It avoids underestimating reporting performance by evaluating facilities only from the point at which they have demonstrated the capacity to report.
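A minimal Python sketch of the Method 1 rule, operating on one facility's monthly 0/1 reporting flags sorted by date (the function name is hypothetical, chosen for illustration):

```python
from typing import List, Sequence


def method1_status(reported: Sequence[int]) -> List[str]:
    """Method 1 sketch: inactive before the first report, active from then on.

    `reported` holds one 0/1 flag per month for a single facility, sorted by date.
    """
    statuses = []
    has_reported = False
    for flag in reported:
        if flag == 1:
            has_reported = True
        statuses.append("Active" if has_reported else "Inactive")
    return statuses


# Reporting pattern of the first example table on this page (January to December 2024)
print(method1_status([0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0]))
```

For that pattern the sketch returns "Inactive" for January and February and "Active" for every month from March onwards, including the non-reporting months.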
Illustration:
Method 2: Activate after first report, inactivate after last report
Criteria: A facility is classified as active once it starts reporting, and inactive after its last report. To avoid misattributing late or missing reports in the most recent months of the dataset to inactivity, we can also require a minimum number of consecutive non-reporting months (for example, 6) after the facility’s last report before classifying it as inactive.
Key principle: A facility is included in the denominator (expected to report) for a given month if it has ever reported, and excluded after it has stopped reporting.
Rationale: This method recognizes that facilities may shut down permanently, for example due to decreased local population, insecurity, or diminished resources for service provision. It avoids underestimating reporting performance by evaluating facilities only during the period in which they have demonstrated the capacity to report.
Illustration:
|  | adm1 | adm2 | adm3 | hf_uid | date | allout | susp | test | conf | maltreat | report | status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | msk1 | msk2 | msk3 | hf_0001 | 2024-01 | nan | nan | nan | nan | nan | No | Inactive |
| 1 | msk1 | msk2 | msk3 | hf_0001 | 2024-02 | nan | nan | nan | nan | nan | No | Inactive |
| 2 | msk1 | msk2 | msk3 | hf_0001 | 2024-03 | 20 | 15 | 10 | 5 | 5 | Yes | Active reporting |
| 3 | msk1 | msk2 | msk3 | hf_0001 | 2024-04 | 30 | 15 | 10 | 8 | 5 | Yes | Active reporting |
| 4 | msk1 | msk2 | msk3 | hf_0001 | 2024-05 | 60 | 15 | 10 | 5 | nan | Yes | Active reporting |
| 5 | msk1 | msk2 | msk3 | hf_0001 | 2024-06 | nan | nan | nan | nan | nan | No | Active not reporting |
| 6 | msk1 | msk2 | msk3 | hf_0001 | 2024-07 | nan | nan | nan | nan | nan | No | Active not reporting |
| 7 | msk1 | msk2 | msk3 | hf_0001 | 2024-08 | nan | nan | nan | nan | nan | No | Active not reporting |
| 8 | msk1 | msk2 | msk3 | hf_0001 | 2024-09 | 5 | 5 | 5 | 5 | 5 | Yes | Active reporting |
| 9 | msk1 | msk2 | msk3 | hf_0001 | 2024-10 | nan | nan | nan | nan | nan | No | Inactive |
| 10 | msk1 | msk2 | msk3 | hf_0001 | 2024-11 | nan | nan | nan | nan | nan | No | Inactive |
| 11 | msk1 | msk2 | msk3 | hf_0001 | 2024-12 | nan | nan | nan | nan | nan | No | Inactive |
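The Method 2 rule, including the optional grace period, can be sketched in Python for one facility's monthly 0/1 reporting flags. The grace parameter is a hypothetical name used for illustration; with grace=0 the sketch reproduces the illustration above, where the three months after the September 2024 report are inactive.

```python
from typing import List, Sequence


def method2_status(reported: Sequence[int], grace: int = 0) -> List[str]:
    """Method 2 sketch: active from the first report to the last report.

    `grace` (hypothetical parameter) is the minimum number of non-reporting
    months required after the last report before those months are treated
    as inactive; 0 applies no grace period.
    """
    report_months = [i for i, flag in enumerate(reported) if flag == 1]
    if not report_months:
        return ["Inactive"] * len(reported)  # never reported
    first, last = report_months[0], report_months[-1]
    trailing = len(reported) - 1 - last
    # Too few trailing non-reports: the facility may just be late, so keep it active
    end = last if trailing >= grace else len(reported) - 1
    return ["Active" if first <= i <= end else "Inactive" for i in range(len(reported))]


pattern = [0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0]  # reporting flags from the table above
print(method2_status(pattern))           # October to December inactive
print(method2_status(pattern, grace=6))  # only 3 trailing months, so they stay active
```

The grace period only affects the months after the last report; gaps between the first and last report always stay active under this method.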
Method 3: Dynamic activation and inactivation
Criteria: A facility is classified as active once it starts reporting, and inactive during any run of consecutive non-reporting months that reaches a specified minimum length.
Key principle: A facility is excluded from the denominator (expected to report) whenever there is a continuous window of N months of non-reporting (for example, 6 months). The window size (N) can be configured based on program requirements.
Rationale: This method recognizes that facilities may experience temporary interruptions in functionality due to operational factors such as staff shortages, equipment issues, or inaccessibility caused by natural disasters or insecurity. A facility may regain activity as those factors change, and become inactive again if they reappear. The method provides a dynamic assessment that balances operational reality with accountability, allowing facilities to keep “active” status despite occasional reporting gaps, as long as they demonstrate recent engagement. However, facilities do not normally switch frequently between active and inactive status; if you see this when using Method 3, consider lengthening the window size or switching to Method 2.
Illustration
|  | adm1 | adm2 | adm3 | hf_uid | date | allout | susp | test | conf | maltreat | report | status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | msk1 | msk2 | msk3 | hf_0001 | 2024-01 | 20 | 15 | 5 | 5 | 5 | Yes | Active reporting |
| 1 | msk1 | msk2 | msk3 | hf_0001 | 2024-02 | nan | nan | nan | nan | nan | No | Inactive |
| 2 | msk1 | msk2 | msk3 | hf_0001 | 2024-03 | nan | nan | nan | nan | nan | No | Inactive |
| 3 | msk1 | msk2 | msk3 | hf_0001 | 2024-04 | nan | nan | nan | nan | nan | No | Inactive |
| 4 | msk1 | msk2 | msk3 | hf_0001 | 2024-05 | nan | nan | nan | nan | nan | No | Inactive |
| 5 | msk1 | msk2 | msk3 | hf_0001 | 2024-06 | nan | nan | nan | nan | nan | No | Inactive |
| 6 | msk1 | msk2 | msk3 | hf_0001 | 2024-07 | nan | nan | nan | nan | nan | No | Inactive |
| 7 | msk1 | msk2 | msk3 | hf_0001 | 2024-08 | nan | nan | nan | nan | nan | No | Inactive |
| 8 | msk1 | msk2 | msk3 | hf_0001 | 2024-09 | 5 | 5 | 5 | 5 | 5 | Yes | Active reporting |
| 9 | msk1 | msk2 | msk3 | hf_0001 | 2024-10 | nan | nan | nan | nan | nan | No | Active not reporting |
| 10 | msk1 | msk2 | msk3 | hf_0001 | 2024-11 | nan | nan | nan | nan | nan | No | Active not reporting |
| 11 | msk1 | msk2 | msk3 | hf_0001 | 2024-12 | nan | nan | nan | nan | nan | No | Active not reporting |
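A Python sketch of Method 3 for one facility's monthly 0/1 reporting flags is shown below. It treats every month in a non-reporting run of at least window months as inactive, including a run after the last report, which reproduces the illustration above with window=6. This is one reading of the method, offered as an assumption; the R code in Step 5.1 applies the threshold differently (from the sixth consecutive non-reporting month onwards, and only up to the last report).

```python
from typing import List, Sequence


def method3_status(reported: Sequence[int], window: int = 6) -> List[str]:
    """Method 3 sketch: inactive before the first report and during any run
    of at least `window` consecutive non-reporting months."""
    n = len(reported)
    statuses = ["Active"] * n
    has_reported = False
    i = 0
    while i < n:
        if reported[i] == 1:
            has_reported = True
            i += 1
            continue
        # Find where this run of non-reporting months ends
        j = i
        while j < n and reported[j] == 0:
            j += 1
        # Runs before the first report, or reaching the window, are inactive
        if not has_reported or j - i >= window:
            statuses[i:j] = ["Inactive"] * (j - i)
        i = j
    return statuses


# Reporting flags from the Method 3 illustration above (January to December 2024)
print(method3_status([1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0], window=6))
```

The February to August run (7 months) reaches the window and is inactive, while the 3-month run at the end stays active.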
Method Summary
| Comparison Aspect | Method 1: Permanent Activation | Method 2: Activate/Inactivate with Last Report | Method 3: Dynamic Activation |
|---|---|---|---|
| Activation Criteria | First report received | First report received | First report received |
| Inactivation Criteria | Never (once active, always active) | After last report + grace period (e.g., 6 months) | After N consecutive months of non-reporting (e.g., 6 months) |
| Facility Status | Binary: inactive → permanent active | Binary: inactive → active → permanent inactive | Dynamic: can toggle between active/inactive multiple times |
| Handles Temporary Closures | ❌ No | ❌ No | ✅ Yes |
| Handles Permanent Closures | ❌ No | ✅ Yes | ✅ Yes |
| Data Requirements | Minimal historical data | Complete historical data preferred | Complete time series data |
| Best Use When | Analyzing new facilities or early program phases | Studying facility attrition/permanent closures | Monitoring ongoing operations with temporary disruptions |
| Advantages | Simple to implement; stable denominators | Accounts for permanent exits; avoids penalizing for closed facilities | Realistic for operational contexts; accommodates temporary issues |
| Limitations | Overestimates active facilities over time | May misclassify temporarily closed facilities as permanently closed | More complex; status can fluctuate; requires parameter tuning |
Step-by-step
We now walk through identifying active facilities in code, step by step, using example DHIS2 data from Sierra Leone. We assume you are working with cleaned and preprocessed routine surveillance data.
To skip the step-by-step explanation, jump to the full code at the end of this page.
Step 1: Load packages and data
Step 1.1: Load required R packages
Load all necessary packages for data processing and visualization to determine health facility active status.
# Install or load relevant packages
pacman::p_load(
readxl, # Read Excel files
dplyr, # Data manipulation
tidyr, # Data tidying
lubridate, # Date handling
ggplot2, # Data visualization
RColorBrewer, # Color palettes
scales, # Scale functions for ggplot2
purrr, # Functional programming
DT, # Interactive data tables
writexl, # Export to Excel
reticulate, # R-Python interoperability
devtools # Package management
)
# Install or update sntutils from GitHub, then load it
devtools::install_github("ahadi-analytics/sntutils", quiet = TRUE, upgrade = "always")
library(sntutils)
Step 1.2: Import data
Load the preprocessed malaria routine data. This page continues to use the preprocessed Sierra Leone DHIS2 data, obtained by following the steps on the DHIS2 preprocessing page.
To adapt the code:
- Line 3: Change directory paths to match the folder structure
Step 2: Configure reporting indicators and function
Step 2.1: Define reporting indicators
In this step we define the main reporting indicators used to determine activity status. We also convert the date column from character strings to proper Date objects.
To adapt the code:
- Do not modify anything in the code above
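As a rough pandas sketch of this step, assuming a hypothetical toy dataset and the indicator names used in the example tables, the block below defines the key indicators and converts the date column to datetimes, deriving the Year, Month and YM columns that the Python code in Step 2.2 selects.

```python
import pandas as pd

# Hypothetical toy stand-in for the preprocessed DHIS2 data (dhis2_df)
dhis2_df = pd.DataFrame({
    "hf_uid": ["HF_00001"] * 3,
    "date": ["2024-01", "2024-02", "2024-03"],
    "allout": [None, 20, 30],
    "susp": [None, 15, 15],
    "test": [None, 10, 10],
    "conf": [None, 5, 8],
    "maltreat": [None, 5, 5],
})

# Key indicators used to decide whether a facility reported in a month (adapt to your data)
key_indicators = ["allout", "susp", "test", "conf", "maltreat"]

# Store dates as datetimes rather than character strings
dhis2_df["date"] = pd.to_datetime(dhis2_df["date"], format="%Y-%m")

# Derive the period columns used in later steps
dhis2_df["Year"] = dhis2_df["date"].dt.year
dhis2_df["Month"] = dhis2_df["date"].dt.month
dhis2_df["YM"] = dhis2_df["date"].dt.strftime("%Y-%m")
```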
Step 2.2: Reporting pattern identification function
We begin by identifying each health facility’s first reporting date, which is needed to implement classification Method 1 (permanent activation).
# Calculate monthly reporting status: a facility-month counts as reported
# if any key indicator has a non-missing value (zeros are valid reports)
df_selected <- dplyr::select(df, dplyr::all_of(report_cols))
df$reported <- as.integer(base::rowSums(!is.na(df_selected)) > 0)
# Add Year, Month and YM columns
df <- df |>
dplyr::mutate(
Year = lubridate::year(date),
Month = lubridate::month(date),
YM = format(date, "%Y-%m")
)
# Create total reports per facility for proper ordering
df <- dplyr::group_by(df, hf_uid)
df <- dplyr::mutate(df, total_reports = base::sum(reported, na.rm = TRUE))
df <- dplyr::ungroup(df)
# Identify the first health facility reporting date using YM
first_reports <- df |>
dplyr::filter(reported == 1) |>
dplyr::group_by(hf_uid) |>
dplyr::summarise(first_month_reported_YM = min(YM), .groups = "drop")
df <- df |>
dplyr::left_join(first_reports, by = "hf_uid")
# Status classification (0, 0.5, 1)
df <- df |>
dplyr::mutate(
Facility_status = dplyr::case_when(
reported == 1 ~ 1,
reported == 0 & YM >= first_month_reported_YM ~ 0.5,
TRUE ~ 0
),
Facility_active = Facility_status > 0
  )
To adapt the code:
- Do not modify anything in the code above
import numpy as np
import pandas as pd

# make a copy of the data
dfr = dhis2_df.copy()
# add a column indicating whether the HF reported on any of the key indicators
dfr.insert(len(dfr.columns), 'key_variables', dfr[key_indicators].notna().any(axis = 1))
dfr.insert(len(dfr.columns), 'reported', np.where(dfr['key_variables'], 1, 0))
# keep only the columns needed downstream (drop unnecessary columns)
cols = ['Year', 'Month', 'YM', 'adm0', 'adm0_uid', 'adm1', 'adm1_uid', 'adm2', 'adm2_uid', 'adm3', 'adm3_uid', 'hf', 'hf_uid', 'key_variables', 'reported']
dfr = dfr[cols]
# compute first month reported for each HF and add column in dfr
t = dfr[dfr['reported'] == 1].groupby('hf_uid')['YM'].min().to_frame(name = 'first_month_reported').reset_index()
# make sure to keep all HFs in case some don't have a valid first month (never reported on anything)
temp = pd.DataFrame(dfr['hf_uid'].unique(), columns = ['hf_uid'])
t = temp.merge(t, on = 'hf_uid', how = 'left', validate = '1:1')
dfr = dfr.merge(t, on = 'hf_uid', how = 'left', validate = 'm:1')
# add HF status column:
# 0: not active
# 0.5: HF didn't report when considered active
# 1: active and reporting
dfr.insert(len(dfr.columns),
'Facility_status',
np.where(dfr['reported'] == 1, 1, np.where((dfr['reported'] == 0) & (dfr['YM'] >= dfr['first_month_reported']), 0.5, 0)))
# add active HF column
dfr.insert(len(dfr.columns), 'Facility_active', np.where(dfr['Facility_status'] == 0, False, True))
# quick visual check
dfr.head(10).style
|  | Year | Month | YM | adm0 | adm0_uid | adm1 | adm1_uid | adm2 | adm2_uid | adm3 | adm3_uid | hf | hf_uid | key_variables | reported | first_month_reported | Facility_status | Facility_active |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015 | 1 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo City Council | adm2_00001 | Bo City | adm3_00001 | Aethel CHP | HF_00001 | False | 0 | 2019-01 | 0.000000 | False |
| 1 | 2015 | 1 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo City Council | adm2_00001 | Bo City | adm3_00001 | Agape Way CHP | HF_00002 | True | 1 | 2015-01 | 1.000000 | True |
| 2 | 2015 | 1 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo City Council | adm2_00001 | Bo City | adm3_00001 | Anglican Diocese Clinic | HF_00003 | False | 0 | nan | 0.000000 | False |
| 3 | 2015 | 1 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo City Council | adm2_00001 | Bo City | adm3_00001 | Batiama Layout MCHP | HF_00004 | False | 0 | 2015-05 | 0.000000 | False |
| 4 | 2015 | 1 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo City Council | adm2_00001 | Bo City | adm3_00001 | Bo Government Hospital | HF_00005 | True | 1 | 2015-01 | 1.000000 | True |
| 5 | 2015 | 1 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo City Council | adm2_00001 | Bo City | adm3_00001 | Bo School Bay CHP | HF_00006 | False | 0 | 2022-01 | 0.000000 | False |
| 6 | 2015 | 1 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo City Council | adm2_00001 | Bo City | adm3_00001 | Breakthrough MCHP | HF_00007 | False | 0 | 2023-10 | 0.000000 | False |
| 7 | 2015 | 1 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo City Council | adm2_00001 | Bo City | adm3_00001 | Brima Town CHP | HF_00008 | True | 1 | 2015-01 | 1.000000 | True |
| 8 | 2015 | 1 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo City Council | adm2_00001 | Bo City | adm3_00001 | EDC Unit CHP | HF_00009 | True | 1 | 2015-01 | 1.000000 | True |
| 9 | 2015 | 1 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo City Council | adm2_00001 | Bo City | adm3_00001 | Favour MCHP | HF_00010 | True | 1 | 2015-01 | 1.000000 | True |
To adapt the code:
- Do not modify anything in the code above
Step 3: Method 1 - Permanent activity status identification
Step 3.1: Permanent activity status identification
Building on the previous step, this code classifies a facility as active in a given month if it reported that month or has reported before; otherwise it is classified as inactive.
df <- df |>
dplyr::mutate(
active_status_method1 = dplyr::case_when(
Facility_status == 1 ~ "Active",
Facility_status == 0.5 ~ "Active",
Facility_status == 0 ~ "Inactive",
TRUE ~ "Inactive"
)
)
cat("Method 1 (R) - Summary:\n")
cat("Total facilities:", length(unique(df$hf_uid)), "\n")
cat("Active facilities (ever reported):", length(unique(df$hf_uid[!is.na(df$first_month_reported_YM)])), "\n")
cat("Never reported facilities:", length(unique(df$hf_uid[is.na(df$first_month_reported_YM)])), "\n")
# Method 1 is already implemented above as Facility_active
# This represents permanent activation after first report
print("Method 1 (Python) - Permanent Activation")
print(f"Total facilities: {len(dfr['hf_uid'].unique())}")
print(f"Active facilities (ever reported): {len(dfr[dfr['first_month_reported'].notna()]['hf_uid'].unique())}")
print(f"Never reported facilities: {len(dfr[dfr['first_month_reported'].isna()]['hf_uid'].unique())}")
Method 1 (Python) - Permanent Activation
Total facilities: 1324
Active facilities (ever reported): 1127
Never reported facilities: 197
To adapt the code:
- Do not modify anything in the code above
Step 3.2: Define activity status visualization function
To simplify plotting each active status method, we define a function that generates corresponding visualizations based on defined input parameters.
plot_facility_activity <- function(
data,
method = c("method1", "method2", "method3"),
level = c("national", "district"),
facet_col = NULL,
title = NULL,
subtitle = NULL,
plot_flips = FALSE
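) {
  # Resolve defaults so `method` and `level` are single values
  method <- match.arg(method)
  level <- match.arg(level)
  # Map method to column name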
status_col <- switch(method,
"method1" = "active_status_method1",
"method2" = "active_status_method2",
"method3" = "active_status_method3",
stop("Method must be 'method1', 'method2', or 'method3'")
)
# Method labels for titles
method_labels <- c(
"method1" = "Permanent Activation",
"method2" = "First-to-Last Report",
"method3" = "Dynamic Activation"
)
# Handle status flips for Method 3
if (plot_flips && method == "method3") {
# Identify facilities with status changes
flip_facilities <- data |>
arrange(hf_uid, date) |>
group_by(hf_uid) |>
summarise(has_flip = length(unique(.data[[status_col]])) > 1) |>
filter(has_flip) |>
pull(hf_uid)
data <- data |>
filter(hf_uid %in% flip_facilities)
flip_count <- length(flip_facilities)
subtitle <- paste("Showing", flip_count, "facilities with status flips")
}
# Set default titles if not provided
if (is.null(title)) {
title <- paste("Method", gsub("method", "", method), ":", method_labels[method])
}
if (is.null(subtitle) && !plot_flips) {
subtitle <- switch(method,
"method1" = "Facilities remain active indefinitely after first report",
"method2" = "Facilities are active between first and last report",
"method3" = "Handles temporary closures (6-month non-reporting threshold)"
)
}
# Create base plot with consistent colors
p <- ggplot(data, aes(x = date, y = reorder(hf_uid, total_reports), fill = .data[[status_col]])) +
geom_tile() +
scale_fill_manual(values = c("Active" = "pink", "Inactive" = "#47B5FF"), name = "Activity Status") +
scale_x_date(date_breaks = "6 months", date_labels = "%b %Y") +
theme_minimal() +
theme(
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "bottom",
plot.title = element_text(face = "bold", size = 14),
plot.subtitle = element_text(size = 11, color = "gray40")
) +
labs(
x = "Date",
y = "Health Facilities",
title = title,
subtitle = subtitle
)
# ADD FLIP MARKERS only for Method 3 flips - EXCLUDING ACTIVATION
if (plot_flips && method == "method3") {
# Find exact flip points but exclude the first activation (inactive → active)
flip_points <- data |>
arrange(hf_uid, date) |>
group_by(hf_uid) |>
mutate(
status_change = .data[[status_col]] != lag(.data[[status_col]]),
# Identify first activation to exclude it
first_activation = min(which(.data[[status_col]] == "Active")),
flip_point = ifelse(status_change & row_number() > first_activation, as.character(date), NA)
) |>
filter(!is.na(flip_point)) |>
ungroup()
# Add points at flip locations only if there are any flips
if (nrow(flip_points) > 0) {
p <- p +
geom_point(data = flip_points,
aes(x = date, y = hf_uid),
color = "black", size = 1, shape = 21, fill = "white", stroke = 1)
}
}
# Add faceting for district level
if (level == "district" || !is.null(facet_col)) {
if (is.null(facet_col)) {
facet_col <- "adm1"
}
p <- p +
facet_wrap(as.formula(paste("~", facet_col)), scales = "free_y", ncol = 4) +
theme(
axis.text.x = element_text(angle = 90, hjust = 1, size = 6),
strip.text = element_text(size = 8)
)
}
return(p)
}
To adapt the code:
- Do not modify anything in the code above
Step 3.4: Permanent activity visualization
The active status visualization function defined in the previous step can then be applied to method 1 results.
sntutils package
sntutils::facility_reporting_plot(
data = dhis2_hf,
hf_col = "hf_uid",
date_col = "date",
palette = "violet",
key_indicators = vars_of_interest,
facet_col = "adm2", # for the facetting
facet_ncol = 7, # the number of cols for the facetting
include_never_reported = TRUE,
target_language = "fr",
method = 1,
year_breaks = 8,
plot_path = val_plot_path,
plot_width = 12,
plot_height = 14,
plot_scale = 0.6
)
To adapt the code:
- Do not modify anything in the code above
Step 4: Method 2 - First-to-last activity status identification
Step 4.1: First-to-last activity status identification
To begin method 2 classification, we identify each health facility’s last reporting date. This is used in tandem with the previously identified first reporting date (method 1) to determine active status using method 2.
# Method 2: Identify last reports and create active period
last_reports <- df |>
dplyr::filter(reported == 1) |>
dplyr::group_by(hf_uid) |>
dplyr::summarise(last_month_reported_YM = max(YM), .groups = "drop")
df <- df |>
dplyr::left_join(last_reports, by = "hf_uid")
# Method 2: Active only between first and last report
df <- df |>
dplyr::mutate(
Facility_status_method2 = dplyr::case_when(
is.na(first_month_reported_YM) ~ 0, # Never reported
YM >= first_month_reported_YM & YM <= last_month_reported_YM & reported == 1 ~ 1, # Active and reporting
YM >= first_month_reported_YM & YM <= last_month_reported_YM & reported == 0 ~ 0.5, # Active but not reporting
TRUE ~ 0 # Outside active period
),
Facility_active_method2 = Facility_status_method2 > 0,
active_status_method2 = dplyr::if_else(Facility_active_method2, "Active", "Inactive")
)
# More informative summary
total_facilities <- length(unique(df$hf_uid))
facilities_with_activity_period <- length(unique(df$hf_uid[!is.na(df$last_month_reported_YM)]))
never_reported <- length(unique(df$hf_uid[is.na(df$first_month_reported_YM)]))
currently_active <- df |>
dplyr::filter(YM == max(YM)) |>
dplyr::summarise(active_count = sum(active_status_method2 == "Active")) |>
dplyr::pull(active_count)
cat("Method 2 (R) - First-to-Last Report Activation\n")
cat("Facilities with defined activity period:", facilities_with_activity_period, "\n")
cat("Never reported facilities:", never_reported, "\n")
cat("Currently active facilities:", currently_active, "\n")
cat("Facilities permanently closed:", facilities_with_activity_period - currently_active, "\n")
To adapt the code:
- Do not modify anything in the code above
Step 4.2: First-to-last activity status visualization
We can call the active status visualization function again here to visualize method 2 facility classification.
sntutils package
sntutils::facility_reporting_plot(
data = dhis2_hf,
hf_col = "hf_uid",
date_col = "date",
palette = "violet",
key_indicators = vars_of_interest,
facet_col = "adm2", # for the facetting
facet_ncol = 7, # the number of cols for the facetting
include_never_reported = TRUE,
target_language = "fr",
method = 2,
year_breaks = 8,
plot_path = val_plot_path,
plot_width = 12,
plot_height = 14,
plot_scale = 0.6
)
To adapt the code:
- Do not modify anything in the code above
Step 5: Method 3 - Dynamic activity status identification
Step 5.1: Dynamic activity status identification
The code below marks a facility as inactive from the sixth consecutive month of non-reporting onwards, provided that month falls between its first and last reporting dates identified previously.
# Method 3: Calculate consecutive non-reporting months
df <- df |>
dplyr::arrange(hf_uid, YM) |>
dplyr::group_by(hf_uid) |>
dplyr::mutate(
# Calculate consecutive non-reporting counter
consecutive_non_report = {
counter <- 0
purrr::map_dbl(reported, ~{
if (.x == 1) {
counter <<- 0
} else {
counter <<- counter + 1
}
counter
})
}
) |>
dplyr::ungroup()
# Method 3: Inactive after 6+ consecutive months of non-reporting BETWEEN first and last reporting dates
df <- df |>
dplyr::mutate(
Facility_status_method3 = dplyr::case_when(
is.na(first_month_reported_YM) ~ 0, # Never reported
YM < first_month_reported_YM ~ 0, # Before first report
consecutive_non_report >= 6 & YM <= last_month_reported_YM ~ 0, # 6+ months non-reporting WITHIN active period
reported == 1 ~ 1, # Active and reporting
TRUE ~ 0.5 # Active but not reporting
),
Facility_active_method3 = Facility_status_method3 > 0,
active_status_method3 = dplyr::if_else(Facility_active_method3, "Active", "Inactive")
)
# Count facilities that change status
status_flip_facilities <- df |>
dplyr::group_by(hf_uid) |>
dplyr::summarise(
has_status_change = length(unique(active_status_method3)) > 1,
.groups = "drop"
) |>
dplyr::filter(has_status_change)
cat("Method 3 (R) - Summary:\n")
cat("Facilities that experienced 6+ months non-reporting:", length(unique(df$hf_uid[df$consecutive_non_report >= 6])), "\n")
cat("Facilities with status changes:", nrow(status_flip_facilities), "\n")
To adapt the code:
- Do not modify anything in the code above
Step 5.2: Dynamic activity status visualization
Here we call the active status plotting function again to visualize method 3 results at both the national and district level.
sntutils package
sntutils::facility_reporting_plot(
data = df,
hf_col = "hf_uid",
date_col = "date",
palette = "violet",
key_indicators = vars_of_interest,
facet_col = "adm2", # for the facetting
facet_ncol = 7, # the number of cols for the facetting
include_never_reported = TRUE,
target_language = "fr",
method = 3,
nonreport_window = 6, # Needed for method 3
year_breaks = 8,
plot_path = val_plot_path,
plot_width = 12,
plot_height = 14,
plot_scale = 0.6
)
To adapt the code:
- Do not modify anything in the code above
Step 5.3: Visualize dynamic activation flips
An additional visualization relevant to Method 3 shows status “flips”, that is, the times a facility switches from active to inactive, back to active, and so on. The plotting function defined in Step 3.2 can display these flips when called with method = "method3" and plot_flips = TRUE.
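To count flips per facility rather than plot them, a pandas sketch such as the one below could be used. Like the plotting function in Step 3.2, it ignores the first activation (inactive to active) and counts every later status change. The toy data is hypothetical; on the page's data the status column would be active_status_method3.

```python
import pandas as pd

# Hypothetical Method 3 output: one row per facility-month
df = pd.DataFrame({
    "hf_uid": ["HF_A"] * 5 + ["HF_B"] * 5,
    "YM": ["2024-01", "2024-02", "2024-03", "2024-04", "2024-05"] * 2,
    "active_status_method3": [
        "Inactive", "Active", "Inactive", "Active", "Active",
        "Inactive", "Active", "Active", "Active", "Active",
    ],
})


def count_flips(statuses: pd.Series) -> int:
    """Count status changes after the first activation of one facility."""
    values = statuses.tolist()
    if "Active" not in values:
        return 0  # never active, so no flips
    first_active = values.index("Active")
    return sum(values[i] != values[i - 1] for i in range(first_active + 1, len(values)))


flips = (
    df.sort_values(["hf_uid", "YM"])
    .groupby("hf_uid")["active_status_method3"]
    .apply(count_flips)
)
print(flips)
```

Here HF_A flips twice (active to inactive, then back to active), while HF_B only activates once and so has no flips.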
Step 6: Activity status method comparison
All three active status methods have now been applied. Visualizations allow us to compare these methods to better understand the nature of health facilities in the dataset and decide which method should be selected for further use.
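One way to compare the methods numerically is to count active facilities per month under each. The pandas sketch below does this for a hypothetical two-facility toy dataset; it uses a 3-month window for Method 3 and no grace period for Method 2 only so that the short example shows differences. The column names m1, m2 and m3 are illustrative.

```python
import pandas as pd

WINDOW = 3  # Method 3 window, shortened for the toy example

# Hypothetical long-format data: one row per facility-month with a 0/1 reported flag
df = pd.DataFrame({
    "hf_uid": ["HF_A"] * 6 + ["HF_B"] * 6,
    "YM": [f"2024-{m:02d}" for m in range(1, 7)] * 2,
    "reported": [0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1],
})
df = df.sort_values(["hf_uid", "YM"]).reset_index(drop=True)
by_hf = df.groupby("hf_uid")["reported"]

# Method 1: active once the facility has reported at least once
df["m1"] = by_hf.cummax().eq(1)

# Method 2: also require a report in this month or a later one (no grace period here)
reports_later = df.iloc[::-1].groupby("hf_uid")["reported"].cummax().eq(1)
df["m2"] = df["m1"] & reports_later

# Method 3: drop months inside a non-reporting run of at least WINDOW months
new_run = df["reported"].ne(by_hf.shift())  # True where a new run starts
run_id = new_run.cumsum()
run_len = df.groupby(run_id)["reported"].transform("count")
long_gap = df["reported"].eq(0) & run_len.ge(WINDOW)
df["m3"] = df["m1"] & ~long_gap

# Number of active facilities per month under each method
monthly_active = df.groupby("YM")[["m1", "m2", "m3"]].sum()
print(monthly_active)
```

Method 1 never drops a facility once it has reported, Method 2 drops HF_A after its last report, and Method 3 additionally drops HF_B during its three-month gap.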
Step 6.2: Visualize method comparison
Step 7: Export results
Finally, we export the active status results together with df_expected, which contains the monthly number of health facilities expected to report in each adm3 (the denominator needed for reporting rate calculations).
# Create dftree without UIDs
cols <- c('adm0', 'adm1', 'adm2', 'adm3', 'hf', 'hf_uid')
dftree <- df |>
dplyr::select(all_of(cols)) |>
dplyr::distinct() |>
dplyr::arrange(across(all_of(cols)))
# Add Year and YM columns to main data
df_with_ym <- df |>
dplyr::mutate(
Year = lubridate::year(date),
Month = lubridate::month(date),
YM = format(date, "%Y-%m")
)
# Method 1 ONLY - Create monthly denominator for number of HFs active in each adm3
df_expected_method1 <- df_with_ym |>
dplyr::group_by(Year, YM, adm3) |>
dplyr::summarise(
denominator = sum(active_status_method1 == "Active", na.rm = TRUE),
.groups = "drop"
)
# Add parent admin units
admin_cols <- c('adm0', 'adm1', 'adm2', 'adm3')
t <- dftree[admin_cols] |> dplyr::distinct()
df_expected_method1 <- df_expected_method1 |>
dplyr::left_join(t, by = "adm3")
# Reorder columns
final_cols <- c('Year', 'YM', 'adm0', 'adm1', 'adm2', 'adm3', 'denominator')
df_expected_method1 <- df_expected_method1[final_cols] |>
dplyr::arrange(across(all_of(final_cols)))
# Save results - ONLY Method 1 for now
# write.csv(df_expected_method1, "expected_reports_method1.csv", row.names = FALSE)
# create dftree
cols = ['adm0', 'adm0_uid', 'adm1', 'adm1_uid', 'adm2', 'adm2_uid', 'adm3', 'adm3_uid', 'hf', 'hf_uid']
dftree= dhis2_df[cols].drop_duplicates().reset_index(drop = True)
# create monthly denominator for number of HFs active in each adm3
df_expected = (dfr
.groupby(['Year', 'YM', 'adm3_uid'])[['Facility_active']].sum(min_count = 1)
.reset_index()
.rename(columns = {'Facility_active': 'denominator'}))
# add parent admin units
cols = ['adm0', 'adm0_uid', 'adm1', 'adm1_uid', 'adm2', 'adm2_uid', 'adm3', 'adm3_uid']
t = dftree[cols].drop_duplicates().reset_index(drop = True)
df_expected = df_expected.merge(t, on = 'adm3_uid', how = 'left', validate = 'm:1')
# reorder columns
cols = ['Year', 'YM', 'adm0', 'adm0_uid', 'adm1', 'adm1_uid', 'adm2', 'adm2_uid', 'adm3', 'adm3_uid', 'denominator']
df_expected = df_expected[cols].sort_values(by = cols).reset_index(drop = True)
# save
# df_expected.to_csv(here('english/data_r/routine_cases', 'df_expected.csv'), index = None)
# Inspect results
df_expected.head(10).style
| | Year | YM | adm0 | adm0_uid | adm1 | adm1_uid | adm2 | adm2_uid | adm3 | adm3_uid | denominator |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo City Council | adm2_00001 | Bo City | adm3_00001 | 21 |
| 1 | 2015 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo District Council | adm2_00002 | Badjia Chiefdom | adm3_00002 | 2 |
| 2 | 2015 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo District Council | adm2_00002 | Bagbwe Chiefdom | adm3_00003 | 6 |
| 3 | 2015 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo District Council | adm2_00002 | Baoma Chiefdom | adm3_00004 | 16 |
| 4 | 2015 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo District Council | adm2_00002 | Bargbo Chiefdom | adm3_00005 | 8 |
| 5 | 2015 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo District Council | adm2_00002 | Bongor Chiefdom | adm3_00006 | 4 |
| 6 | 2015 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo District Council | adm2_00002 | Bumpe Ngao Chiefdom | adm3_00007 | 13 |
| 7 | 2015 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo District Council | adm2_00002 | Gbo Chiefdom | adm3_00008 | 2 |
| 8 | 2015 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo District Council | adm2_00002 | Jaiama Chiefdom | adm3_00009 | 3 |
| 9 | 2015 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo District Council | adm2_00002 | Kakua Chiefdom | adm3_00010 | 8 |
Full code
# Method 1: Permanent Activation - Complete Code
# Load required R packages
pacman::p_load(
readxl, # Read Excel files
dplyr, # Data manipulation
tidyr, # Data tidying
lubridate, # Date handling
ggplot2, # Data visualization
RColorBrewer, # Color palettes
scales, # Scale functions for ggplot2
purrr, # Functional programming
DT, # Interactive data tables
writexl, # Export to Excel
reticulate # Interface between R and Python
)
# Import dataset
data_filepath <- here::here("english/data_r/routine_cases/clean_malaria_routine_data_final.rds")
df <- readRDS(data_filepath)
# Configure reporting indicators
report_cols <- c("allout", "susp", "test", "conf", "maltreat")
# Calculate monthly reporting status
df_selected <- dplyr::select(df, dplyr::all_of(report_cols))
row_sums <- base::rowSums(df_selected, na.rm = TRUE)
df$reported <- base::ifelse(row_sums > 0, 1, 0)
# Create total reports per facility for proper ordering
df <- dplyr::group_by(df, hf_uid)
df <- dplyr::mutate(df, total_reports = base::sum(reported, na.rm = TRUE))
df <- dplyr::ungroup(df)
# Identify the first hf reporting date
df <- dplyr::group_by(df, hf_uid)
df <- dplyr::mutate(df, first_month_reported = base::ifelse(base::any(reported == 1), base::min(date[reported == 1], na.rm = TRUE), NA))
df <- dplyr::ungroup(df)
# Identify the hf last reporting date
df <- dplyr::group_by(df, hf_uid)
df <- dplyr::mutate(df, last_month_reported = base::ifelse(base::any(reported == 1), base::max(date[reported == 1], na.rm = TRUE), NA))
df <- dplyr::ungroup(df)
# Method 1: Determine reporting expectations
df <- dplyr::mutate(df, expected_to_report_method1 = base::ifelse(base::is.na(first_month_reported), "Never reported", base::ifelse(date >= first_month_reported, "Expected to report", "Not expected to report")))
# Generate final reporting status for method 1
df <- dplyr::mutate(df, reporting_status_method1 = base::ifelse(expected_to_report_method1 == "Never reported", "Never reported", base::ifelse(expected_to_report_method1 == "Expected to report" & reported == 1, "Expected and reported", base::ifelse(expected_to_report_method1 == "Expected to report" & reported == 0, "Expected but didn't report", "Not expected to report"))))
# Create status codes for method 1
df <- dplyr::mutate(df, status_code_method1 = dplyr::case_when(reporting_status_method1 == "Never reported" ~ 0, reporting_status_method1 == "Expected and reported" ~ 1, reporting_status_method1 == "Expected but didn't report" ~ 2, reporting_status_method1 == "Not expected to report" ~ 3))
# Create active status categories for method 1
df$active_status1 <- dplyr::case_when(
df$reporting_status_method1 == "Expected and reported" ~ "Active",
df$reporting_status_method1 == "Expected but didn't report" ~ "Active",
df$reporting_status_method1 == "Never reported" ~ "Inactive",
df$reporting_status_method1 == "Not expected to report" ~ "Inactive"
)
# Create numeric codes for active status
df$active_status_code1 <- dplyr::case_when(
df$active_status1 == "Active" ~ 1,
df$active_status1 == "Inactive" ~ 0
)
# Save method 1 data to Excel
# writexl::write_xlsx(df, "active_status_method1.xlsx")
# Set figure size for console display
options(repr.plot.width = 15, repr.plot.height = 15)
# Define colors and labels for Method 1
colors <- c("0" = "gray", "1" = "blue", "2" = "red", "3" = "yellow")
labels <- c("0" = "Never reported",
"1" = "Expected and reported",
"2" = "Expected but didn't report",
"3" = "Not expected to report")
# Generate Overall Reporting Status Heatmap for Method 1
ggplot(df, aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(status_code_method1))) +
geom_raster() +
scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
theme_minimal() +
theme(
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.text.x = element_text(size = 6, angle = 90, hjust = 1)
) +
labs(x = "Date", y = "Health Facilities", title = "Reporting Status Heatmap - Method 1")
# Generate admin level reporting status heatmap for method 1
adm1_areas <- base::unique(df$adm1)
plots_list <- base::list()
for (area in adm1_areas) {
df_filtered <- dplyr::filter(df, adm1 == area)
p <- ggplot2::ggplot(df_filtered, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(status_code_method1))) +
ggplot2::geom_raster() +
ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
ggplot2::theme_minimal() +
ggplot2::theme(
axis.text.y = ggplot2::element_blank(),
axis.ticks.y = ggplot2::element_blank(),
axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
) +
ggplot2::labs(x = "Date", y = "Health Facilities", title = base::paste("Reporting Status Heatmap Method 1 -", area))
plots_list[[area]] <- p
base::print(p)
}
# Create reporting status heatmap for a single adm1 unit (here, Bo District)
base::options(repr.plot.width = 15, repr.plot.height = 15)
df_filtered <- dplyr::filter(df, adm1 == "Bo District")
ggplot2::ggplot(df_filtered, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(status_code_method1))) +
ggplot2::geom_raster() +
ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
ggplot2::theme_minimal() +
ggplot2::theme(
axis.text.y = ggplot2::element_blank(),
axis.ticks.y = ggplot2::element_blank(),
axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
) +
ggplot2::labs(x = "Date", y = "Health Facilities", title = "Reporting Status Heatmap Method 1 - Bo District")
# Generate Overall Active Status Heatmap for Method 1
base::options(repr.plot.width = 15, repr.plot.height = 15)
colors <- c("Active" = "#47B5FF", "Inactive" = "pink")
labels <- c("Active" = "Active", "Inactive" = "Inactive")
df$active_status_method1 <- dplyr::case_when(
df$reporting_status_method1 == "Expected and reported" ~ "Active",
df$reporting_status_method1 == "Expected but didn't report" ~ "Active",
df$reporting_status_method1 == "Never reported" ~ "Inactive",
df$reporting_status_method1 == "Not expected to report" ~ "Inactive"
)
ggplot2::ggplot(df, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = active_status_method1)) +
ggplot2::geom_raster() +
ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Active Status", na.value = "white") +
ggplot2::theme_minimal() +
ggplot2::theme(
axis.text.y = ggplot2::element_blank(),
axis.ticks.y = ggplot2::element_blank(),
axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
) +
ggplot2::labs(x = "Date", y = "Health Facilities", title = "Active Status Heatmap - Method 1")
# Generate admin level active status heatmap for method 1
base::options(repr.plot.width = 15, repr.plot.height = 15)
adm1_areas <- base::unique(df$adm1)
plots_list <- base::list()
for (area in adm1_areas) {
df_filtered <- dplyr::filter(df, adm1 == area)
p <- ggplot2::ggplot(df_filtered, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(active_status_method1))) +
ggplot2::geom_raster() +
ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
ggplot2::theme_minimal() +
ggplot2::theme(
axis.text.y = ggplot2::element_blank(),
axis.ticks.y = ggplot2::element_blank(),
axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
) +
ggplot2::labs(x = "Date", y = "Health Facilities", title = paste("Active Status Heatmap Method 1 -", area))
plots_list[[area]] <- p
base::print(p)
}
# Method 2: Activate after first report, inactivate after last report - Complete Code
# Load required R packages
pacman::p_load(
readxl, # Read Excel files
dplyr, # Data manipulation
tidyr, # Data tidying
lubridate, # Date handling
ggplot2, # Data visualization
RColorBrewer, # Color palettes
scales, # Scale functions for ggplot2
purrr, # Functional programming
DT, # Interactive data tables
writexl, # Export to Excel
reticulate # Interface between R and Python
)
# Import dataset
data_filepath <- here::here("english/data_r/routine_cases/clean_malaria_routine_data_final.rds")
df <- readRDS(data_filepath)
# Configure reporting indicators
report_cols <- c("allout", "susp", "test", "conf", "maltreat")
# Calculate monthly reporting status
df_selected <- dplyr::select(df, dplyr::all_of(report_cols))
row_sums <- base::rowSums(df_selected, na.rm = TRUE)
df$reported <- base::ifelse(row_sums > 0, 1, 0)
# Create total reports per facility for proper ordering
df <- dplyr::group_by(df, hf_uid)
df <- dplyr::mutate(df, total_reports = base::sum(reported, na.rm = TRUE))
df <- dplyr::ungroup(df)
# Identify the first hf reporting date
df <- dplyr::group_by(df, hf_uid)
df <- dplyr::mutate(df, first_month_reported = base::ifelse(base::any(reported == 1), base::min(date[reported == 1], na.rm = TRUE), NA))
df <- dplyr::ungroup(df)
# Identify the hf last reporting date
df <- dplyr::group_by(df, hf_uid)
df <- dplyr::mutate(df, last_month_reported = base::ifelse(base::any(reported == 1), base::max(date[reported == 1], na.rm = TRUE), NA))
df <- dplyr::ungroup(df)
# Method 2: Determine reporting expectations
df <- df |>
dplyr::mutate(
expected_to_report_method2 = ifelse(
is.na(first_month_reported),
"Never reported",
ifelse(
date >= first_month_reported & date <= last_month_reported,
"Expected to report",
"Not expected to report"
)
)
)
# Determine final reporting status method 2
df <- df |>
dplyr::mutate(
reporting_status_method2 = ifelse(
expected_to_report_method2 == "Never reported",
"Never reported",
ifelse(
expected_to_report_method2 == "Expected to report" & reported == 1,
"Expected and reported",
ifelse(
expected_to_report_method2 == "Expected to report" & reported == 0,
"Expected but didn't report",
"Not expected to report"
)
)
)
)
# Create status codes for method 2
df <- dplyr::mutate(df, status_code_method2 = dplyr::case_when(reporting_status_method2 == "Never reported" ~ 0, reporting_status_method2 == "Expected and reported" ~ 1, reporting_status_method2 == "Expected but didn't report" ~ 2, reporting_status_method2 == "Not expected to report" ~ 3))
# Create active status categories for method 2
df$active_status2 <- dplyr::case_when(
df$reporting_status_method2 == "Expected and reported" ~ "Active",
df$reporting_status_method2 == "Expected but didn't report" ~ "Active",
df$reporting_status_method2 == "Never reported" ~ "Inactive",
df$reporting_status_method2 == "Not expected to report" ~ "Inactive"
)
# Create numeric codes for active status
df$active_status_code2 <- dplyr::case_when(
df$active_status2 == "Active" ~ 1,
df$active_status2 == "Inactive" ~ 0
)
# Save method 2 data to Excel
#writexl::write_xlsx(df, "active_status_method2.xlsx")
# Set figure size for console display
options(repr.plot.width = 15, repr.plot.height = 15)
# Define colors and labels for Method 2
colors <- c("0" = "gray", "1" = "blue", "2" = "red", "3" = "yellow")
labels <- c("0" = "Never reported",
"1" = "Expected and reported",
"2" = "Expected but didn't report",
"3" = "Not expected to report")
# Generate Overall Reporting Status Heatmap for Method 2
base::options(repr.plot.width = 15, repr.plot.height = 15)
ggplot(df, aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(status_code_method2))) +
geom_raster() +
scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
theme_minimal() +
theme(
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.text.x = element_text(size = 6, angle = 90, hjust = 1)
) +
labs(x = "Date", y = "Health Facilities", title = "Reporting Status Heatmap - Method 2")
# Generate admin level reporting status heatmap for method 2
base::options(repr.plot.width = 15, repr.plot.height = 15)
adm1_areas <- base::unique(df$adm1)
plots_list <- base::list()
for (area in adm1_areas) {
df_filtered <- dplyr::filter(df, adm1 == area)
p <- ggplot2::ggplot(df_filtered, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(status_code_method2))) +
ggplot2::geom_raster() +
ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
ggplot2::theme_minimal() +
ggplot2::theme(
axis.text.y = ggplot2::element_blank(),
axis.ticks.y = ggplot2::element_blank(),
axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
) +
ggplot2::labs(x = "Date", y = "Health Facilities", title = base::paste("Reporting Status Heatmap Method 2 -", area))
plots_list[[area]] <- p
base::print(p)
}
# Create reporting status heatmap for a single adm1 unit (here, Bo District)
base::options(repr.plot.width = 15, repr.plot.height = 15)
df_filtered <- dplyr::filter(df, adm1 == "Bo District")
ggplot2::ggplot(df_filtered, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(status_code_method2))) +
ggplot2::geom_raster() +
ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
ggplot2::theme_minimal() +
ggplot2::theme(
axis.text.y = ggplot2::element_blank(),
axis.ticks.y = ggplot2::element_blank(),
axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
) +
ggplot2::labs(x = "Date", y = "Health Facilities", title = "Reporting Status Heatmap Method 2 - Bo District")
# Generate Overall Active Status Heatmap for Method 2
base::options(repr.plot.width = 15, repr.plot.height = 15)
colors <- c("Active" = "#47B5FF", "Inactive" = "pink")
labels <- c("Active" = "Active", "Inactive" = "Inactive")
df$active_status_method2 <- dplyr::case_when(
df$reporting_status_method2 == "Expected and reported" ~ "Active",
df$reporting_status_method2 == "Expected but didn't report" ~ "Active",
df$reporting_status_method2 == "Never reported" ~ "Inactive",
df$reporting_status_method2 == "Not expected to report" ~ "Inactive"
)
ggplot2::ggplot(df, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = active_status_method2)) +
ggplot2::geom_raster() +
ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Active Status", na.value = "white") +
ggplot2::theme_minimal() +
ggplot2::theme(
axis.text.y = ggplot2::element_blank(),
axis.ticks.y = ggplot2::element_blank(),
axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
) +
ggplot2::labs(x = "Date", y = "Health Facilities", title = "Active Status Heatmap - Method 2")
# Generate admin level active status heatmap for method 2
base::options(repr.plot.width = 15, repr.plot.height = 15)
adm1_areas <- base::unique(df$adm1)
plots_list <- base::list()
for (area in adm1_areas) {
df_filtered <- dplyr::filter(df, adm1 == area)
p <- ggplot2::ggplot(df_filtered, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(active_status_method2))) +
ggplot2::geom_raster() +
ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
ggplot2::theme_minimal() +
ggplot2::theme(
axis.text.y = ggplot2::element_blank(),
axis.ticks.y = ggplot2::element_blank(),
axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
) +
ggplot2::labs(x = "Date", y = "Health Facilities", title = paste("Active Status Heatmap Method 2 -", area))
plots_list[[area]] <- p
base::print(p)
}
# Method 3: Dynamic activation and inactivation - Complete Code
# Load required R packages
pacman::p_load(
readxl, # Read Excel files
dplyr, # Data manipulation
tidyr, # Data tidying
lubridate, # Date handling
ggplot2, # Data visualization
RColorBrewer, # Color palettes
scales, # Scale functions for ggplot2
purrr, # Functional programming
DT, # Interactive data tables
writexl, # Export to Excel
reticulate # Interface between R and Python
)
# Import dataset
data_filepath <- here::here("english/data_r/routine_cases/clean_malaria_routine_data_final.rds")
df <- readRDS(data_filepath)
# Configure reporting indicators
report_cols <- c("allout", "susp", "test", "conf", "maltreat")
# Calculate monthly reporting status
df_selected <- dplyr::select(df, dplyr::all_of(report_cols))
row_sums <- base::rowSums(df_selected, na.rm = TRUE)
df$reported <- base::ifelse(row_sums > 0, 1, 0)
# Create total reports per facility for proper ordering
df <- dplyr::group_by(df, hf_uid)
df <- dplyr::mutate(df, total_reports = base::sum(reported, na.rm = TRUE))
df <- dplyr::ungroup(df)
# Identify the first hf reporting date
df <- dplyr::group_by(df, hf_uid)
df <- dplyr::mutate(df, first_month_reported = base::ifelse(base::any(reported == 1), base::min(date[reported == 1], na.rm = TRUE), NA))
df <- dplyr::ungroup(df)
# Identify the hf last reporting date
df <- dplyr::group_by(df, hf_uid)
df <- dplyr::mutate(df, last_month_reported = base::ifelse(base::any(reported == 1), base::max(date[reported == 1], na.rm = TRUE), NA))
df <- dplyr::ungroup(df)
# Method 3: Determine active and inactive hf
df <- df |>
dplyr::arrange(hf_uid, date) |>
dplyr::group_by(hf_uid) |>
dplyr::mutate(
# Create a logical vector indicating runs of zeros >= 6
zero_run = {
# rle() computes lengths and values of consecutive identical elements
r <- base::rle(reported == 0)
# Identify which runs are zeros AND have length >= 6
run_flag <- r$values & r$lengths >= 6
# Repeat the TRUE/FALSE flags for all months in the run
base::rep(run_flag, r$lengths)
},
# Assign status based on zero_run
status_method3 = base::ifelse(zero_run, "Inactive", "Active")
) |>
dplyr::ungroup()
# Determine reporting status method 3
df <- df |>
dplyr::mutate(
expected_to_report_method3 = ifelse(
is.na(first_month_reported),
"Never reported",
ifelse(
status_method3 == "Active",
"Expected to report",
"Not expected to report"
)
)
)
# Determine final reporting status method 3
df <- df |>
dplyr::mutate(
reporting_status_method3 = ifelse(
expected_to_report_method3 == "Never reported",
"Never reported",
ifelse(
expected_to_report_method3 == "Expected to report" & reported == 1,
"Expected and reported",
ifelse(
expected_to_report_method3 == "Expected to report" & reported == 0,
"Expected but didn't report",
"Not expected to report"
)
)
)
)
# Create status codes for method 3
df <- dplyr::mutate(df, status_code_method3 = dplyr::case_when(reporting_status_method3 == "Never reported" ~ 0, reporting_status_method3 == "Expected and reported" ~ 1, reporting_status_method3 == "Expected but didn't report" ~ 2, reporting_status_method3 == "Not expected to report" ~ 3))
# Create active status categories for method 3
df$active_status3 <- dplyr::case_when(
df$reporting_status_method3 == "Expected and reported" ~ "Active",
df$reporting_status_method3 == "Expected but didn't report" ~ "Active",
df$reporting_status_method3 == "Never reported" ~ "Inactive",
df$reporting_status_method3 == "Not expected to report" ~ "Inactive"
)
# Create numeric codes for active status
df$active_status_code3 <- dplyr::case_when(
df$active_status3 == "Active" ~ 1,
df$active_status3 == "Inactive" ~ 0
)
# Save method 3 data to Excel
#writexl::write_xlsx(df, "active_status_method3.xlsx")
# Set figure size for console display
options(repr.plot.width = 15, repr.plot.height = 15)
# Define colors and labels for Method 3
colors <- c("0" = "gray", "1" = "blue", "2" = "red", "3" = "yellow")
labels <- c("0" = "Never reported",
"1" = "Expected and reported",
"2" = "Expected but didn't report",
"3" = "Not expected to report")
# Generate Overall Reporting Status Heatmap for Method 3
base::options(repr.plot.width = 15, repr.plot.height = 15)
ggplot(df, aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(status_code_method3))) +
geom_raster() +
scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
theme_minimal() +
theme(
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.text.x = element_text(size = 6, angle = 90, hjust = 1)
) +
labs(x = "Date", y = "Health Facilities", title = "Reporting Status Heatmap - Method 3")
# Generate admin level reporting status heatmap for method 3
base::options(repr.plot.width = 15, repr.plot.height = 15)
adm1_areas <- base::unique(df$adm1)
plots_list <- base::list()
for (area in adm1_areas) {
df_filtered <- dplyr::filter(df, adm1 == area)
p <- ggplot2::ggplot(df_filtered, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(status_code_method3))) +
ggplot2::geom_raster() +
ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
ggplot2::theme_minimal() +
ggplot2::theme(
axis.text.y = ggplot2::element_blank(),
axis.ticks.y = ggplot2::element_blank(),
axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
) +
ggplot2::labs(x = "Date", y = "Health Facilities", title = base::paste("Reporting Status Heatmap Method 3 -", area))
plots_list[[area]] <- p
base::print(p)
}
# Create reporting status heatmap for a single adm1 unit (here, Bo District)
base::options(repr.plot.width = 15, repr.plot.height = 15)
df_filtered <- dplyr::filter(df, adm1 == "Bo District")
ggplot2::ggplot(df_filtered, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(status_code_method3))) +
ggplot2::geom_raster() +
ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
ggplot2::theme_minimal() +
ggplot2::theme(
axis.text.y = ggplot2::element_blank(),
axis.ticks.y = ggplot2::element_blank(),
axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
) +
ggplot2::labs(x = "Date", y = "Health Facilities", title = "Reporting Status Heatmap Method 3 - Bo District")
# Generate Overall Active Status Heatmap for Method 3
base::options(repr.plot.width = 15, repr.plot.height = 15)
colors <- c("Active" = "#47B5FF", "Inactive" = "pink")
labels <- c("Active" = "Active", "Inactive" = "Inactive")
df$active_status_method3 <- dplyr::case_when(
df$reporting_status_method3 == "Expected and reported" ~ "Active",
df$reporting_status_method3 == "Expected but didn't report" ~ "Active",
df$reporting_status_method3 == "Never reported" ~ "Inactive",
df$reporting_status_method3 == "Not expected to report" ~ "Inactive"
)
ggplot2::ggplot(df, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = active_status_method3)) +
ggplot2::geom_raster() +
ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Active Status", na.value = "white") +
ggplot2::theme_minimal() +
ggplot2::theme(
axis.text.y = ggplot2::element_blank(),
axis.ticks.y = ggplot2::element_blank(),
axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
) +
ggplot2::labs(x = "Date", y = "Health Facilities", title = "Active Status Heatmap - Method 3")
# Generate admin level active status heatmap for method 3
base::options(repr.plot.width = 15, repr.plot.height = 15)
adm1_areas <- base::unique(df$adm1)
plots_list <- base::list()
for (area in adm1_areas) {
df_filtered <- dplyr::filter(df, adm1 == area)
p <- ggplot2::ggplot(df_filtered, ggplot2::aes(x = date, y = reorder(hf_uid, total_reports), fill = factor(active_status_method3))) +
ggplot2::geom_raster() +
ggplot2::scale_fill_manual(values = colors, labels = labels, name = "Status", na.value = "white") +
ggplot2::theme_minimal() +
ggplot2::theme(
axis.text.y = ggplot2::element_blank(),
axis.ticks.y = ggplot2::element_blank(),
axis.text.x = ggplot2::element_text(size = 6, angle = 90, hjust = 1)
) +
ggplot2::labs(x = "Date", y = "Health Facilities", title = paste("Active Status Heatmap Method 3 -", area))
plots_list[[area]] <- p
base::print(p)
}
Step 1.2: Load and prepare data
Now we import the DHIS2 dataset that was initially processed in the DHIS2 Data Preprocessing section of this code library.
Step 1.3: Determine reporting status
Here we create an intermediate dataframe that stores the monthly reporting status of each health facility.
key_indicators = ['allout', 'test', 'pres', 'conf', 'maltreat', 'maladm']
# make a copy of the data
dfr = dhis2_df.copy()
# add a column indicating whether the HF reported on any of the key indicators
dfr.insert(len(dfr.columns), 'key_variables', dfr[key_indicators].notna().any(axis = 1))
dfr.insert(len(dfr.columns), 'reported', np.where(dfr['key_variables'], 1, 0))
# keep only the columns needed for the reporting status calculations
cols = ['Year', 'Month', 'YM', 'adm0', 'adm0_uid', 'adm1', 'adm1_uid', 'adm2', 'adm2_uid', 'adm3', 'adm3_uid', 'hf', 'hf_uid', 'key_variables', 'reported']
dfr = dfr[cols]
# compute first month reported for each HF and add column in dfr
t = dfr[dfr['reported'] == 1].groupby('hf_uid')['YM'].min().to_frame(name = 'first_month_reported').reset_index()
# make sure to keep all HFs in case some don't have a valid first month (never reported on anything)
temp = pd.DataFrame(dfr['hf_uid'].unique(), columns = ['hf_uid'])
t = temp.merge(t, on = 'hf_uid', how = 'left', validate = '1:1')
dfr = dfr.merge(t, on = 'hf_uid', how = 'left', validate = 'm:1')
# add HF status column:
# 0: not active
# 0.5: HF didn't report when considered active
# 1: active and reporting
dfr.insert(len(dfr.columns),
'Facility_status',
np.where(dfr['reported'] == 1, 1, np.where((dfr['reported'] == 0) & (dfr['YM'] >= dfr['first_month_reported']), 0.5, 0)))
# add active HF column
dfr.insert(len(dfr.columns), 'Facility_active', np.where(dfr['Facility_status'] == 0, False, True))
# quick visual check
dfr.head(10).style
| | Year | Month | YM | adm0 | adm0_uid | adm1 | adm1_uid | adm2 | adm2_uid | adm3 | adm3_uid | hf | hf_uid | key_variables | reported | first_month_reported | Facility_status | Facility_active |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015 | 1 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo City Council | adm2_00001 | Bo City | adm3_00001 | Aethel CHP | HF_00001 | False | 0 | 2019-01 | 0.000000 | False |
| 1 | 2015 | 1 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo City Council | adm2_00001 | Bo City | adm3_00001 | Agape Way CHP | HF_00002 | True | 1 | 2015-01 | 1.000000 | True |
| 2 | 2015 | 1 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo City Council | adm2_00001 | Bo City | adm3_00001 | Anglican Diocese Clinic | HF_00003 | False | 0 | nan | 0.000000 | False |
| 3 | 2015 | 1 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo City Council | adm2_00001 | Bo City | adm3_00001 | Batiama Layout MCHP | HF_00004 | False | 0 | 2015-05 | 0.000000 | False |
| 4 | 2015 | 1 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo City Council | adm2_00001 | Bo City | adm3_00001 | Bo Government Hospital | HF_00005 | True | 1 | 2015-01 | 1.000000 | True |
| 5 | 2015 | 1 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo City Council | adm2_00001 | Bo City | adm3_00001 | Bo School Bay CHP | HF_00006 | False | 0 | 2022-01 | 0.000000 | False |
| 6 | 2015 | 1 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo City Council | adm2_00001 | Bo City | adm3_00001 | Breakthrough MCHP | HF_00007 | False | 0 | 2023-10 | 0.000000 | False |
| 7 | 2015 | 1 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo City Council | adm2_00001 | Bo City | adm3_00001 | Brima Town CHP | HF_00008 | True | 1 | 2015-01 | 1.000000 | True |
| 8 | 2015 | 1 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo City Council | adm2_00001 | Bo City | adm3_00001 | EDC Unit CHP | HF_00009 | True | 1 | 2015-01 | 1.000000 | True |
| 9 | 2015 | 1 | 2015-01 | Sierra Leone | adm0_00001 | Bo District | adm1_00001 | Bo City Council | adm2_00001 | Bo City | adm3_00001 | Favour MCHP | HF_00010 | True | 1 | 2015-01 | 1.000000 | True |
Step 1.3.a: Visualize reporting status
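from matplotlib.patches import Patch
# Plot monthly HF-level reporting status for the whole country
# (one row per HF, ordered by first month reported; one column per month)
df = (dfr.pivot(index = ['hf_uid', 'first_month_reported'], columns = 'YM', values = 'Facility_status')
.sort_values(by = 'first_month_reported'))
df = df.reset_index().drop(['first_month_reported'], axis = 1).set_index(['hf_uid'])
# Prep colours and labels for cmap and legend
# (status_params_dict must already be defined, with one entry per status value)
colours = [status_params_dict[i]['colour'] for i in sorted(status_params_dict.keys())]
labels = [status_params_dict[i]['label'] for i in sorted(status_params_dict.keys())]
cmap = ListedColormap(colours)
# Make figure
fs = 15
fig, ax = plt.subplots(figsize = (15, 10))
# Fix the colour scale to [0, 1] so each status value (0, 0.5, 1) always maps to the same colour
sns.heatmap(ax = ax, data = df, cmap = cmap, vmin = 0, vmax = 1, cbar = False)
ax.set_xlabel('')
ax.set_xticks(ax.get_xticks())
ax.set_xticklabels([l.get_text()[0:7] for l in ax.get_xticklabels()], rotation = 45, ha = 'right')
ax.set_yticks([])
# Add a legend that uses the same colours and labels as the heatmap
handles = [Patch(facecolor = c, label = l) for c, l in zip(colours, labels)]
ax.legend(handles = handles, loc = 'upper left', bbox_to_anchor = (1.01, 1), fontsize = fs)
plt.show()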
To adapt the code:
Step 2: Determine active and inactive status
Method 1: First Report Activation
Note: I can’t really tell what is going on in the remaining code here. I removed the UID and YYYY-MM steps, since those should be handled during data preprocessing; the dataset loaded in Step 1.2 should already include these columns.
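As the example table at the top of this section illustrates, under first-report activation a facility is treated as inactive until the first month it submits a report, and as active (reporting or not) in every month after that. The sketch below is a minimal pandas illustration of that rule, not the workflow's own code. It assumes a long dataframe with `hf_uid`, a `YM` column of `'YYYY-MM'` strings, and the indicator columns `allout`, `susp`, `test`, `conf` and `maltreat` from the example table, and it counts a month as reported when any indicator is non-missing (an assumption to confirm with the SNT team). The function name `first_report_activation` is hypothetical; the status labels are those of the example table.

```python
import numpy as np
import pandas as pd

# Indicator columns used to decide whether a facility reported in a month.
# Assumed names, taken from the example table at the top of this section.
INDICATORS = ["allout", "susp", "test", "conf", "maltreat"]


def first_report_activation(df: pd.DataFrame) -> pd.DataFrame:
    """Classify monthly facility status with the first-report rule (sketch).

    Expects one row per hf_uid and YM, where YM is a 'YYYY-MM' string.
    """
    # Zero-padded 'YYYY-MM' strings sort chronologically
    out = df.sort_values(["hf_uid", "YM"]).reset_index(drop=True)
    # A facility reported in a month if any indicator is non-missing
    out["reported"] = out[INDICATORS].notna().any(axis=1)
    # First reported month per facility; NaN for facilities that never report
    first = out.loc[out["reported"]].groupby("hf_uid")["YM"].min()
    out["first_month_reported"] = out["hf_uid"].map(first)
    # Active from the first report onward: running maximum of the reported flag
    out["Facility_active"] = (
        out["reported"].astype(int).groupby(out["hf_uid"]).cummax().astype(bool)
    )
    # Inactive before the first report; afterwards, reporting or not reporting
    out["Facility_status"] = np.select(
        [~out["Facility_active"], out["reported"]],
        ["Inactive", "Active reporting"],
        default="Active not reporting",
    )
    return out


# Toy input mirroring hf_0001 in the example table, plus a hypothetical
# facility (hf_0002) that never reports
months = [f"2024-{m:02d}" for m in range(1, 13)]
values = {
    "2024-03": [20, 15, 10, 5, 5],
    "2024-04": [30, 15, 10, 8, 5],
    "2024-05": [60, 15, 10, 5, np.nan],
    "2024-09": [5, 5, 5, 5, 5],
}
rows = [
    {"hf_uid": "hf_0001", "YM": ym, **dict(zip(INDICATORS, values.get(ym, [np.nan] * 5)))}
    for ym in months
]
rows += [{"hf_uid": "hf_0002", "YM": ym, **dict.fromkeys(INDICATORS, np.nan)} for ym in months]
result = first_report_activation(pd.DataFrame(rows))
print(result.loc[result["hf_uid"] == "hf_0001", ["YM", "Facility_status"]])
```

Using a running maximum of the `reported` flag keeps the "active from first report onward" rule explicit and leaves facilities that never report inactive in every month.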
Step 2 Method 2:
ADD
Step 2 Method 3:
ADD
Step 2 Method Val:
Prepare your expected reports dataframe: df_expected
VT: I would suggest distinguishing between outpatient and inpatient indicators here, as I normally do. I don’t want to modify the structure too much before discussing with the team.
Here we build a dataframe storing the number of active health facilities in each adm3 unit for each month of the study period. This monthly count is the denominator used in subsequent sections (link to RR and Incidence adjustment sections).
```python
# Create dftree: the unique admin hierarchy and facility list
cols = ['adm0', 'adm0_uid', 'adm1', 'adm1_uid', 'adm2', 'adm2_uid', 'adm3', 'adm3_uid', 'hf', 'hf_uid']
dftree = dhis2_df[cols].drop_duplicates().reset_index(drop = True)
# Create monthly denominator: number of HFs active in each adm3
# (min_count = 1 returns NaN rather than 0 when a group has no non-missing values)
df_expected = (dfr
    .groupby(['Year', 'YM', 'adm3_uid'])[['Facility_active']].sum(min_count = 1)
    .reset_index()
    .rename(columns = {'Facility_active': 'denominator'}))
# Add parent admin units
cols = ['adm0', 'adm0_uid', 'adm1', 'adm1_uid', 'adm2', 'adm2_uid', 'adm3', 'adm3_uid']
t = dftree[cols].drop_duplicates().reset_index(drop = True)
df_expected = df_expected.merge(t, on = 'adm3_uid', how = 'left', validate = 'm:1')
# Reorder columns
cols = ['Year', 'YM', 'adm0', 'adm0_uid', 'adm1', 'adm1_uid', 'adm2', 'adm2_uid', 'adm3', 'adm3_uid', 'denominator']
df_expected = df_expected[cols].sort_values(by = cols).reset_index(drop = True)
# Save (here() is assumed to be imported earlier in the workflow)
df_expected.to_csv(here('english/data_r/routine_cases', 'df_expected.csv'), index = None)
# Inspect results
df_expected.head(10).style
```
To adapt the code:
Step 3: Assign expected and observed reporting status accounting for active/inactive
Step 3.1: Create summary statistics
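Once every facility-month carries an activity flag, the expected number of reports in an adm3 unit is the number of active facilities, the observed number is the number of active facilities that reported, and the reporting rate is observed divided by expected. The sketch below illustrates this summary on toy data. It assumes boolean `Facility_active` and `reported` columns such as those produced in Step 2; the function name `summarise_reporting`, the output columns `expected`, `observed` and `reporting_rate`, and the toy identifiers are hypothetical.

```python
import pandas as pd


def summarise_reporting(df: pd.DataFrame) -> pd.DataFrame:
    """Count expected and observed reports per adm3 unit and month (sketch).

    Assumes boolean columns Facility_active and reported, one row per facility-month.
    """
    flags = df.assign(
        # Every active facility is expected to report
        expected=df["Facility_active"].astype(int),
        # Only reports from active facilities count as observed
        observed=(df["Facility_active"] & df["reported"]).astype(int),
    )
    summary = (flags.groupby(["adm3_uid", "YM"], as_index=False)[["expected", "observed"]]
               .sum())
    # Reporting rate is left undefined (NaN) when no facility was expected to report
    summary["reporting_rate"] = summary["observed"] / summary["expected"].where(summary["expected"] > 0)
    return summary


# Toy data: two facilities in one adm3 unit and one never-active facility in another
demo = pd.DataFrame({
    "hf_uid": ["hf_A", "hf_B", "hf_A", "hf_B", "hf_C", "hf_C"],
    "adm3_uid": ["adm3_X", "adm3_X", "adm3_X", "adm3_X", "adm3_Y", "adm3_Y"],
    "YM": ["2024-01", "2024-01", "2024-02", "2024-02", "2024-01", "2024-02"],
    "Facility_active": [True, False, True, True, False, False],
    "reported": [True, False, False, True, False, False],
})
summary = summarise_reporting(demo)
print(summary)
```

Leaving the rate as NaN where nothing was expected avoids reporting a misleading 0% for units with no active facilities.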
Step 3.2: Create detailed reporting status
Step 3.3: Assign final status with priority
Step 3.4: Sort and prepare data for visualization
Step 4: Visualise processed data
Step 4.1: Set up data
Step 4.2: Make heatmap
Step 4.3: Plot number of facilities by status over time
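One way to show how facility activity evolves is to count the facilities in each status per month and plot the counts as stacked bars. The sketch below assumes a long dataframe with `YM` and `Facility_status` columns holding the status labels from the example table at the top of this section; `count_by_status`, `plot_status_counts` and the toy facility ids are hypothetical, and the stacked-bar design is one option rather than the workflow's prescribed chart.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Status labels as they appear in the example table at the top of this section
STATUS_ORDER = ["Active reporting", "Active not reporting", "Inactive"]


def count_by_status(df: pd.DataFrame) -> pd.DataFrame:
    """Number of facilities in each status per month (rows: YM, columns: status)."""
    return (df.groupby(["YM", "Facility_status"]).size()
            .unstack(fill_value=0)
            .reindex(columns=STATUS_ORDER, fill_value=0))


def plot_status_counts(counts: pd.DataFrame):
    """Stacked bar chart of facility counts by status over time."""
    fig, ax = plt.subplots(figsize=(12, 5))
    counts.plot(kind="bar", stacked=True, ax=ax, width=0.9)
    ax.set_xlabel("")
    ax.set_ylabel("Number of health facilities")
    ax.legend(title="Status", bbox_to_anchor=(1.01, 1), loc="upper left")
    fig.tight_layout()
    return fig


# Toy data: two facilities over three months
demo = pd.DataFrame({
    "hf_uid": ["hf_A"] * 3 + ["hf_B"] * 3,
    "YM": ["2024-01", "2024-02", "2024-03"] * 2,
    "Facility_status": ["Inactive", "Active reporting", "Active not reporting",
                        "Active reporting", "Active reporting", "Active reporting"],
})
counts = count_by_status(demo)
fig = plot_status_counts(counts)
```

Reindexing the columns to a fixed order keeps colours and stacking consistent even when a status is absent from the data; for long periods a line chart (`counts.plot()`) may be easier to read.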
Step 5: Save data
ADD
ADD
Full code
Find the full code scripts for determining active and inactive status of health facilities below.