cli::cli_h1(
"Setup parameters"
)
## ----------------------------------------------------------- ##
# Analysis Parameters --------------------------------------- ##
## ----------------------------------------------------------- ##
# Set paths
paths <- sntutils::setup_project_paths()
# Set Country iso
iso3 <- "gha" #<----------- Change to your country ISO3 code
adm0_name <- "Ghana" #<----------- Change to your country name
# What variable are you analysing
variable <- "rainfall" #<----------- Switch between "rainfall" and "case"
Durations of Seasonality
Overview
Once an area has been identified as seasonal using the WHO 4-month rule, the next analytical step is to determine when SMC cycles should begin and how many cycles are needed to adequately cover the transmission season.
This analysis addresses two practical planning questions:
- What is the optimal start month for SMC? That is, which month consistently marks the beginning of the peak transmission window?
- What is the minimum number of monthly cycles needed to cover at least 60% of annual cases or rainfall? That is, do districts need 2, 3, 4, or 5 cycles?
The analysis uses a rolling block approach, examining every possible consecutive window of 2, 3, 4 and 5 months between April and December to identify which window captures the greatest concentration of malaria cases or rainfall. Results are summarised across years to determine the most consistently optimal block for each district.
- Identify the month-block (2-, 3-, 4-, 5-month window) that captures the highest proportion of annual malaria cases or rainfall for each district
- Determine the optimal start month for SMC based on consistent historical patterns
- Identify the minimum number of cycles required to cover >= 60% of cases or rainfall
- Produce district-level summary tables and maps for program planning
Conceptual Background
Block Window
Rather than assuming a fixed transmission season, this method tests all plausible SMC-relevant windows that start in April or later and end by December, at four durations: 2, 3, 4, and 5 months. For each district and year, the method identifies which window of each duration captures the greatest share of annual cases or rainfall.
For example, for a 3-month block:
| Block | Months Covered |
|---|---|
| apr-may-jun | April, May, June |
| may-jun-jul | May, June, July |
| jun-jul-aug | June, July, August |
| jul-aug-sep | July, August, September |
| aug-sep-oct | August, September, October |
| sep-oct-nov | September, October, November |
The same logic applies for the rest of the month blocks.
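As an illustration, the full set of candidate windows can be enumerated in a few lines of base R. This is a sketch only: `block_labels()` is a hypothetical helper, and the start-month ranges are assumed to match those used in Script 1 (April to October for 2-month blocks, April to September for 3- and 4-month blocks, April to August for 5-month blocks).

```r
# Hypothetical helper: labels of all candidate blocks of a given length
# whose start month falls in `start_months` (1 = January, 12 = December).
block_labels <- function(start_months, block_length) {
  month_abbr <- tolower(month.abb)
  vapply(
    start_months,
    function(s) paste(month_abbr[s:(s + block_length - 1)], collapse = "-"),
    character(1)
  )
}

# Start-month ranges assumed from Script 1
candidate_blocks <- list(
  `2` = block_labels(4:10, 2),
  `3` = block_labels(4:9, 3),
  `4` = block_labels(4:9, 4),
  `5` = block_labels(4:8, 5)
)

# Number of candidate windows per duration
lengths(candidate_blocks)
```

Under these assumptions there are 7, 6, 6, and 5 candidate windows for the 2-, 3-, 4-, and 5-month durations respectively.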
Identifying the Optimal Block
For each district-year combination, the block with the highest sum of cases or rainfall within each duration class is identified. The percentage of annual total captured by that block is recorded. Across multiple years, the block that appears most frequently as the optimal window is selected, and the median proportion it captures is used to assess adequacy.
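The selection logic can be sketched for a single district and a single duration as follows. The values are made up for illustration and are not real data; the object names (`best_3m`, `optimal_block`, `median_pct`) are hypothetical.

```r
# Illustrative values only: best 3-month block per year for one district
best_3m <- data.frame(
  year  = 2019:2023,
  block = c("jul-aug-sep", "jun-jul-aug", "jul-aug-sep",
            "jul-aug-sep", "aug-sep-oct"),
  pct   = c(62.1, 58.4, 65.0, 60.3, 57.9)
)

# Most frequent block across years (ties resolve to the first in table order)
block_counts <- table(best_3m$block)
optimal_block <- names(block_counts)[which.max(block_counts)]

# Median share of the annual total in the years that block was optimal
median_pct <- median(best_3m$pct[best_3m$block == optimal_block])

optimal_block
median_pct
```

Here `jul-aug-sep` is optimal in three of five years, and its median share in those years (62.1%) is the figure compared against the threshold.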
Minimum Cycle
The >= 60% threshold from WHO guidance is applied again here, but now to determine how many cycles are needed:
- If a 3-month block already captures >= 60% of the annual total, 3 cycles may be sufficient.
- If 3 months are insufficient but a 4-month block reaches 60%, 4 cycles are recommended.
- If >= 60% is only achievable with a 5-month window, 5 cycles are recommended.
Where no window reaches 60%, the block with the highest coverage across durations is retained as the best available option.
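A minimal sketch of this rule, assuming the median coverage for each duration has already been computed for a district. `min_cycles()` is a hypothetical helper, not a function from the scripts, and the coverage values are illustrative.

```r
# Hypothetical helper: smallest duration whose coverage meets the threshold,
# falling back to the duration with the highest coverage when none does.
# `coverage` is a numeric vector named by duration in months.
min_cycles <- function(coverage, threshold = 60) {
  durations <- as.integer(names(coverage))
  meets <- coverage >= threshold
  if (any(meets)) {
    min(durations[meets])
  } else {
    durations[which.max(coverage)]
  }
}

# The 4-month block is the shortest to reach 60%
min_cycles(c(`2` = 41, `3` = 57, `4` = 66, `5` = 74))

# No duration reaches 60%; the 5-month block has the highest coverage
min_cycles(c(`2` = 30, `3` = 41, `4` = 49, `5` = 55))
```

The `threshold` argument makes it straightforward to test an alternative cut-off if the national context calls for one.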
These results inform, but do not replace, national program decisions. Before finalising SMC cycle counts and start month:
- Validate findings against operational feasibility and supply chain constraints
- Confirm that the 60% threshold is appropriate for the national context; for example, a country that experiences heavier rains throughout the year may adjust it to 70%
- Cross-reference case-based and rainfall-based results for consistency
- Document and justify any deviation from the analysis outputs
Analytical Workflow
This analysis involves two sequential scripts:
| Script | Purpose |
|---|---|
| Script 1 - Block Analysis | Calculates rolling block percentages for each district-year and produces frequency summaries |
| Script 2 - Seasonality Mapping | Applies the minimum cycle rule, selects optimal blocks, and produces maps |
Both scripts can run on either case data or rainfall data by switching a single parameter. It is recommended to run the analysis on both and compare outputs.
Script 1: Monthly Block Analysis
Overview
This script loads either malaria case or rainfall data, computes rolling 2-, 3-, 4-, and 5-month block sums for each district and year, and produces three outputs:
- A summary table with the maximum block percentage per district per year
- A detailed table with all block percentages per district per year
- A frequency analysis table identifying which block is optimal most consistently across years
Step 1: Set Parameters
Before running the script, configure the country and variable of interest. The ‘variable’ parameter controls whether the script analyses rainfall or cases. Switch it, then rerun the script to completion.
To adapt the code:
- ‘iso3’: Replace ‘“gha”’ with your country’s ISO3 code (e.g., ‘“gin”’ for Guinea)
- ‘adm0_name’: Update the country name accordingly
- ‘variable’: Set to ‘“rainfall”’ to analyse rainfall data, or ‘“case”’ to analyse case data.
Step 2: Load Data
Load either rainfall or case data depending on your ‘variable’ setting. Then assign the chosen dataset to a common ‘df’ object before proceeding. This is the only switch required before running the remainder of the script.
cli::cli_h2("Load rainfall Data")
rainfall_data <- sntutils::read_snt_data(
here::here(paths$climate, "processed"),
glue::glue("{iso3}_rainfall_processed"),
"xlsx"
) |>
dplyr::group_by(adm1, adm2, year, month) |>
dplyr::summarise(value = sum(mean_rainfall_mm), .groups = "drop")
cli::cli_h2("Load case Data")
case_data <- sntutils::read_snt_data(
here::here(paths$dhis2, "processed"),
glue::glue("{iso3}_dhis2_processed"),
"xlsx"
) |>
dplyr::filter(dplyr::if_all(c(conf, conf_ov5, conf_u5, conf_preg), ~ !is.na(.))) |>
dplyr::select(adm0, adm1, adm2, year, month, conf, conf_ov5, conf_u5, conf_preg, date) |>
dplyr::group_by(adm1, adm2, year, month) |>
dplyr::summarise(value = sum(conf_ov5), .groups = "drop") # <-------------- Switch the summed variable to match your analysis
# Switch here
df <- rainfall_data #<----------- Switch between rainfall_data and case_data
To adapt the code:
- In the ‘summarise()’ call for ‘case_data’: if analysing cases, replace ‘conf_ov5’ inside ‘sum()’ with ‘conf_u5’ or another case variable as appropriate for your analysis objective (e.g., confirmed cases in pregnant women)
- In the final assignment: switch ‘df’ to point to ‘case_data’ when running the case analysis
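One optional way to avoid editing the assignment by hand is to derive ‘df’ from the ‘variable’ parameter. This is a suggestion rather than part of the original script; the two small data frames below are toy stand-ins for the loaded datasets.

```r
# Toy stand-ins for the two loaded datasets (illustrative values only)
rainfall_data <- data.frame(
  adm2 = "A", year = 2023, month = 1:2, value = c(12.5, 30.1)
)
case_data <- data.frame(
  adm2 = "A", year = 2023, month = 1:2, value = c(140, 210)
)

variable <- "case"

# Pick the dataset from the parameter; stop early on an unexpected value
df <- switch(
  variable,
  rainfall = rainfall_data,
  case = case_data,
  stop("`variable` must be \"rainfall\" or \"case\"")
)
```

With this pattern, changing ‘variable’ is enough to keep the loaded data and the output file names in step.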
Step 3: Prepare Data
Create a district identifier combining admin level 1 and admin level 2, and inspect the data to confirm the years and number of districts present.
cli::cli_h2("Prepare data")
# Create district identifier
df <- df |>
dplyr::mutate(district = paste(adm1, adm2, sep = " - "))
# Get unique years and districts
years <- sort(unique(df$year))
districts <- sort(unique(df$district))
cli::cli_alert_info(glue::glue("Years in data: {paste(years, collapse = ', ')}"))
cli::cli_alert_info(glue::glue("Number of districts: {length(districts)}"))
cli::cli_alert_info(glue::glue("Date range: {min(df$year)} - {max(df$year)}"))
To adapt the code:
- Do not change anything in the code above. If your district names are unclear in outputs, verify that ‘adm1’ and ‘adm2’ are correctly populated in your processed data.
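When inspecting the data, it can also help to check that each district-year reports all 12 months, because the block calculation treats a missing month as contributing nothing, which can distort the percentages. The following base R sketch is an optional check on toy data, not part of the original script; `months_reported` and `incomplete` are hypothetical names.

```r
# Toy data (illustrative): district "R1 - B" is missing December 2023
df <- rbind(
  data.frame(district = "R1 - A", year = 2023, month = 1:12, value = 1),
  data.frame(district = "R1 - B", year = 2023, month = 1:11, value = 1)
)

# Count distinct months reported per district-year
months_reported <- aggregate(
  month ~ district + year,
  data = df,
  FUN = function(m) length(unique(m))
)
names(months_reported)[3] <- "n_months"

# District-years with fewer than 12 reported months
incomplete <- months_reported[months_reported$n_months < 12, ]
incomplete
```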
Step 4: Calculate Rolling Block Percentages
This is the core computation. For each district and year, the script:
- Retrieves the monthly values and calculates the annual total
- Computes the sum for each 2-, 3-, 4- and 5-month window starting from April
- Records the maximum block sum and the corresponding block label for each duration
- Expresses each block as a percentage of the annual total
A helper function converts start month and block length into a readable label.
get_month_label <- function(start_month, block_length) {
month_abbr <- c("jan", "feb", "mar", "apr", "may", "jun", "jul", "aug",
"sep", "oct", "nov", "dec")
months_in_block <- start_month:(start_month + block_length - 1)
paste(month_abbr[months_in_block], collapse = "-")
}
# ============================================================================
# 2. CALCULATE ROLLING BLOCK PERCENTAGES
# ============================================================================
cli::cli_h2(
"Calculate rolling block percentages"
)
# Initialize results dataframe
summary_results <- data.frame()
detailed_results <- data.frame()
# Process each year
for (yr in years) {
year_data <- df |> dplyr::filter(year == yr)
# Process each district
for (dist in districts) {
district_year_data <- year_data |> dplyr::filter(district == dist)
if (nrow(district_year_data) == 0) next
monthly_vals <- district_year_data |>
dplyr::select(month, value) |>
dplyr::arrange(month)
total_value <- sum(monthly_vals$value, na.rm = TRUE)
if (total_value == 0) next
month_lookup <- stats::setNames(monthly_vals$value, monthly_vals$month)
# Calculate rolling sums for 2-month blocks (Apr-May through Oct-Nov) #####
blocks_2m <- list()
blocks_2m_values <- c()
for (start_month in 4:10) {
block_name <- get_month_label(start_month, 2)
block_months <- start_month:(start_month + 1)
block_sum <- sum(month_lookup[as.character(block_months)], na.rm = TRUE)
blocks_2m[[block_name]] <- block_sum
blocks_2m_values <- c(blocks_2m_values, block_sum)
}
max_2m <- max(blocks_2m_values)
max_2m_block <- names(which(unlist(blocks_2m) == max_2m))[1]
# Calculate rolling sums for 3-month blocks (Apr-Jun through Sep-Nov) #####
blocks_3m <- list()
blocks_3m_values <- c()
for (start_month in 4:9) {
block_name <- get_month_label(start_month, 3)
block_months <- start_month:(start_month + 2)
block_sum <- sum(month_lookup[as.character(block_months)], na.rm = TRUE)
blocks_3m[[block_name]] <- block_sum
blocks_3m_values <- c(blocks_3m_values, block_sum)
}
max_3m <- max(blocks_3m_values)
max_3m_block <- names(which(unlist(blocks_3m) == max_3m))[1]
# Calculate rolling sums for 4-month blocks (Apr-Jul through Sep-Dec) ####
blocks_4m <- list()
blocks_4m_values <- c()
for (start_month in 4:9) {
block_name <- get_month_label(start_month, 4)
block_months <- start_month:(start_month + 3)
block_sum <- sum(month_lookup[as.character(block_months)], na.rm = TRUE)
blocks_4m[[block_name]] <- block_sum
blocks_4m_values <- c(blocks_4m_values, block_sum)
}
max_4m <- max(blocks_4m_values)
max_4m_block <- names(which(unlist(blocks_4m) == max_4m))[1]
# Calculate rolling sums for 5-month blocks (Apr-Aug through Aug-Dec) ####
blocks_5m <- list()
blocks_5m_values <- c()
for (start_month in 4:8) {
block_name <- get_month_label(start_month, 5)
block_months <- start_month:(start_month + 4)
block_sum <- sum(month_lookup[as.character(block_months)], na.rm = TRUE)
blocks_5m[[block_name]] <- block_sum
blocks_5m_values <- c(blocks_5m_values, block_sum)
}
max_5m <- max(blocks_5m_values)
max_5m_block <- names(which(unlist(blocks_5m) == max_5m))[1]
# ========================================================================
# CREATE SUMMARY OUTPUT ROW
# ========================================================================
summary_row <- data.frame(
year = yr,
district = dist,
total_value = total_value,
max_2m = max_2m,
max_3m = max_3m,
max_4m = max_4m,
max_5m = max_5m,
pct_2m = (max_2m / total_value) * 100,
pct_3m = (max_3m / total_value) * 100,
pct_4m = (max_4m / total_value) * 100,
pct_5m = (max_5m / total_value) * 100
)
summary_results <- rbind(summary_results, summary_row)
# ========================================================================
# CREATE DETAILED OUTPUT ROW
# ========================================================================
detailed_row <- data.frame(
district = dist,
years = yr,
stringsAsFactors = FALSE
)
for (block_name in names(blocks_2m)) {
detailed_row[[block_name]] <- (blocks_2m[[block_name]] / total_value) * 100
}
for (block_name in names(blocks_3m)) {
detailed_row[[block_name]] <- (blocks_3m[[block_name]] / total_value) * 100
}
for (block_name in names(blocks_4m)) {
detailed_row[[block_name]] <- (blocks_4m[[block_name]] / total_value) * 100
}
for (block_name in names(blocks_5m)) {
detailed_row[[block_name]] <- (blocks_5m[[block_name]] / total_value) * 100
}
detailed_row$max_2m <- (max_2m / total_value) * 100
detailed_row$max_3m <- (max_3m / total_value) * 100
detailed_row$max_4m <- (max_4m / total_value) * 100
detailed_row$max_5m <- (max_5m / total_value) * 100
detailed_row$max_2m_block <- max_2m_block
detailed_row$max_3m_block <- max_3m_block
detailed_row$max_4m_block <- max_4m_block
detailed_row$max_5m_block <- max_5m_block
detailed_results <- rbind(detailed_results, detailed_row)
}
}
cli::cli_alert_success(
glue::glue(
"Analysis complete! Summary: {nrow(summary_results)} records | Detailed: {nrow(detailed_results)} records"
)
)
To adapt the code:
- Do not change anything in the code above. The block windows (start months from April through October for 2-month blocks, through September for 3- and 4-month blocks, and through August for 5-month blocks) are chosen to reflect the SMC-relevant transmission season. Consult the SNT team before modifying these bounds, for instance to accommodate an atypical rainfall season.
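To make the computation concrete, here is a worked example for a single district-year and the 3-month duration only. The monthly values are invented for illustration, and the vectorised form is a compact sketch of the same idea rather than the script's own loop.

```r
# Illustrative monthly values for one district-year (January to December)
monthly <- c(2, 1, 3, 8, 15, 40, 90, 120, 70, 25, 6, 0)
total <- sum(monthly)

month_abbr <- tolower(month.abb)
start_months <- 4:9   # 3-month blocks: apr-may-jun through sep-oct-nov
block_length <- 3

# Sum each candidate block and express it as a share of the annual total
block_pct <- vapply(
  start_months,
  function(s) 100 * sum(monthly[s:(s + block_length - 1)]) / total,
  numeric(1)
)
names(block_pct) <- vapply(
  start_months,
  function(s) paste(month_abbr[s:(s + block_length - 1)], collapse = "-"),
  character(1)
)

round(block_pct, 1)

# Label of the best 3-month block for this district-year
names(block_pct)[which.max(block_pct)]
```

In this example the annual total is 380 and the best block is `jul-aug-sep`, which sums to 280, or about 73.7% of the annual total.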
Step 5: Reorder columns for detailed output
# ============================================================================
# 3. REORDER COLUMNS FOR DETAILED OUTPUT
# ============================================================================
col_order <- c(
"years", "district",
# 2-month blocks
"apr-may", "may-jun", "jun-jul", "jul-aug", "aug-sep", "sep-oct", "oct-nov",
# 3-month blocks
"apr-may-jun", "may-jun-jul", "jun-jul-aug", "jul-aug-sep", "aug-sep-oct", "sep-oct-nov",
# 4-month blocks
"apr-may-jun-jul", "may-jun-jul-aug", "jun-jul-aug-sep", "jul-aug-sep-oct", "aug-sep-oct-nov", "sep-oct-nov-dec",
# 5-month blocks
"apr-may-jun-jul-aug", "may-jun-jul-aug-sep", "jun-jul-aug-sep-oct", "jul-aug-sep-oct-nov", "aug-sep-oct-nov-dec",
# Max values
"max_2m", "max_3m", "max_4m", "max_5m",
# Max block identifiers
"max_2m_block", "max_3m_block", "max_4m_block", "max_5m_block"
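)
# Apply the ordering; any_of() skips any listed column that is absent
detailed_results <- detailed_results |>
dplyr::select(dplyr::any_of(col_order))
Step 6: Save Block Analysis Output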
Save both the summary and detailed results to Excel for review and quality assurance before proceeding to the frequency analysis.
# ============================================================================
# 4. SAVE OUTPUTS
# ============================================================================
cli::cli_h2(
"Save outputs"
)
sntutils::write_snt_data(
summary_results,
here::here(paths$val_tbl, variable),
glue::glue("{iso3}_malaria_{variable}_block_analysis"),
"xlsx"
)
cli::cli_alert_success(
glue::glue("Saved: {iso3}_malaria_{variable}_block_analysis.xlsx")
)
sntutils::write_snt_data(
detailed_results,
here::here(paths$val_tbl, variable),
glue::glue("{iso3}_malaria_detailed_yearly_{variable}_block_analysis"),
"xlsx"
)
cli::cli_alert_success(
glue::glue("Saved: {iso3}_malaria_detailed_yearly_{variable}_block_analysis.xlsx")
)
Expected Outputs:
- ‘{iso3}_malaria_{variable}_block_analysis.xlsx’ - one row per district per year, showing the maximum block percentage for the 2-, 3-, 4-, and 5-month windows
- ‘{iso3}_malaria_detailed_yearly_{variable}_block_analysis.xlsx’ - one row per district per year, with all individual block percentages and the best block label for each duration
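As part of the review, two simple consistency checks can be run on the summary table. This is an optional sketch on toy data, assuming monthly values are non-negative and the start-month ranges are those used in the script; `in_range` and `non_decreasing` are hypothetical names.

```r
# Toy summary table in the shape produced above (illustrative values only)
summary_results <- data.frame(
  year = c(2022, 2023),
  district = "R1 - A",
  pct_2m = c(45.2, 48.0),
  pct_3m = c(61.5, 66.3),
  pct_4m = c(74.8, 79.1),
  pct_5m = c(83.0, 88.4)
)

pct_cols <- c("pct_2m", "pct_3m", "pct_4m", "pct_5m")
pct <- as.matrix(summary_results[, pct_cols])

# Every percentage should lie between 0 and 100
in_range <- all(pct >= 0 & pct <= 100)

# With non-negative monthly values, the best longer block should capture at
# least as much as the best shorter one, so each row should be non-decreasing
non_decreasing <- all(apply(pct, 1, function(x) all(diff(x) >= 0)))

in_range
non_decreasing
```

A `FALSE` on either check would suggest negative or duplicated monthly values in the input and is worth investigating before the frequency analysis.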
Step 7: Frequency Analysis
This step aggregates the results across years to determine, for each district, which block is most frequently the optimal window at each duration. It also records the median proportion captured by that block in the years it was dominant. This is the figure used later to determine whether the 60% threshold is met.
# ============================================================================
# 5. GENERATE FREQUENCY ANALYSIS
# ============================================================================
cli::cli_h2(
"Generate frequency analysis"
)
frequency_results <- data.frame()
districts_list <- unique(detailed_results$district)
for (dist in districts_list) {
district_detailed <- detailed_results |> dplyr::filter(district == dist)
district_summary <- summary_results |> dplyr::filter(district == dist)
all_years <- sort(unique(district_detailed$years))
total_years <- length(all_years)
# 2m
block_2m_freq <- table(district_detailed$max_2m_block)
for (block in names(block_2m_freq)) {
block_years <- district_detailed |>
dplyr::filter(max_2m_block == block) |>
dplyr::pull(years)
freq_pct <- (length(block_years) / total_years) * 100
median_prop <- district_summary |>
dplyr::filter(year %in% block_years) |>
dplyr::pull(pct_2m) |>
stats::median()
frequency_results <- rbind(frequency_results, data.frame(
district = dist,
duration = 2,
block = block,
block_freq = round(freq_pct, 2),
freq_numeric = freq_pct,
years = paste(block_years, collapse = ", "),
median_max_prop = round(median_prop, 2),
stringsAsFactors = FALSE
))
}
# 3m
block_3m_freq <- table(district_detailed$max_3m_block)
for (block in names(block_3m_freq)) {
block_years <- district_detailed |>
dplyr::filter(max_3m_block == block) |>
dplyr::pull(years)
freq_pct <- (length(block_years) / total_years) * 100
median_prop <- district_summary |>
dplyr::filter(year %in% block_years) |>
dplyr::pull(pct_3m) |>
stats::median()
frequency_results <- rbind(frequency_results, data.frame(
district = dist,
duration = 3,
block = block,
block_freq = round(freq_pct, 2),
freq_numeric = freq_pct,
years = paste(block_years, collapse = ", "),
median_max_prop = round(median_prop, 2),
stringsAsFactors = FALSE
))
}
# 4m
block_4m_freq <- table(district_detailed$max_4m_block)
for (block in names(block_4m_freq)) {
block_years <- district_detailed |>
dplyr::filter(max_4m_block == block) |>
dplyr::pull(years)
freq_pct <- (length(block_years) / total_years) * 100
median_prop <- district_summary |>
dplyr::filter(year %in% block_years) |>
dplyr::pull(pct_4m) |>
stats::median()
frequency_results <- rbind(frequency_results, data.frame(
district = dist,
duration = 4,
block = block,
block_freq = round(freq_pct, 2),
freq_numeric = freq_pct,
years = paste(block_years, collapse = ", "),
median_max_prop = round(median_prop, 2),
stringsAsFactors = FALSE
))
}
# 5m
block_5m_freq <- table(district_detailed$max_5m_block)
for (block in names(block_5m_freq)) {
block_years <- district_detailed |>
dplyr::filter(max_5m_block == block) |>
dplyr::pull(years)
freq_pct <- (length(block_years) / total_years) * 100
median_prop <- district_summary |>
dplyr::filter(year %in% block_years) |>
dplyr::pull(pct_5m) |>
stats::median()
frequency_results <- rbind(frequency_results, data.frame(
district = dist,
duration = 5,
block = block,
block_freq = round(freq_pct, 2),
freq_numeric = freq_pct,
years = paste(block_years, collapse = ", "),
median_max_prop = round(median_prop, 2),
stringsAsFactors = FALSE
))
}
}
frequency_results <- frequency_results |>
dplyr::arrange(district, duration, dplyr::desc(freq_numeric)) |>
dplyr::select(-freq_numeric)
cli::cli_alert_success(
glue::glue("Frequency analysis complete! Total records: {nrow(frequency_results)}")
)
sntutils::write_snt_data(
obj = frequency_results,
path = here::here(paths$val_tbl, variable),
data_name = glue::glue("{iso3}_malaria_{variable}_block_frequency_analysis"),
file_formats = "xlsx"
)
cli::cli_alert_success(
glue::glue("Saved: {iso3}_malaria_{variable}_block_frequency_analysis.xlsx")
)
Expected Outputs:
- ‘{iso3}_malaria_{variable}_block_frequency_analysis.xlsx’ - one row per district-duration-block combination, with the frequency (% of years that block was dominant) and median proportion of annual total captured.
Key columns to review
| Column | Description |
|---|---|
| ‘district’ | Admin 1 - Admin 2 label |
| ‘duration’ | Block length in months |
| ‘block’ | Block label |
| ‘block_freq’ | % of years in which this block was the dominant window |
| ‘years’ | Specific years in which this block was dominant |
| ‘median_max_prop’ | Median % of annual total captured in dominant years |
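Because the table is sorted by district, duration, and descending frequency, the most frequent block for each district-duration pair is the first row of its group. The sketch below shows one way to extract those rows in base R; the data are illustrative, `top_blocks` is a hypothetical name, and how ties between equally frequent blocks should be resolved is left as a program decision.

```r
# Toy frequency table in the shape produced above (illustrative values only),
# already sorted by district, duration, and descending frequency
frequency_results <- data.frame(
  district = "R1 - A",
  duration = c(3, 3, 4),
  block = c("jul-aug-sep", "jun-jul-aug", "jul-aug-sep-oct"),
  block_freq = c(60, 40, 100),
  median_max_prop = c(62.1, 58.4, 71.3)
)

# Keep the first (most frequent) block for each district-duration pair
key <- paste(frequency_results$district, frequency_results$duration)
top_blocks <- frequency_results[!duplicated(key), ]

top_blocks
```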