Durations of Seasonality

Compare 2-, 3-, 4-, and 5-month block durations to identify the best SMC start month.

Overview

Once an area has been identified as seasonal using the WHO 4-month rule, the next analytical step is to determine when SMC cycles should begin and how many cycles are needed to adequately cover the transmission season.

This analysis addresses two practical planning questions:

What is the optimal start month for SMC? - i.e., which month consistently marks the beginning of the peak transmission window?
What is the minimum number of monthly cycles needed to cover at least 60% of annual cases or rainfall? - i.e., do districts need 2, 3, 4 or 5 cycles?

The analysis uses a rolling block approach, examining every possible consecutive window of 2, 3, 4 and 5 months between April and December to identify which window captures the greatest concentration of malaria cases or rainfall. Results are summarised across years to determine the most consistently optimal block for each district.

Objectives

Identify the month-block (2-, 3-, 4-, 5-month window) that captures the highest proportion of annual malaria cases or rainfall for each district
Determine the optimal start month for SMC based on consistent historical patterns
Identify the minimum number of cycles required to cover >= 60% of cases or rainfall
Produce district-level summary tables and maps for program planning

Conceptual Background

Block Window

Rather than assuming a fixed transmission season, this method tests all plausible SMC-relevant windows, starting no earlier than April and ending no later than December, at four durations: 2, 3, 4 and 5 months. For each district and year, the method identifies which window of each duration captures the greatest share of annual cases or rainfall.

For example, for a 3-month block:

Block	Months Covered
apr-may-jun	April, May, June
may-jun-jul	May, June, July
jun-jul-aug	June, July, August
jul-aug-sep	July, August, September
aug-sep-oct	August, September, October
sep-oct-nov	September, October, November

The same logic applies to the 2-, 4- and 5-month blocks.
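
The enumeration in the table can be sketched for any duration as follows. This is a minimal base R illustration on made-up monthly values; `block_shares()` and the toy vector are hypothetical names used only here, and the start-month range is passed in explicitly rather than derived.

```r
# Toy monthly values for one district-year, January to December (made-up numbers)
monthly <- c(5, 5, 10, 20, 40, 80, 120, 150, 90, 40, 20, 10)
month_abbr <- tolower(month.abb)

# Hypothetical helper: share (%) of the annual total captured by every
# consecutive window of `duration` months whose start month is in `starts`
block_shares <- function(values, duration, starts) {
  shares <- vapply(
    starts,
    function(s) 100 * sum(values[s:(s + duration - 1)]) / sum(values),
    numeric(1)
  )
  names(shares) <- vapply(
    starts,
    function(s) paste(month_abbr[s:(s + duration - 1)], collapse = "-"),
    character(1)
  )
  shares
}

# 3-month blocks starting April (4) through September (9), as in the table
shares_3m <- block_shares(monthly, duration = 3, starts = 4:9)
best_3m   <- names(shares_3m)[which.max(shares_3m)]

round(shares_3m, 1)
best_3m
```

With these made-up numbers the best 3-month window is jul-aug-sep, which holds 360 of the 590 annual units (about 61%).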

Identifying the Optimal Block

For each district-year combination, the block with the highest sum of cases or rainfall within each duration class is identified. The percentage of annual total captured by that block is recorded. Across multiple years, the block that appears most frequently as the optimal window is selected, and the median proportion it captures is used to assess adequacy.
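
As a rough illustration of this selection rule, the sketch below works through one district at one duration. The data frame and its column names are made up for the example and are not the objects produced by the scripts.

```r
# Toy per-year results for one district at one duration (made-up values):
# the winning 3-month block in each year and the share (%) it captured
yearly <- data.frame(
  year       = 2019:2023,
  best_block = c("jul-aug-sep", "jun-jul-aug", "jul-aug-sep",
                 "jul-aug-sep", "aug-sep-oct"),
  best_pct   = c(62, 58, 65, 61, 55),
  stringsAsFactors = FALSE
)

# Count how often each block wins; which.max() keeps the first maximum on ties
wins      <- table(yearly$best_block)
top_block <- names(wins)[which.max(wins)]

# Share of years in which that block won, and the median share it
# captured in those years
freq_pct   <- 100 * as.numeric(wins[top_block]) / nrow(yearly)
median_pct <- median(yearly$best_pct[yearly$best_block == top_block])
```

Here jul-aug-sep wins in 3 of 5 years (60%), and its median share in those years is 62%, which is the figure that would be compared with the threshold.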

Minimum Cycle

The >= 60% threshold from WHO guidance is applied again here, but now to determine how many cycles are needed:

If a 3-month block already captures >= 60% of the annual total, 3 cycles may be sufficient.
If a 3-month block is insufficient but a 4-month block reaches 60%, 4 cycles are recommended.
If >= 60% is only achievable with a 5-month window, 5 cycles are recommended.

Where no window reaches 60%, the block with the highest coverage across durations is retained as the best available option.
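
The rule can be expressed as a small function. This is only a sketch of the logic described above, not the Script 2 implementation; `min_cycles()` and its arguments are hypothetical, and the input is assumed to be the median share (%) captured by the best block at each duration.

```r
# Hypothetical helper: smallest duration whose median share reaches the threshold.
# `median_pct` is a named numeric vector; names are block lengths in months.
min_cycles <- function(median_pct, threshold = 60) {
  durations  <- as.integer(names(median_pct))
  ord        <- order(durations)
  durations  <- durations[ord]
  median_pct <- median_pct[ord]

  ok <- median_pct >= threshold
  if (any(ok)) {
    # Shortest block that meets the threshold
    list(cycles = durations[which(ok)[1]], meets_threshold = TRUE)
  } else {
    # Fallback: block with the highest coverage across durations
    list(cycles = durations[which.max(median_pct)], meets_threshold = FALSE)
  }
}

res_ok       <- min_cycles(c("3" = 52, "4" = 63, "5" = 71))
res_fallback <- min_cycles(c("3" = 40, "4" = 48, "5" = 55))
```

In the first call the 3-month block falls short and the 4-month block is the shortest to reach 60%, so 4 cycles are returned. In the second call no window reaches 60%, so the 5-month block is kept as the best available option and flagged as below threshold.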

Consult the SNT Team

These results inform, but do not replace, national program decisions. Before finalising SMC cycle counts and start months:

Validate findings against operational feasibility and supply chain constraints
Confirm that the 60% threshold is appropriate for the national context; for example, a country with heavier rains throughout the year may raise it to 70%
Cross-reference case-based and rainfall-based results for consistency
Document and justify any deviation from the analysis outputs

Analytical Workflow

This analysis involves two sequential scripts:

Script	Purpose
Script 1 - Block Analysis	Calculates rolling block percentages for each district-year and produces frequency summaries
Script 2 - Seasonality Mapping	Applies the minimum cycle rule, selects optimal blocks, and produces maps

Both scripts can run on either case data or rainfall data by switching a single parameter. It is recommended to run the analysis on both and compare outputs.

Script 1: Monthly Block Analysis

Overview

This script loads either malaria case or rainfall data, computes rolling 2-, 3-, 4-, and 5-month block sums for each district and year, and produces three outputs:

A summary table with the maximum block percentage per district per year
A detailed table with all block percentages per district per year
A frequency analysis table identifying which block is optimal most consistently across years

Step 1: Set Parameters

Before running the script, configure the country and variable of interest. The ‘variable’ parameter controls whether the script analyses rainfall or cases. To analyse the other variable, change this parameter and rerun the script from start to finish.

cli::cli_h1(
  "Setup parameters"
)

## ----------------------------------------------------------- ##
# Analysis Parameters --------------------------------------- ##
## ----------------------------------------------------------- ##

# Set paths
paths <- sntutils::setup_project_paths()

# Set Country iso
iso3      <- "gha"    #<----------- Change to your country ISO3 code
adm0_name <- "Ghana"  #<----------- Change to your country name

# What variable are you analysing
variable <- "rainfall" #<----------- Switch between "rainfall" and "case"

To adapt the code:

‘iso3’: Replace ‘“gha”’ with your country’s ISO3 code (e.g., ‘“gin”’ for Guinea)
‘adm0_name’: Update the country name accordingly
‘variable’: Set to ‘“rainfall”’ to analyse rainfall data, or ‘“case”’ to analyse case data

Step 2: Load Data

Load either rainfall or case data depending on your ‘variable’ setting. Then assign the chosen dataset to a common ‘df’ object before proceeding. This assignment is the only switch required before running the remainder of the script.

cli::cli_h2("Load rainfall Data")

rainfall_data <- sntutils::read_snt_data(
  here::here(paths$climate, "processed"),
  glue::glue("{iso3}_rainfall_processed"),
  "xlsx"
) |>
  dplyr::group_by(adm1, adm2, year, month) |>
  dplyr::summarise(value = sum(mean_rainfall_mm), .groups = "drop")

cli::cli_h2("Load case Data")

case_data <- sntutils::read_snt_data(
  here::here(paths$dhis2, "processed"),
  glue::glue("{iso3}_dhis2_processed"),
  "xlsx"
) |>
  dplyr::filter(dplyr::if_all(c(conf, conf_ov5, conf_u5, conf_preg), ~ !is.na(.))) |>
  dplyr::select(adm0, adm1, adm2, year, month, conf, conf_ov5, conf_u5, conf_preg, date) |>
  dplyr::group_by(adm1, adm2, year, month) |>
  dplyr::summarise(value = sum(conf_ov5), .groups = "drop") # <-------------- change the summed variable to match the analysis objective

# Switch here
df <- rainfall_data  #<----------- Switch between rainfall_data and case_data

To adapt the code:

‘summarise()’ call in the case data block: If analysing cases, change ‘sum(conf_ov5)’ to ‘sum(conf_u5)’ or another case variable appropriate to your analysis objective (e.g., ‘conf_preg’ for confirmed cases in pregnant women)
‘df’ assignment: Point ‘df’ to ‘case_data’ when running the case analysis
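
If editing the ‘df’ assignment by hand proves error-prone, one possible alternative is to derive it from ‘variable’. The sketch below is an optional variation, not part of the original script; the two small data frames are made-up stand-ins for the objects loaded above.

```r
# Toy stand-ins for the two data frames loaded above (made-up values)
rainfall_data <- data.frame(adm1 = "A", adm2 = "A1", year = 2023,
                            month = 1:2, value = c(12.5, 30.1))
case_data     <- data.frame(adm1 = "A", adm2 = "A1", year = 2023,
                            month = 1:2, value = c(40, 55))

variable <- "case"

# Pick the data frame that matches `variable`; stop on any other value
df <- switch(
  variable,
  rainfall = rainfall_data,
  case     = case_data,
  stop("`variable` must be \"rainfall\" or \"case\"")
)
```

With this pattern the parameter set in Step 1 and the data used downstream cannot drift apart, at the cost of always loading both datasets.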

Step 3: Prepare Data

Create a district identifier combining admin level 1 and admin level 2, and inspect the data to confirm the years and number of districts present.

cli::cli_h2("Prepare data")

# Create district identifier
df <- df |>
  dplyr::mutate(district = paste(adm1, adm2, sep = " - "))

# Get unique years and districts
years     <- sort(unique(df$year))
districts <- sort(unique(df$district))

cli::cli_alert_info(glue::glue("Years in data: {paste(years, collapse = ', ')}"))
cli::cli_alert_info(glue::glue("Number of districts: {length(districts)}"))
cli::cli_alert_info(glue::glue("Date range: {min(df$year)} - {max(df$year)}"))

To adapt the code:

Do not change anything in the code above. If district names are unclear in the outputs, verify that ‘adm1’ and ‘adm2’ are correctly populated in your processed data.
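
Step 4 sums monthly values with ‘na.rm = TRUE’, so a month that is absent from a district-year contributes nothing to either the block sums or the annual total. It may therefore be worth checking for incomplete years at this point. The following is a minimal base R sketch on made-up data; the object names are illustrative and the check is an optional addition, not part of the original script.

```r
# Toy data: one district with 12 months in 2022 but only 10 in 2023 (made-up)
df <- data.frame(
  district = "A - A1",
  year     = c(rep(2022, 12), rep(2023, 10)),
  month    = c(1:12, 1:10),
  value    = 1
)

# Count distinct months per district-year
n_months <- aggregate(
  month ~ district + year,
  data = df,
  FUN  = function(m) length(unique(m))
)
names(n_months)[names(n_months) == "month"] <- "n_months"

# District-years with fewer than 12 months of data
incomplete <- n_months[n_months$n_months < 12, ]
incomplete
```

Any district-year listed in ‘incomplete’ deserves a closer look before its block percentages are interpreted.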

Step 4: Calculate Rolling Block Percentages

This is the core computation. For each district and year, the script:

Retrieves the monthly values and calculates the annual total
Computes the sum for each 2-, 3-, 4- and 5-month window starting from April
Records the maximum block sum and the corresponding block label for each duration
Expresses each block as a percentage of the annual total

A helper function converts start month and block length into a readable label.

get_month_label <- function(start_month, block_length) {
  month_abbr <- c("jan", "feb", "mar", "apr", "may", "jun", "jul", "aug",
                  "sep", "oct", "nov", "dec")
  months_in_block <- start_month:(start_month + block_length - 1)
  paste(month_abbr[months_in_block], collapse = "-")
}

# ============================================================================
# 2. CALCULATE ROLLING BLOCK PERCENTAGES
# ============================================================================

cli::cli_h2(
  "Calculate rolling block percentages"
)

# Initialize results dataframe
summary_results  <- data.frame()
detailed_results <- data.frame()

# Process each year
for (yr in years) {
  year_data <- df |> dplyr::filter(year == yr)

  # Process each district
  for (dist in districts) {
    district_year_data <- year_data |> dplyr::filter(district == dist)

    if (nrow(district_year_data) == 0) next

    monthly_vals <- district_year_data |>
      dplyr::select(month, value) |>
      dplyr::arrange(month)

    total_value <- sum(monthly_vals$value, na.rm = TRUE)

    if (total_value == 0) next

    month_lookup <- stats::setNames(monthly_vals$value, monthly_vals$month)

    # Calculate rolling sums for 2-month blocks (Apr-May through Oct-Nov) #####
    blocks_2m        <- list()
    blocks_2m_values <- c()

    for (start_month in 4:10) {
      block_name       <- get_month_label(start_month, 2)
      block_months     <- start_month:(start_month + 1)
      block_sum        <- sum(month_lookup[as.character(block_months)], na.rm = TRUE)
      blocks_2m[[block_name]]  <- block_sum
      blocks_2m_values <- c(blocks_2m_values, block_sum)
    }

    max_2m       <- max(blocks_2m_values)
    max_2m_block <- names(which(unlist(blocks_2m) == max_2m))[1]

    # Calculate rolling sums for 3-month blocks (Apr-Jun through Sep-Nov) #####
    blocks_3m        <- list()
    blocks_3m_values <- c()

    for (start_month in 4:9) {
      block_name       <- get_month_label(start_month, 3)
      block_months     <- start_month:(start_month + 2)
      block_sum        <- sum(month_lookup[as.character(block_months)], na.rm = TRUE)
      blocks_3m[[block_name]]  <- block_sum
      blocks_3m_values <- c(blocks_3m_values, block_sum)
    }

    max_3m       <- max(blocks_3m_values)
    max_3m_block <- names(which(unlist(blocks_3m) == max_3m))[1]

    # Calculate rolling sums for 4-month blocks (Apr-Jul through Sep-Dec) ####
    blocks_4m        <- list()
    blocks_4m_values <- c()

    for (start_month in 4:9) {
      block_name       <- get_month_label(start_month, 4)
      block_months     <- start_month:(start_month + 3)
      block_sum        <- sum(month_lookup[as.character(block_months)], na.rm = TRUE)
      blocks_4m[[block_name]]  <- block_sum
      blocks_4m_values <- c(blocks_4m_values, block_sum)
    }

    max_4m       <- max(blocks_4m_values)
    max_4m_block <- names(which(unlist(blocks_4m) == max_4m))[1]

    # Calculate rolling sums for 5-month blocks (Apr-Aug through Aug-Dec) ####
    blocks_5m        <- list()
    blocks_5m_values <- c()

    for (start_month in 4:8) {
      block_name       <- get_month_label(start_month, 5)
      block_months     <- start_month:(start_month + 4)
      block_sum        <- sum(month_lookup[as.character(block_months)], na.rm = TRUE)
      blocks_5m[[block_name]]  <- block_sum
      blocks_5m_values <- c(blocks_5m_values, block_sum)
    }

    max_5m       <- max(blocks_5m_values)
    max_5m_block <- names(which(unlist(blocks_5m) == max_5m))[1]

    # ========================================================================
    # CREATE SUMMARY OUTPUT ROW
    # ========================================================================
    summary_row <- data.frame(
      year        = yr,
      district    = dist,
      total_value = total_value,
      max_2m      = max_2m,
      max_3m      = max_3m,
      max_4m      = max_4m,
      max_5m      = max_5m,
      pct_2m      = (max_2m / total_value) * 100,
      pct_3m      = (max_3m / total_value) * 100,
      pct_4m      = (max_4m / total_value) * 100,
      pct_5m      = (max_5m / total_value) * 100
    )

    summary_results <- rbind(summary_results, summary_row)

    # ========================================================================
    # CREATE DETAILED OUTPUT ROW
    # ========================================================================
    detailed_row <- data.frame(
      district         = dist,
      years            = yr,
      stringsAsFactors = FALSE
    )

    for (block_name in names(blocks_2m)) {
      detailed_row[[block_name]] <- (blocks_2m[[block_name]] / total_value) * 100
    }
    for (block_name in names(blocks_3m)) {
      detailed_row[[block_name]] <- (blocks_3m[[block_name]] / total_value) * 100
    }
    for (block_name in names(blocks_4m)) {
      detailed_row[[block_name]] <- (blocks_4m[[block_name]] / total_value) * 100
    }
    for (block_name in names(blocks_5m)) {
      detailed_row[[block_name]] <- (blocks_5m[[block_name]] / total_value) * 100
    }

    detailed_row$max_2m       <- (max_2m / total_value) * 100
    detailed_row$max_3m       <- (max_3m / total_value) * 100
    detailed_row$max_4m       <- (max_4m / total_value) * 100
    detailed_row$max_5m       <- (max_5m / total_value) * 100
    detailed_row$max_2m_block <- max_2m_block
    detailed_row$max_3m_block <- max_3m_block
    detailed_row$max_4m_block <- max_4m_block
    detailed_row$max_5m_block <- max_5m_block

    detailed_results <- rbind(detailed_results, detailed_row)
  }
}

cli::cli_alert_success(
  glue::glue(
    "Analysis complete! Summary: {nrow(summary_results)} records | Detailed: {nrow(detailed_results)} records"
  )
)

To adapt the code:

Do not change anything in the code above. The block windows (start months from April onwards, with every block ending by December) are chosen to reflect the SMC-relevant transmission season. Consult the SNT team before modifying these bounds, for instance to reflect an atypical rainfall season.

Step 5: Reorder columns for detailed output

# ============================================================================
# 3. REORDER COLUMNS FOR DETAILED OUTPUT
# ============================================================================

col_order <- c(
  "years", "district",
  # 2-month blocks
  "apr-may", "may-jun", "jun-jul", "jul-aug", "aug-sep", "sep-oct", "oct-nov",
  # 3-month blocks
  "apr-may-jun", "may-jun-jul", "jun-jul-aug", "jul-aug-sep", "aug-sep-oct", "sep-oct-nov",
  # 4-month blocks
  "apr-may-jun-jul", "may-jun-jul-aug", "jun-jul-aug-sep", "jul-aug-sep-oct", "aug-sep-oct-nov", "sep-oct-nov-dec",
  # 5-month blocks
  "apr-may-jun-jul-aug", "may-jun-jul-aug-sep", "jun-jul-aug-sep-oct", "jul-aug-sep-oct-nov", "aug-sep-oct-nov-dec",
  # Max values
  "max_2m", "max_3m", "max_4m", "max_5m",
  # Max block identifiers
  "max_2m_block", "max_3m_block", "max_4m_block", "max_5m_block"
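)

# Apply the column order; any_of() skips labels that are absent from detailed_results
detailed_results <- detailed_results |>
  dplyr::select(dplyr::any_of(col_order))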

Step 6: Save Block Analysis Output

Save both the summary and detailed results to Excel for review and quality assurance before proceeding to the frequency analysis.

# ============================================================================
# 4. SAVE OUTPUTS
# ============================================================================

cli::cli_h2(
  "Save outputs"
)

sntutils::write_snt_data(
  summary_results,
  here::here(paths$val_tbl, variable),
  glue::glue("{iso3}_malaria_{variable}_block_analysis"),
  "xlsx"
)

cli::cli_alert_success(
  glue::glue("Saved: {iso3}_malaria_{variable}_block_analysis.xlsx")
)

sntutils::write_snt_data(
  detailed_results,
  here::here(paths$val_tbl, variable),
  glue::glue("{iso3}_malaria_detailed_yearly_{variable}_block_analysis"),
  "xlsx"
)

cli::cli_alert_success(
  glue::glue("Saved: {iso3}_malaria_detailed_yearly_{variable}_block_analysis.xlsx")
)

Expected Outputs:

‘{iso3}_malaria_{variable}_block_analysis.xlsx’ - one row per district per year, showing the maximum block percentage for the 2-, 3-, 4- and 5-month windows
‘{iso3}_malaria_detailed_yearly_{variable}_block_analysis.xlsx’ - one row per district per year, with all individual block percentages and the best block label for each duration
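
As part of quality assurance, two simple consistency checks can be run on the summary table. Every shorter window tested in Step 4 sits inside at least one longer window, so with non-negative monthly values the best percentage should never decrease as the duration grows, and no percentage should fall outside 0 to 100. The sketch below uses a made-up table with the same ‘pct_*’ columns as ‘summary_results’; the object names are illustrative.

```r
# Toy summary table with the same pct_* columns as summary_results (made-up values)
summary_toy <- data.frame(
  year     = c(2022, 2023),
  district = "A - A1",
  pct_2m   = c(45, 50),
  pct_3m   = c(61, 48),   # second row is deliberately inconsistent
  pct_4m   = c(72, 70),
  pct_5m   = c(80, 78)
)

pct <- as.matrix(summary_toy[, c("pct_2m", "pct_3m", "pct_4m", "pct_5m")])

# Each percentage should lie between 0 and 100
in_range <- apply(pct, 1, function(p) all(p >= 0 & p <= 100))

# A longer best block should not capture less than a shorter one
non_decreasing <- apply(pct, 1, function(p) all(diff(p) >= 0))

# Rows that fail either check
flagged <- summary_toy[!(in_range & non_decreasing), ]
flagged
```

Flagged rows usually point to negative or otherwise suspect values in the input data rather than to a problem in the block calculation itself.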

Step 7: Frequency Analysis

This step aggregates the results across years to determine, for each district, which block is most frequently the optimal window at each duration. It also records the median proportion captured by that block in the years it was dominant. This is the figure used later to determine whether the 60% threshold is met.

# ============================================================================
# 5. GENERATE FREQUENCY ANALYSIS
# ============================================================================

cli::cli_h2(
  "Generate frequency analysis"
)

frequency_results <- data.frame()
districts_list    <- unique(detailed_results$district)

for (dist in districts_list) {
  district_detailed <- detailed_results |> dplyr::filter(district == dist)
  district_summary  <- summary_results  |> dplyr::filter(district == dist)

  all_years   <- sort(unique(district_detailed$years))
  total_years <- length(all_years)

  # 2m
  block_2m_freq <- table(district_detailed$max_2m_block)
  for (block in names(block_2m_freq)) {
    block_years <- district_detailed |>
      dplyr::filter(max_2m_block == block) |>
      dplyr::pull(years)

    freq_pct    <- (length(block_years) / total_years) * 100
    median_prop <- district_summary |>
      dplyr::filter(year %in% block_years) |>
      dplyr::pull(pct_2m) |>
      stats::median()

    frequency_results <- rbind(frequency_results, data.frame(
      district         = dist,
      duration         = 2,
      block            = block,
      block_freq       = round(freq_pct, 2),
      freq_numeric     = freq_pct,
      years            = paste(block_years, collapse = ", "),
      median_max_prop  = round(median_prop, 2),
      stringsAsFactors = FALSE
    ))
  }

  # 3m
  block_3m_freq <- table(district_detailed$max_3m_block)
  for (block in names(block_3m_freq)) {
    block_years <- district_detailed |>
      dplyr::filter(max_3m_block == block) |>
      dplyr::pull(years)

    freq_pct    <- (length(block_years) / total_years) * 100
    median_prop <- district_summary |>
      dplyr::filter(year %in% block_years) |>
      dplyr::pull(pct_3m) |>
      stats::median()

    frequency_results <- rbind(frequency_results, data.frame(
      district         = dist,
      duration         = 3,
      block            = block,
      block_freq       = round(freq_pct, 2),
      freq_numeric     = freq_pct,
      years            = paste(block_years, collapse = ", "),
      median_max_prop  = round(median_prop, 2),
      stringsAsFactors = FALSE
    ))
  }

  # 4m
  block_4m_freq <- table(district_detailed$max_4m_block)
  for (block in names(block_4m_freq)) {
    block_years <- district_detailed |>
      dplyr::filter(max_4m_block == block) |>
      dplyr::pull(years)

    freq_pct    <- (length(block_years) / total_years) * 100
    median_prop <- district_summary |>
      dplyr::filter(year %in% block_years) |>
      dplyr::pull(pct_4m) |>
      stats::median()

    frequency_results <- rbind(frequency_results, data.frame(
      district         = dist,
      duration         = 4,
      block            = block,
      block_freq       = round(freq_pct, 2),
      freq_numeric     = freq_pct,
      years            = paste(block_years, collapse = ", "),
      median_max_prop  = round(median_prop, 2),
      stringsAsFactors = FALSE
    ))
  }

  # 5m
  block_5m_freq <- table(district_detailed$max_5m_block)
  for (block in names(block_5m_freq)) {
    block_years <- district_detailed |>
      dplyr::filter(max_5m_block == block) |>
      dplyr::pull(years)

    freq_pct    <- (length(block_years) / total_years) * 100
    median_prop <- district_summary |>
      dplyr::filter(year %in% block_years) |>
      dplyr::pull(pct_5m) |>
      stats::median()

    frequency_results <- rbind(frequency_results, data.frame(
      district         = dist,
      duration         = 5,
      block            = block,
      block_freq       = round(freq_pct, 2),
      freq_numeric     = freq_pct,
      years            = paste(block_years, collapse = ", "),
      median_max_prop  = round(median_prop, 2),
      stringsAsFactors = FALSE
    ))
  }
}

frequency_results <- frequency_results |>
  dplyr::arrange(district, duration, dplyr::desc(freq_numeric)) |>
  dplyr::select(-freq_numeric)

cli::cli_alert_success(
  glue::glue("Frequency analysis complete! Total records: {nrow(frequency_results)}")
)

sntutils::write_snt_data(
  obj          = frequency_results,
  path         = here::here(paths$val_tbl, variable),
  data_name    = glue::glue("{iso3}_malaria_{variable}_block_frequency_analysis"),
  file_formats = "xlsx"
)

cli::cli_alert_success(
  glue::glue("Saved: {iso3}_malaria_{variable}_block_frequency_analysis.xlsx")
)

Expected Outputs:

‘{iso3}_malaria_{variable}_block_frequency_analysis.xlsx’ - one row per district-duration-block combination, with the frequency (% of years in which that block was dominant) and the median proportion of the annual total captured

Key columns to review

Column	Description
‘district’	Admin 1 - Admin 2 label
‘duration’	Block length in months
‘block’	Block label
‘block_freq’	% of years in which this block was the dominant window
‘years’	Specific years in which this block was dominant
‘median_max_prop’	Median % of annual total captured in dominant years
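
When reviewing this table, it can help to pull out the single most frequent block for each district and duration. The following is a hedged base R sketch on a made-up table that mimics the columns above; it is a reading aid only and does not reproduce how Script 2 selects blocks.

```r
# Toy frequency table with the columns described above (made-up values)
freq_toy <- data.frame(
  district        = "A - A1",
  duration        = c(3, 3, 4, 4),
  block           = c("jun-jul-aug", "jul-aug-sep",
                      "jul-aug-sep-oct", "jun-jul-aug-sep"),
  block_freq      = c(40, 60, 80, 20),
  median_max_prop = c(58, 62, 74, 70),
  stringsAsFactors = FALSE
)

# Order so the most frequent block comes first within each district and duration
freq_toy <- freq_toy[order(freq_toy$district, freq_toy$duration,
                           -freq_toy$block_freq), ]

# Keep the first row of each district-duration group
top_blocks <- freq_toy[!duplicated(freq_toy[, c("district", "duration")]), ]
top_blocks
```

Ties in ‘block_freq’ are resolved by row order in this sketch, so tied districts are worth reviewing by hand.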
