Durations of Seasonality

Compare 2-, 3-, 4-, and 5-month block durations to identify the best SMC start month.

Overview

Once an area has been identified as seasonal using the WHO 4-month rule, the next analytical step is to determine when SMC cycles should begin and how many cycles are needed to adequately cover the transmission season.

This analysis addresses two practical planning questions:

  1. What is the optimal start month for SMC? That is, which month consistently marks the beginning of the peak transmission window?
  2. What is the minimum number of monthly cycles needed to cover at least 60% of annual cases or rainfall? That is, do districts need 2, 3, 4, or 5 cycles?

The analysis uses a rolling block approach, examining every possible consecutive window of 2, 3, 4 and 5 months between April and December to identify which window captures the greatest concentration of malaria cases or rainfall. Results are summarised across years to determine the most consistently optimal block for each district.

Note: Objectives
  • Identify the month-block (2-, 3-, 4-, 5-month window) that captures the highest proportion of annual malaria cases or rainfall for each district
  • Determine the optimal start month for SMC based on consistent historical patterns
  • Identify the minimum number of cycles required to cover >= 60% of cases or rainfall
  • Produce district-level summary tables and maps for program planning

Conceptual Background

Block Window

Rather than assuming a fixed transmission season, this method tests all plausible SMC-relevant windows, with start months from April onward and no window extending past December, at four durations: 2, 3, 4, and 5 months. For each district and year, the method identifies which window of each duration captures the greatest share of annual cases or rainfall.

For example, for a 3-month block:

Block          Months Covered
apr-may-jun    April, May, June
may-jun-jul    May, June, July
jun-jul-aug    June, July, August
jul-aug-sep    July, August, September
aug-sep-oct    August, September, October
sep-oct-nov    September, October, November

The same logic applies for the rest of the month blocks.
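
As an illustration of how the block labels extend to other durations, the sketch below enumerates them in base R. The helper name `list_blocks` is hypothetical and not part of the code library; the start-month ranges passed to it are taken from the 3-month table above and from the 5-month loop in Script 1.

```r
# Lower-case month abbreviations, matching the block label convention above
month_abbr <- tolower(month.abb)

# Hypothetical helper: label every block of a given length
# for a set of start months (4 = April, 9 = September)
list_blocks <- function(block_length, start_months) {
  vapply(
    start_months,
    function(s) paste(month_abbr[s:(s + block_length - 1)], collapse = "-"),
    character(1)
  )
}

# 3-month blocks starting April through September (the table above)
blocks_3m <- list_blocks(3, 4:9)
print(blocks_3m)

# 5-month blocks starting April through August, so none runs past December
blocks_5m <- list_blocks(5, 4:8)
print(blocks_5m)
```

Passing the start months explicitly keeps the sketch flexible if the SNT team agrees different bounds for a given country.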

Identifying the Optimal Block

For each district-year combination, the block with the highest sum of cases or rainfall within each duration class is identified, and the percentage of the annual total captured by that block is recorded. Across years, the block that most frequently appears as the optimal window is selected, and the median percentage it captures in the years it was optimal is used to assess adequacy.
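
The selection logic can be illustrated on a toy example for a single district. This is a minimal base R sketch with invented values; the column names mirror those used in Script 1, but the data frame `toy` itself is hypothetical.

```r
# Toy results for one district: best 3-month block per year and the share
# of the annual total it captured (values invented for illustration only)
toy <- data.frame(
  year         = 2019:2023,
  max_3m_block = c("jul-aug-sep", "jul-aug-sep", "aug-sep-oct",
                   "jul-aug-sep", "aug-sep-oct"),
  pct_3m       = c(62, 58, 55, 66, 57)
)

# Count how often each block was the best 3-month window
block_counts <- table(toy$max_3m_block)

# Most frequent block (on a tie, which.max keeps the first name alphabetically)
best_block <- names(block_counts)[which.max(block_counts)]

# Median share captured in the years in which that block was optimal
median_pct <- median(toy$pct_3m[toy$max_3m_block == best_block])

print(best_block)
print(median_pct)
```

Here the July to September block is optimal in three of five years, and its median share in those years is what would be compared against the 60% threshold.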

Minimum Cycle

The >= 60% threshold from WHO guidance is applied again here, but now to determine how many cycles are needed:

  • If a 3-month block already captures >= 60% of the annual total, 3 cycles may be sufficient.
  • If 3 months are insufficient but a 4-month block reaches 60%, 4 cycles are recommended.
  • If >= 60% is only achievable with a 5-month window, 5 cycles are recommended.

Where no window reaches 60%, the block with the highest coverage across durations is retained as the best available option.
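
The rule above can be sketched as a small function. This is an illustration only, not the Script 2 implementation: the function name `min_cycles`, its arguments, and the example values are assumptions. It takes the median share for each duration as a named vector and returns the shortest duration that meets the threshold, falling back to the duration with the highest coverage when none does.

```r
# Hypothetical helper: shortest duration whose median share meets the threshold.
# `median_pct` is a named numeric vector; names are block lengths in months.
min_cycles <- function(median_pct, threshold = 60) {
  durations <- as.integer(names(median_pct))
  ord       <- order(durations)
  durations <- durations[ord]
  shares    <- unname(median_pct[ord])

  meets <- shares >= threshold
  if (any(meets)) {
    # Shortest duration that reaches the threshold
    return(durations[which(meets)[1]])
  }
  # No duration reaches the threshold: keep the one with the highest coverage
  durations[which.max(shares)]
}

# Invented example values for three districts
cycles_a <- min_cycles(c("3" = 57, "4" = 64, "5" = 71))
cycles_b <- min_cycles(c("3" = 61, "4" = 70, "5" = 78))
cycles_c <- min_cycles(c("3" = 40, "4" = 48, "5" = 55))
print(c(cycles_a, cycles_b, cycles_c))
```

The `threshold` argument makes it straightforward to test an alternative cut-off, such as 70%, if the national context calls for it.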

Important: Consult the SNT Team

These results inform, but do not replace, national program decisions. Before finalising SMC cycle counts and start months:

  • Validate findings against operational feasibility and supply chain constraints
  • Confirm that the 60% threshold is appropriate for the national context; for example, a country with heavy rainfall throughout the year may raise it to 70%
  • Cross-reference case-based and rainfall-based results for consistency
  • Document and justify any deviation from the analysis outputs

Analytical Workflow

This analysis involves two sequential scripts:

Script                            Purpose
Script 1 - Block Analysis         Calculates rolling block percentages for each district-year and produces frequency summaries
Script 2 - Seasonality Mapping    Applies the minimum cycle rule, selects optimal blocks, and produces maps

Both scripts can run on either case data or rainfall data by switching a single parameter. It is recommended to run the analysis on both and compare outputs.


Script 1: Monthly Block Analysis

Overview

This script loads either malaria case or rainfall data, computes rolling 2-, 3-, 4-, and 5-month block sums for each district and year, and produces three outputs:

  1. A summary table with the maximum block percentage per district per year
  2. A detailed table with all block percentages per district per year
  3. A frequency analysis table identifying which block is optimal most consistently across years

Step 1: Set Parameters

Before running the script, configure the country and the variable of interest. The ‘variable’ parameter controls whether the script analyses rainfall or cases. To analyse the other variable, switch this parameter and rerun the script to completion.

  • R
cli::cli_h1(
  "Setup parameters"
)

## ----------------------------------------------------------- ##
# Analysis Parameters --------------------------------------- ##
## ----------------------------------------------------------- ##

# Set paths
paths <- sntutils::setup_project_paths()

# Set Country iso
iso3      <- "gha"    #<----------- Change to your country ISO3 code
adm0_name <- "Ghana"  #<----------- Change to your country name

# What variable are you analysing
variable <- "rainfall" #<----------- Switch between "rainfall" and "case"

To adapt the code:

  • Line 13: Replace ‘"gha"’ with your country’s ISO3 code (e.g., ‘"gin"’ for Guinea)
  • Line 14: Update the country name accordingly
  • Line 17: Set to ‘"rainfall"’ to analyse rainfall data, or ‘"case"’ to analyse case data

Step 2: Load Data

Load either rainfall or case data depending on your ‘variable’ setting, then assign the chosen dataset to a common ‘df’ object before proceeding. Apart from ‘variable’, this is the only switch required before running the remainder of the script.

  • R
cli::cli_h2("Load rainfall Data")

rainfall_data <- sntutils::read_snt_data(
  here::here(paths$climate, "processed"),
  glue::glue("{iso3}_rainfall_processed"),
  "xlsx"
) |>
  dplyr::group_by(adm1, adm2, year, month) |>
  dplyr::summarise(value = sum(mean_rainfall_mm), .groups = "drop")

cli::cli_h2("Load case Data")

case_data <- sntutils::read_snt_data(
  here::here(paths$dhis2, "processed"),
  glue::glue("{iso3}_dhis2_processed"),
  "xlsx"
) |>
  dplyr::filter(dplyr::if_all(c(conf, conf_ov5, conf_u5, conf_preg), ~ !is.na(.))) |>
  dplyr::select(adm0, adm1, adm2, year, month, conf, conf_ov5, conf_u5, conf_preg, date) |>
  dplyr::group_by(adm1, adm2, year, month) |>
  dplyr::summarise(value = sum(conf_ov5), .groups = "drop") #<----------- Switch the case variable to suit your analysis

# Switch here
df <- rainfall_data  #<----------- Switch between rainfall_data and case_data

To adapt the code:

  • Line 21: If analyzing cases, change ‘sum(conf_ov5)’ to ‘sum(conf_u5)’ or another case variable as appropriate for your analysis objective (e.g., confirmed cases in pregnant women)
  • Line 24: Switch df to point to ‘case_data’ when running the case analysis
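
One way to reduce the risk of setting ‘variable’ and ‘df’ inconsistently is to derive ‘df’ from ‘variable’. The sketch below is an optional alternative, not the script's own approach; the two small data frames are toy stand-ins for the objects loaded above.

```r
# Toy stand-ins for the datasets loaded in Step 2 (invented values)
rainfall_data <- data.frame(adm1 = "A", adm2 = "A1", year = 2023,
                            month = 6, value = 180)
case_data     <- data.frame(adm1 = "A", adm2 = "A1", year = 2023,
                            month = 6, value = 950)

variable <- "case"

# Stop early if the parameter is misspelled (e.g. "cases")
stopifnot(variable %in% c("rainfall", "case"))

# Pick the dataset that matches the parameter
df <- switch(variable,
  rainfall = rainfall_data,
  case     = case_data
)

print(df)
```

With this pattern, changing ‘variable’ in Step 1 is enough to redirect the rest of the script and the output file names together.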

Step 3: Prepare Data

Create a district identifier combining admin level 1 and admin level 2, and inspect the data to confirm the years and number of districts present.

  • R
cli::cli_h2("Prepare data")

# Create district identifier
df <- df |>
  dplyr::mutate(district = paste(adm1, adm2, sep = " - "))

# Get unique years and districts
years     <- sort(unique(df$year))
districts <- sort(unique(df$district))

cli::cli_alert_info(glue::glue("Years in data: {paste(years, collapse = ', ')}"))
cli::cli_alert_info(glue::glue("Number of districts: {length(districts)}"))
cli::cli_alert_info(glue::glue("Date range: {min(df$year)} - {max(df$year)}"))

To adapt the code:

  • Do not change anything in the code above. If your district names are unclear in outputs, verify that ‘adm1’ and ‘adm2’ are correctly populated in your processed data.
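
As an optional extra inspection, it can help to confirm that each district-year has all 12 months, because Script 1 computes the annual total from whichever months are present. The sketch below is a suggestion rather than part of the script, and it uses a toy data frame (`df_toy`, hypothetical) in place of ‘df’.

```r
# Toy data: one district with 12 months in 2022 but only 10 months in 2023
df_toy <- data.frame(
  district = "A - A1",
  year     = c(rep(2022, 12), rep(2023, 10)),
  month    = c(1:12, 1:10),
  value    = 1
)

# Count the distinct months reported per district-year
months_per_year <- aggregate(
  month ~ district + year,
  data = df_toy,
  FUN  = function(m) length(unique(m))
)
names(months_per_year)[3] <- "n_months"

# District-years with fewer than 12 months deserve a closer look
incomplete <- months_per_year[months_per_year$n_months < 12, ]
print(incomplete)
```

How to treat incomplete district-years (exclude, impute, or keep) is a judgment call to agree with the SNT team.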

Step 4: Calculate Rolling Block Percentages

This is the core computation. For each district and year, the script:

  1. Retrieves the monthly values and calculates the annual total
  2. Computes the sum for each 2-, 3-, 4- and 5-month window starting from April
  3. Records the maximum block sum and the corresponding block label for each duration
  4. Expresses each block as a percentage of the annual total

A helper function converts start month and block length into a readable label.

  • R
get_month_label <- function(start_month, block_length) {
  month_abbr <- c("jan", "feb", "mar", "apr", "may", "jun", "jul", "aug",
                  "sep", "oct", "nov", "dec")
  months_in_block <- start_month:(start_month + block_length - 1)
  paste(month_abbr[months_in_block], collapse = "-")
}

# ============================================================================
# 2. CALCULATE ROLLING BLOCK PERCENTAGES
# ============================================================================

cli::cli_h2(
  "Calculate rolling block percentages"
)

# Initialize results dataframe
summary_results  <- data.frame()
detailed_results <- data.frame()

# Process each year
for (yr in years) {
  year_data <- df |> dplyr::filter(year == yr)

  # Process each district
  for (dist in districts) {
    district_year_data <- year_data |> dplyr::filter(district == dist)

    if (nrow(district_year_data) == 0) next

    monthly_vals <- district_year_data |>
      dplyr::select(month, value) |>
      dplyr::arrange(month)

    total_value <- sum(monthly_vals$value, na.rm = TRUE)

    if (total_value == 0) next

    month_lookup <- stats::setNames(monthly_vals$value, monthly_vals$month)

    # Calculate rolling sums for 2-month blocks (Apr-May through Oct-Nov) #####
    blocks_2m        <- list()
    blocks_2m_values <- c()

    for (start_month in 4:10) {
      block_name       <- get_month_label(start_month, 2)
      block_months     <- start_month:(start_month + 1)
      block_sum        <- sum(month_lookup[as.character(block_months)], na.rm = TRUE)
      blocks_2m[[block_name]]  <- block_sum
      blocks_2m_values <- c(blocks_2m_values, block_sum)
    }

    max_2m       <- max(blocks_2m_values)
    max_2m_block <- names(which(unlist(blocks_2m) == max_2m))[1]

    # Calculate rolling sums for 3-month blocks (Apr-Jun through Sep-Nov) #####
    blocks_3m        <- list()
    blocks_3m_values <- c()

    for (start_month in 4:9) {
      block_name       <- get_month_label(start_month, 3)
      block_months     <- start_month:(start_month + 2)
      block_sum        <- sum(month_lookup[as.character(block_months)], na.rm = TRUE)
      blocks_3m[[block_name]]  <- block_sum
      blocks_3m_values <- c(blocks_3m_values, block_sum)
    }

    max_3m       <- max(blocks_3m_values)
    max_3m_block <- names(which(unlist(blocks_3m) == max_3m))[1]

    # Calculate rolling sums for 4-month blocks (Apr-Jul through Sep-Dec) ####
    blocks_4m        <- list()
    blocks_4m_values <- c()

    for (start_month in 4:9) {
      block_name       <- get_month_label(start_month, 4)
      block_months     <- start_month:(start_month + 3)
      block_sum        <- sum(month_lookup[as.character(block_months)], na.rm = TRUE)
      blocks_4m[[block_name]]  <- block_sum
      blocks_4m_values <- c(blocks_4m_values, block_sum)
    }

    max_4m       <- max(blocks_4m_values)
    max_4m_block <- names(which(unlist(blocks_4m) == max_4m))[1]

    # Calculate rolling sums for 5-month blocks (Apr-Aug through Aug-Dec) ####
    blocks_5m        <- list()
    blocks_5m_values <- c()

    for (start_month in 4:8) {
      block_name       <- get_month_label(start_month, 5)
      block_months     <- start_month:(start_month + 4)
      block_sum        <- sum(month_lookup[as.character(block_months)], na.rm = TRUE)
      blocks_5m[[block_name]]  <- block_sum
      blocks_5m_values <- c(blocks_5m_values, block_sum)
    }

    max_5m       <- max(blocks_5m_values)
    max_5m_block <- names(which(unlist(blocks_5m) == max_5m))[1]

    # ========================================================================
    # CREATE SUMMARY OUTPUT ROW
    # ========================================================================
    summary_row <- data.frame(
      year        = yr,
      district    = dist,
      total_value = total_value,
      max_2m      = max_2m,
      max_3m      = max_3m,
      max_4m      = max_4m,
      max_5m      = max_5m,
      pct_2m      = (max_2m / total_value) * 100,
      pct_3m      = (max_3m / total_value) * 100,
      pct_4m      = (max_4m / total_value) * 100,
      pct_5m      = (max_5m / total_value) * 100
    )

    summary_results <- rbind(summary_results, summary_row)

    # ========================================================================
    # CREATE DETAILED OUTPUT ROW
    # ========================================================================
    detailed_row <- data.frame(
      district         = dist,
      years            = yr,
      stringsAsFactors = FALSE
    )

    for (block_name in names(blocks_2m)) {
      detailed_row[[block_name]] <- (blocks_2m[[block_name]] / total_value) * 100
    }
    for (block_name in names(blocks_3m)) {
      detailed_row[[block_name]] <- (blocks_3m[[block_name]] / total_value) * 100
    }
    for (block_name in names(blocks_4m)) {
      detailed_row[[block_name]] <- (blocks_4m[[block_name]] / total_value) * 100
    }
    for (block_name in names(blocks_5m)) {
      detailed_row[[block_name]] <- (blocks_5m[[block_name]] / total_value) * 100
    }

    detailed_row$max_2m       <- (max_2m / total_value) * 100
    detailed_row$max_3m       <- (max_3m / total_value) * 100
    detailed_row$max_4m       <- (max_4m / total_value) * 100
    detailed_row$max_5m       <- (max_5m / total_value) * 100
    detailed_row$max_2m_block <- max_2m_block
    detailed_row$max_3m_block <- max_3m_block
    detailed_row$max_4m_block <- max_4m_block
    detailed_row$max_5m_block <- max_5m_block

    detailed_results <- rbind(detailed_results, detailed_row)
  }
}

cli::cli_alert_success(
  glue::glue(
    "Analysis complete! Summary: {nrow(summary_results)} records | Detailed: {nrow(detailed_results)} records"
  )
)

To adapt the code:

  • Do not change anything in the code above. The start months (April onward, with no block extending past December) are chosen to reflect the SMC-relevant transmission season. Consult the SNT team before modifying these bounds, for instance to accommodate an atypical rainfall season.
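
To make the computation concrete, the following worked example applies the same steps to a single district-year for the 3-month duration. It is a compact base R sketch with invented monthly values, not a replacement for the loop above.

```r
# Toy monthly values for one district-year, January to December (invented)
monthly <- c(5, 5, 10, 20, 40, 80, 150, 200, 120, 50, 15, 5)
names(monthly) <- 1:12

# Annual total
total <- sum(monthly)

# Sum every 3-month window starting April (4) through September (9)
block_length <- 3
start_months <- 4:9
block_sums <- vapply(
  start_months,
  function(s) sum(monthly[as.character(s:(s + block_length - 1))]),
  numeric(1)
)

# Label each window, e.g. "jul-aug-sep"
names(block_sums) <- vapply(
  start_months,
  function(s) paste(tolower(month.abb)[s:(s + block_length - 1)], collapse = "-"),
  character(1)
)

# Express each block as a percentage of the annual total
block_pct <- round(block_sums / total * 100, 1)

# Block with the highest share
best <- names(block_pct)[which.max(block_pct)]

print(block_pct)
print(best)
```

In this toy year the annual total is 700 and the July to September block sums to 470, so it captures about 67.1% of the total and would meet a 60% threshold with three cycles.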

Step 5: Reorder columns for detailed output

  • R
# ============================================================================
# 3. REORDER COLUMNS FOR DETAILED OUTPUT
# ============================================================================

col_order <- c(
  "years", "district",
  # 2-month blocks
  "apr-may", "may-jun", "jun-jul", "jul-aug", "aug-sep", "sep-oct", "oct-nov",
  # 3-month blocks
  "apr-may-jun", "may-jun-jul", "jun-jul-aug", "jul-aug-sep", "aug-sep-oct", "sep-oct-nov",
  # 4-month blocks
  "apr-may-jun-jul", "may-jun-jul-aug", "jun-jul-aug-sep", "jul-aug-sep-oct", "aug-sep-oct-nov", "sep-oct-nov-dec",
  # 5-month blocks
  "apr-may-jun-jul-aug", "may-jun-jul-aug-sep", "jun-jul-aug-sep-oct", "jul-aug-sep-oct-nov", "aug-sep-oct-nov-dec",
  # Max values
  "max_2m", "max_3m", "max_4m", "max_5m",
  # Max block identifiers
  "max_2m_block", "max_3m_block", "max_4m_block", "max_5m_block"
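)

# Apply the column order to the detailed results
detailed_results <- detailed_results |>
  dplyr::select(dplyr::all_of(col_order))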

Step 6: Save Block Analysis Output

Save both the summary and detailed results to Excel for review and quality assurance before proceeding to the frequency analysis.

  • R
# ============================================================================
# 4. SAVE OUTPUTS
# ============================================================================

cli::cli_h2(
  "Save outputs"
)

sntutils::write_snt_data(
  summary_results,
  here::here(paths$val_tbl, variable),
  glue::glue("{iso3}_malaria_{variable}_block_analysis"),
  "xlsx"
)

cli::cli_alert_success(
  glue::glue("Saved: {iso3}_malaria_{variable}_block_analysis.xlsx")
)

sntutils::write_snt_data(
  detailed_results,
  here::here(paths$val_tbl, variable),
  glue::glue("{iso3}_malaria_detailed_yearly_{variable}_block_analysis"),
  "xlsx"
)

cli::cli_alert_success(
  glue::glue("Saved: {iso3}_malaria_detailed_yearly_{variable}_block_analysis.xlsx")
)

Expected Outputs:

  • ‘{iso3}_malaria_{variable}_block_analysis.xlsx’: one row per district per year, showing the maximum block percentage for the 2-, 3-, 4-, and 5-month windows
  • ‘{iso3}_malaria_detailed_yearly_{variable}_block_analysis.xlsx’: one row per district per year, with all individual block percentages and the best block label for each duration
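
One quality check that can be run on the summary table is sketched below. Because every shorter window tested in Script 1 sits inside at least one longer window that is also tested, the maximum share should not decrease as the duration grows, provided the monthly values are non-negative. The data frame `summary_toy` is a hypothetical stand-in for ‘summary_results’, with one row made deliberately inconsistent.

```r
# Toy stand-in for summary_results (invented values; row 2 is inconsistent)
summary_toy <- data.frame(
  year     = c(2022, 2023),
  district = "A - A1",
  pct_2m   = c(45, 50),
  pct_3m   = c(62, 48),
  pct_4m   = c(74, 77),
  pct_5m   = c(83, 85)
)

pct_cols <- c("pct_2m", "pct_3m", "pct_4m", "pct_5m")
pct_mat  <- as.matrix(summary_toy[, pct_cols])

# TRUE when the share never decreases as the window lengthens
is_monotone <- apply(pct_mat, 1, function(x) all(diff(x) >= 0))

# TRUE when every share lies between 0 and 100
in_range <- apply(pct_mat, 1, function(x) all(x >= 0 & x <= 100))

# Rows failing either check should be reviewed before Step 7
flagged <- summary_toy[!(is_monotone & in_range), ]
print(flagged)
```

A flagged row usually points to a data problem upstream, such as negative or duplicated monthly values, rather than to the block logic itself.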

Step 7: Frequency Analysis

This step aggregates the results across years to determine, for each district, which block is most frequently the optimal window at each duration. It also records the median proportion captured by that block in the years it was dominant; this is the figure used later to determine whether the 60% threshold is met.

  • R
# ============================================================================
# 5. GENERATE FREQUENCY ANALYSIS
# ============================================================================

cli::cli_h2(
  "Generate frequency analysis"
)

frequency_results <- data.frame()
districts_list    <- unique(detailed_results$district)

for (dist in districts_list) {
  district_detailed <- detailed_results |> dplyr::filter(district == dist)
  district_summary  <- summary_results  |> dplyr::filter(district == dist)

  all_years   <- sort(unique(district_detailed$years))
  total_years <- length(all_years)

  # 2m
  block_2m_freq <- table(district_detailed$max_2m_block)
  for (block in names(block_2m_freq)) {
    block_years <- district_detailed |>
      dplyr::filter(max_2m_block == block) |>
      dplyr::pull(years)

    freq_pct    <- (length(block_years) / total_years) * 100
    median_prop <- district_summary |>
      dplyr::filter(year %in% block_years) |>
      dplyr::pull(pct_2m) |>
      stats::median()

    frequency_results <- rbind(frequency_results, data.frame(
      district         = dist,
      duration         = 2,
      block            = block,
      block_freq       = round(freq_pct, 2),
      freq_numeric     = freq_pct,
      years            = paste(block_years, collapse = ", "),
      median_max_prop  = round(median_prop, 2),
      stringsAsFactors = FALSE
    ))
  }

  # 3m
  block_3m_freq <- table(district_detailed$max_3m_block)
  for (block in names(block_3m_freq)) {
    block_years <- district_detailed |>
      dplyr::filter(max_3m_block == block) |>
      dplyr::pull(years)

    freq_pct    <- (length(block_years) / total_years) * 100
    median_prop <- district_summary |>
      dplyr::filter(year %in% block_years) |>
      dplyr::pull(pct_3m) |>
      stats::median()

    frequency_results <- rbind(frequency_results, data.frame(
      district         = dist,
      duration         = 3,
      block            = block,
      block_freq       = round(freq_pct, 2),
      freq_numeric     = freq_pct,
      years            = paste(block_years, collapse = ", "),
      median_max_prop  = round(median_prop, 2),
      stringsAsFactors = FALSE
    ))
  }

  # 4m
  block_4m_freq <- table(district_detailed$max_4m_block)
  for (block in names(block_4m_freq)) {
    block_years <- district_detailed |>
      dplyr::filter(max_4m_block == block) |>
      dplyr::pull(years)

    freq_pct    <- (length(block_years) / total_years) * 100
    median_prop <- district_summary |>
      dplyr::filter(year %in% block_years) |>
      dplyr::pull(pct_4m) |>
      stats::median()

    frequency_results <- rbind(frequency_results, data.frame(
      district         = dist,
      duration         = 4,
      block            = block,
      block_freq       = round(freq_pct, 2),
      freq_numeric     = freq_pct,
      years            = paste(block_years, collapse = ", "),
      median_max_prop  = round(median_prop, 2),
      stringsAsFactors = FALSE
    ))
  }

  # 5m
  block_5m_freq <- table(district_detailed$max_5m_block)
  for (block in names(block_5m_freq)) {
    block_years <- district_detailed |>
      dplyr::filter(max_5m_block == block) |>
      dplyr::pull(years)

    freq_pct    <- (length(block_years) / total_years) * 100
    median_prop <- district_summary |>
      dplyr::filter(year %in% block_years) |>
      dplyr::pull(pct_5m) |>
      stats::median()

    frequency_results <- rbind(frequency_results, data.frame(
      district         = dist,
      duration         = 5,
      block            = block,
      block_freq       = round(freq_pct, 2),
      freq_numeric     = freq_pct,
      years            = paste(block_years, collapse = ", "),
      median_max_prop  = round(median_prop, 2),
      stringsAsFactors = FALSE
    ))
  }
}

frequency_results <- frequency_results |>
  dplyr::arrange(district, duration, dplyr::desc(freq_numeric)) |>
  dplyr::select(-freq_numeric)

cli::cli_alert_success(
  glue::glue("Frequency analysis complete! Total records: {nrow(frequency_results)}")
)

sntutils::write_snt_data(
  obj          = frequency_results,
  path         = here::here(paths$val_tbl, variable),
  data_name    = glue::glue("{iso3}_malaria_{variable}_block_frequency_analysis"),
  file_formats = "xlsx"
)

cli::cli_alert_success(
  glue::glue("Saved: {iso3}_malaria_{variable}_block_frequency_analysis.xlsx")
)

Expected Outputs:

  • ‘{iso3}_malaria_{variable}_block_frequency_analysis.xlsx’: one row per district-duration-block combination, with the frequency (% of years in which that block was dominant) and the median proportion of the annual total captured

Key columns to review

Column               Description
‘district’           Admin 1 - Admin 2 label
‘duration’           Block length in months
‘block’              Block label
‘block_freq’         % of years in which this block was the dominant window
‘years’              Specific years in which this block was dominant
‘median_max_prop’    Median % of annual total captured in dominant years
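
As a reading aid for this table, the sketch below keeps the most frequent block for each district and duration and flags whether its median share reaches 60%. It is a base R illustration on a toy data frame (`freq_toy`, hypothetical, with invented values) and is not the selection code used in Script 2.

```r
# Toy stand-in for frequency_results (invented values)
freq_toy <- data.frame(
  district        = "A - A1",
  duration        = c(3, 3, 4, 4),
  block           = c("aug-sep-oct", "jul-aug-sep",
                      "jun-jul-aug-sep", "jul-aug-sep-oct"),
  block_freq      = c(40, 60, 20, 80),
  median_max_prop = c(56.0, 58.5, 69.0, 71.2)
)

# Order so the most frequent block comes first within each district-duration
freq_toy <- freq_toy[order(freq_toy$district, freq_toy$duration,
                           -freq_toy$block_freq), ]

# Keep only the first (most frequent) block per district-duration
dominant <- freq_toy[!duplicated(freq_toy[, c("district", "duration")]), ]

# Flag whether the dominant block's median share reaches the 60% threshold
dominant$meets_60 <- dominant$median_max_prop >= 60

print(dominant)
```

In this toy case the dominant 3-month block falls short of 60% while the dominant 4-month block exceeds it, which under the minimum cycle rule would point towards four cycles.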

Script 2

 

©2026 Applied Health Analytics for Delivery and Innovation. All rights reserved