Seasonality

Overview

Seasonality refers to predictable, yearly changes driven by environmental factors such as rainfall, temperature, and humidity. In malaria, these changes matter because mosquito breeding and transmission depend heavily on rainfall. During rainy months, mosquito numbers and malaria cases rise; in dry months, breeding diminishes and cases drop.

Understanding these patterns helps programs plan when and where to act. Interventions such as Seasonal Malaria Chemoprevention (SMC), ITN distribution, and IRS campaigns are most effective when scheduled before or during the high-transmission season.

Objectives

Identify areas with a seasonal rainfall pattern linked to predictable malaria transmission
Compare rainfall and malaria peaks to guide intervention timing
Use a rolling four-month window analysis to detect consistent seasonal patterns across multiple years
Classify areas as seasonal or non-seasonal based on pattern strength
Produce maps and summaries for program decision-making

WHO 4-Month Rule

According to WHO guidance, a location is considered seasonal if 60% or more of total annual rainfall (or malaria cases) occurs within any 4 consecutive months. This identifies likely periods of peak transmission and guides intervention timing.

Consult the SNT Team

The 4-month window and 60% threshold come from WHO recommendations, but rainfall and transmission patterns vary by country.

Before applying or adjusting this rule:

Consult the SNT to confirm whether 60% is appropriate for the local context
Adjust if rainfall or malaria data show different timing patterns
Document and justify any modification to ensure national consistency

Rolling Window Concept

The rolling window concept evaluates seasonality by moving a 4-month window across all months of each year and comparing it with a corresponding 12-month window. For each starting month, the code calculates rainfall totals for both periods to determine the proportion of annual rainfall that falls within each 4-month span.

For example, in 2019:

When the 4-month window is January–April 2019, the corresponding 12-month window is January–December 2019.
When the 4-month window is February–May 2019, the 12-month window is February 2019–January 2020.
When the 4-month window is March–June 2019, the 12-month window is March 2019–February 2020.

By testing all possible 4-month periods, the method captures rainfall peaks occurring at different times of the year and identifies consistent seasonal patterns across multiple years.

Finally, the share of annual rainfall captured by each 4-month window is calculated. If this percentage is ≥ 60 %, that window is classified as seasonal.

Determine if Each Year is Seasonal

The table below tests rainfall concentration for a single year (2023). All twelve possible 4 month windows (from January through April to December through March) are examined. If any 4 month window exceeds 60% of the year’s rainfall, the year is classified as seasonal.

Start Month	4-Month Window	12-Month Comparison Period	Monthly Rainfall (mm)	Total Rainfall (4 Months)	Total Rainfall (12 Months)	% of Annual Rainfall	Meets 60% Rule?
Jan 2023	Jan 2023 – Apr 2023	Jan 2023 – Dec 2023	20 + 40 + 180 + 260	500	830	60.2%	Yes
Feb 2023	Feb 2023 – May 2023	Feb 2023 – Jan 2024	40 + 180 + 260 + 150	630	830	75.9%	Yes
Mar 2023	Mar 2023 – Jun 2023	Mar 2023 – Feb 2024	180 + 260 + 150 + 80	670	830	80.7%	Yes
Apr 2023	Apr 2023 – Jul 2023	Apr 2023 – Mar 2024	260 + 150 + 80 + 40	530	830	63.9%	Yes
May 2023	May 2023 – Aug 2023	May 2023 – Apr 2024	150 + 80 + 40 + 20	290	830	34.9%	No
Jun 2023	Jun 2023 – Sep 2023	Jun 2023 – May 2024	80 + 40 + 20 + 10	150	830	18.1%	No
Jul 2023	Jul 2023 – Oct 2023	Jul 2023 – Jun 2024	40 + 20 + 10 + 10	80	830	9.6%	No
Aug 2023	Aug 2023 – Nov 2023	Aug 2023 – Jul 2024	20 + 10 + 10 + 20	60	830	7.2%	No
Sep 2023	Sep 2023 – Dec 2023	Sep 2023 – Aug 2024	10 + 10 + 20 + 20	60	830	7.2%	No
Oct 2023	Oct 2023 – Jan 2024	Oct 2023 – Sep 2024	10 + 20 + 40 + 180	250	830	30.1%	No
Nov 2023	Nov 2023 – Feb 2024	Nov 2023 – Oct 2024	20 + 40 + 180 + 260	500	830	60.2%	Yes
Dec 2023	Dec 2023 – Mar 2024	Dec 2023 – Nov 2024	40 + 180 + 260 + 150	630	830	75.9%	Yes

Interpretation. Several windows (e.g., March–June and December–March) exceed 60%, so 2023 is a seasonal year. Repeat this process for every year in the dataset.

Determine if the Location is Seasonal

Summarize results across all years to check consistency:

Each 4-month period that meets the 60% rule counts as a seasonal block.
If a year has at least one seasonal block, it is a seasonal year.
If all analyzed years are seasonal, the location is Seasonal overall. If any year is not seasonal, the location is Not Seasonal.

Scenario 1: Non-Seasonal Location

Year	Total 4-Month Blocks Tested	Seasonal Blocks (≥ 60%)	Is the Year Seasonal?
2015	12	2	Yes
2016	12	1	Yes
2017	12	1	Yes
2018	12	2	Yes
2019	12	0	No
2020	12	1	Yes
2021	12	2	Yes
2022	12	1	Yes
2023	12	3	Yes
Location Classification	—	—	Not Seasonal (fails in 2019)

Interpretation. Because 2019 fails, the pattern is inconsistent. The area is Not Seasonal, suggesting variable rainfall/transmission timing that weakens time-bound interventions like SMC.

Scenario 2: Seasonal Location

Year	Total 4-Month Blocks Tested	Seasonal Blocks (≥ 60%)	Is the Year Seasonal?
2015	12	1	Yes
2016	12	2	Yes
2017	12	1	Yes
2018	12	1	Yes
2019	12	1	Yes
2020	12	1	Yes
2021	12	2	Yes
2022	12	1	Yes
2023	12	1	Yes
Location Classification	—	—	Seasonal (all years meet rule)

Interpretation. Each year from 2015–2023 includes at least one 4-month period ≥ 60% of annual rainfall. The location is Seasonal, indicating a consistent rain-driven transmission cycle and SMC eligibility.

Summary of the Logic

Level of Analysis	Criterion	Classification
4-Month Block	≥ 60% of annual rainfall	Seasonal Block
Year	≥ 1 seasonal block	Seasonal Year
Location (2015–2023)	All years seasonal	Seasonal Location
	One or more years not seasonal	Not Seasonal Location

Key Takeaways

The rolling 4-month approach captures local variation in timing
A seasonal block means ≥ 60% of annual rainfall
A year is seasonal if it contains at least one seasonal block
A location is seasonal if all years in scope have at least one seasonal block
Multi-year consistency supports precise timing for SMC and other rainfall-driven interventions

Data Requirements

Both rainfall and malaria case data can be used for seasonality analysis. This example uses rainfall because it closely reflects environmental drivers of transmission.

To ensure reliable classification:

Provide at least six consecutive years of complete monthly data.
Ensure all twelve months are present for each year.
Handle missingness; incomplete data can misclassify areas.

Consult the SNT Team

The six-year minimum follows WHO analytical guidance. If the dataset is shorter (e.g., 4 years) the SNT should review and approve the adjustment.

Before proceeding, the SNT should:

Confirm the time span reflects consistent climatic conditions
Ensure comparable data coverage across administrative areas
Document and justify any change for transparency and reproducibility

Analytical Approach

This analysis uses CHIRPS satellite-derived rainfall data as a proxy for malaria transmission intensity. Sierra Leone serves as the case study, but the method generalizes to other countries with similar rainfall data.

Data Source: Follow the steps in the CHIRPS Data Extraction Guide to obtain and prepare rainfall datasets.
Processing: Organize data by year, month, and administrative unit (e.g., district). Apply a rolling 4-month window within each 12-month period to compute rainfall concentration.
Classification: If ≥ 60% of annual rainfall falls within any 4 consecutive months, mark a seasonal block for that year/location. Repeat across multiple years to assess stability:
- Seasonal Location: Every analyzed year contains ≥ 1 seasonal block.
- Not Seasonal Location: ≥ 1 analyzed year contains no seasonal block.
Mapping: Join results to geographic boundaries (e.g., district/ chiefdom shapefiles) to visualize spatial patterns of seasonality.
Review: Share outputs with the SNT for validation against local transmission knowledge and for scheduling interventions (e.g., SMC).

Profiling for Seasonality

This section illustrates the seasonality analysis workflow used to determine SMC (Seasonal Malaria Chemoprevention) eligibility in malaria-endemic areas.

The workflow uses CHIRPS satellite-derived rainfall data from January 2018 to December 2023 (72 months total) for a hypothetical district in Sierra Leone.

Overview of the Seasonality Profiling Process

The entire process is summarized in Figure 1 below.

Figure 1. Three-step seasonality profiling workflow using 2018–2023 rainfall data. The workflow identifies whether districts have consistent seasonal rainfall patterns suitable for SMC targeting.

Step-by-Step Explanation (Linked to Figure 1)

The visualization above (Figure 1) illustrates how rainfall data are processed and classified into “Seasonal” or “Not Seasonal” categories through three main steps.

Step 1: Divide Data into Month-Blocks

The blue area chart in Figure 1 (top panel) shows monthly rainfall patterns over the six-year period (2018–2023).
Colored horizontal bars represent different 4-month rolling windows tested (e.g., Jan–Apr 2018, Feb–May 2018, etc.).
Limitation: Not all possible windows can be assessed — windows starting in 2023 would require 2024 data to complete a 12-month cycle.
- As a result, only 61 month-blocks are evaluable (out of 72 months).

Step 2: Assess Whether Each Month-Block Meets the 60 % Threshold

For every 4-month window shown in Figure 1:

Numerator: Sum of rainfall during those four consecutive months
Denominator: Sum of rainfall over the 12-month period starting from the same month
If the 4-month total is ≥ 60 % of the 12-month total, that window is marked as a “seasonal peak.”

Step 3: Identify and Count Seasonal Peaks

The bottom bar chart in Figure 1 shows results for all evaluated 4-month windows.
Purple bars mark windows that met the ≥ 60 % threshold (seasonal peaks).
Gray bars mark those that did not meet the threshold.
Purple arrows highlight years where seasonal peaks were observed.

These visual cues allow quick identification of consistent seasonal rainfall patterns across years.

Classification Logic

A location is classified as Seasonal if it shows at least one seasonal peak in every evaluated year.

Example Interpretation

The dataset covers January 2018 – December 2023 (6 years total).
However, only 5 years (2018–2022) can be fully evaluated because 2023 lacks a complete 12-month window.
Therefore, 61 evaluable month-blocks remain.

In this example (see Figure 1):

Purple arrows indicate years where peaks occurred.
Two equivalent ways to express the finding:
- Including all data years: “4 of 5 years have seasonality peaks.”
- Considering only evaluable years: “4 of 4 evaluable years have seasonality peaks.”
All evaluable years are seasonal, hence the district is classified as Seasonal.

Consult the SNT Team

Before applying this workflow:

Confirm the 60 % threshold suits your local context.
Validate whether the “all evaluable years must be seasonal” rule should apply nationally.
Record any changes to ensure national consistency in classification.

Key Conventions Explained

Four-Month Windows Capture the main 3–5-month malaria transmission period typical of endemic regions.
Sixty Percent Threshold Ensures rainfall and therefore malaria transmission is concentrated enough to support time bound interventions such as SMC.
Rolling Windows Test all possible 4-month combinations rather than fixed months, capturing local variation in rainy season timing.
Consistency Across Years Stability across multiple years is critical for predictable intervention timing.

Data Requirements

Requirement	Description
Data Length	Minimum of 6 continuous years (WHO recommendation)
Completeness	All 12 months must be present for each year
Data Source	CHIRPS rainfall data or malaria surveillance data
Geographic Level	Aggregated by district, chiefdom, or health-facility catchment

Step-by-Step

The steps below walk through the complete seasonality analysis from data preparation to classification and visualization. Each step explains the code logic and expected outputs.

To skip the step-by-step explanation, jump to the full code at the end of this page.

Step 1: Import packages and data

This step loads all the software tools (packages) needed for the analysis and reads your rainfall data file.

R
Python

if (!requireNamespace("pacman", quietly = TRUE)) install.packages("pacman")

pacman::p_load(
  readxl, dplyr, openxlsx, lubridate, ggplot2, readr, stringr, 
  here, tidyr, gridExtra, knitr, writexl, sf
)

# Load rainfall data
file_path <- here::here("english/data_r/modeled", "chirps_data_2015_2023_lastest.xls")
data <- readxl::read_excel(file_path)

To adapt the code:

Line 14: Update file path to your rainfall dataset location
Line 15: Change file name to match your data file

Step 2: Configure analysis parameters

This step sets up the “rules” for your analysis by telling R which columns in your data contain which information, what time period to analyze, and what threshold defines “seasonal.” The year_column variable should be set to the name of the column containing years (like 2015, 2016, etc.), month_column to the column with months (1-12), value_column to the column with rainfall amounts, and admin_columns to the geographic unit names (like district and chiefdom). The analysis_start_year sets the first year to include in the analysis, analysis_start_month defines which month to start from (usually January = 1), and seasonality_threshold sets the percentage of annual rainfall that must occur in a 4-month period to be considered seasonal (60 is the WHO standard, meaning 60% of annual rainfall concentrated in 4 months).

R
Python

# Column names in your dataset
year_column <- "Year"
month_column <- "Month"
value_column <- "mean_rain"
admin_columns <- c("FIRST_DNAM", "FIRST_CHIE")

# Analysis parameters
analysis_start_year <- 2015
analysis_start_month <- 1
seasonality_threshold <- 60  # 60% of annual rainfall in 4-month peak

# Output file names (uncomment to save)
# detailed_output <- "detailed_seasonality_results.xlsx"
# yearly_output <- "yearly_analysis_summary.xlsx"
# location_output <- "location_seasonality_summary.xlsx"

To adapt the code:

Lines 3-6: Update column names to match your dataset structure
Lines 9-10: Set your analysis start year and month
Line 11: Adjust threshold (60 means 60% of annual rainfall occurs in peak 4 months)
Lines 14-16: Uncomment and modify output file names if you want to save results

Step 3: Prepare data for analysis

This step ensures that your dataset is valid, complete, and properly structured for seasonality analysis. It begins by verifying that all required columns such as year, month, rainfall value, and administrative units are present in your data. Once confirmed, the process filters the dataset to include only the desired analysis period and converts the month values to numeric format to support accurate calculations. Next, it creates administrative groupings by combining one or more geographic levels such as district and chiefdom into a single variable to ensure each unique area is analyzed independently. Finally, the step checks that your dataset spans at least six years of data which is the minimum recommended duration for reliable seasonality classification.

Step 3.1 Check required columns

The code below verifies that all the necessary columns exist in your dataset before analysis begins. It identifies any missing columns and stops the process with a clear message so you can correct the issue.

R
Python

# Verify all required columns exist
required_cols <- c(year_column, month_column, value_column, admin_columns)
missing_cols <- required_cols[!required_cols %in% colnames(data)]

if (length(missing_cols) > 0) {
  stop(paste("Missing columns:", paste(missing_cols, collapse = ", ")))
}

To adapt the code:

Do not change anything in the code above

Step 3.2 Filter data and ensure numeric month

This code first checks that all required columns are present in your dataset. If any are missing, R stops and lists the columns that need to be corrected. When all columns are available, the code runs smoothly without displaying any message. It then filters the dataset to include only records within the selected analysis period, removes entries with missing or incomplete year or month values, and converts the month column to numeric format for accurate calculations. After execution, verify that the number of remaining rows is reasonable. If many records are missing, review the original dataset for potential gaps or errors.

R
Python

# Filter to analysis period and convert month to numeric
filtered_data <- data |>
  dplyr::filter(
    !is.na(!!sym(year_column)) & 
    !is.na(!!sym(month_column)) & 
    !!sym(year_column) >= analysis_start_year
  ) |>
  dplyr::mutate(Month = as.numeric(!!sym(month_column)))

To adapt the code:

Do not change anything in the code above

Step 3.3 Create administrative grouping variable

This code creates a single administrative grouping variable by combining one or more columns, such as district and chiefdom, to ensure each geographic unit is analyzed separately. The resulting column, admin_group, either duplicates the existing administrative level (if only one is provided) or merges multiple levels into a single text string separated by a vertical bar. This unified identifier allows R to treat each unique administrative unit as a distinct group during analysis.

R
Python

# Create combined admin grouping
if (length(admin_columns) == 1) {
  filtered_data$admin_group <- filtered_data[[admin_columns[1]]]
} else {
  filtered_data <- filtered_data |>
    dplyr::mutate(
      admin_group = paste(!!!syms(admin_columns), sep = " | ")
    )
}

To adapt the code:

Do not change anything in the code above

Step 3.4 Validate data span

The code below checks whether your dataset covers at least six years, which is the recommended duration for detecting seasonal patterns. It also calculates the number of complete years and monthly analysis blocks available for use.

R
Python

# Check data availability
available_years <- sort(unique(filtered_data[[year_column]]))
data_span_years <- length(available_years)

if (data_span_years < 6) {
  stop(paste(
    "Insufficient data: Requires at least 6 years, but only", 
    data_span_years, "years found."
  ))
}

# Calculate number of analysis blocks (one per month for complete years)
num_complete_years <- data_span_years - 1
total_num_blocks <- num_complete_years * 12

To adapt the code:

Line 6: Change the minimum year requirement, if needed

Step 4: Generate rolling time blocks

This step creates all the time windows needed for seasonality analysis. Instead of assuming the rainy season always occurs during the same months each year, this approach tests every possible 4-month period to find when peak rainfall actually occurs. For each location and each starting month, the code creates two overlapping windows: a 4-month window representing a potential peak season and a 12-month window representing the full year for comparison. By systematically testing all possible 4-month windows across the entire dataset, this method identifies the true timing of seasonal peaks without making assumptions about when they should occur.

Step 4.1: Define month names

This code creates readable month labels to make date ranges easier to interpret in the output. It defines a simple vector of month abbreviations that R uses to display date ranges in a clear format, such as “Mar 2019–Jun 2019”

R
Python

# Month labels for output
month_names <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                 "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

To adapt the code:

Do not change anything in the code above

Step 4.2: Initialize block storage

This code initializes an empty data structure to store information about each time block and defines the starting point for the analysis. The blocks data frame holds all time window definitions, while the current_year and current_month variables set and track the starting position as the code advances month by month through the dataset.

R
Python

# Initialize block tracking
blocks <- data.frame()
current_year <- analysis_start_year
current_month <- analysis_start_month

To adapt the code:

Do not change anything in the code above

Step 4.3: Generate rolling windows

The code below creates all possible 4-month and 12-month time windows by systematically moving forward one month at a time through the dataset. For each starting month, it defines both a 4-month window (the potential peak period) and a 12-month window (the comparison period). The code handles year transitions automatically, ensuring that windows spanning December-January are calculated correctly.

R
Python

Show the code

for (i in 1:total_num_blocks) {
  # Define 4-month window
  start_4m_year <- current_year
  start_4m_month <- current_month
  end_4m_year <- current_year
  end_4m_month <- current_month + 3
  
  if (end_4m_month > 12) {
    end_4m_year <- end_4m_year + 1
    end_4m_month <- end_4m_month - 12
  }

  # Define 12-month window  
  start_12m_year <- current_year
  start_12m_month <- current_month
  end_12m_year <- current_year
  end_12m_month <- current_month + 11
  
  if (end_12m_month > 12) {
    end_12m_year <- end_12m_year + 1
    end_12m_month <- end_12m_month - 12
  }

  # Create readable date range label
  date_range <- paste0(
    month_names[start_4m_month], " ", start_4m_year, "-",
    month_names[end_4m_month], " ", end_4m_year
  )
  
  # Store block definition
  blocks <- rbind(blocks, data.frame(
    block_number = i,
    start_4m_year = start_4m_year,
    start_4m_month = start_4m_month,
    end_4m_year = end_4m_year,
    end_4m_month = end_4m_month,
    start_12m_year = start_12m_year,
    start_12m_month = start_12m_month,
    end_12m_year = end_12m_year,
    end_12m_month = end_12m_month,
    date_range = date_range
  ))

  # Advance to next month
  current_month <- current_month + 1
  if (current_month > 12) {
    current_month <- 1
    current_year <- current_year + 1
  }
}

To adapt the code:

Do not change anything in the code above

Step 5: Calculate seasonality for each block

This step applies the WHO 60% rule to determine which time blocks represent seasonal peaks. For each location and each time block, the code calculates how much rainfall fell during the 4-month window and compares it to the total rainfall during the corresponding 12-month period. If the 4-month total represents 60% or more of the 12-month total, that block is marked as seasonal. This calculation is repeated for every location and every time block, creating a comprehensive dataset that shows exactly when and where seasonal patterns occur.

Step 5.1 Initialize results storage

This code sets up the data structure for storing seasonality calculations and identifies the locations to be analyzed. The detailed_results data frame is initialized to hold results for every combination of location and time block, while the admin_groups variable lists all unique geographic units (such as districts or chiefdoms) to be processed individually in the analysis.

R
Python

# Initialize results dataframe
detailed_results <- data.frame()

# Get all locations to analyze
admin_groups <- unique(filtered_data$admin_group)

To adapt the code:

Do not change anything in the code above

Step 5.2 Calculate seasonality per location and block

This code performs the main seasonality calculation by looping through each location and time block to evaluate rainfall patterns. It converts year–month values into a single numeric index to simplify filtering, extracts rainfall data within both the 4-month and 12-month windows, and sums them to compute the proportion of annual rainfall that occurred during the 4-month period. It then checks whether this percentage meets or exceeds the 60 percent threshold for seasonality and records the outcome, along with the relevant location details, in a results table. The final output provides a complete record showing which time blocks at each location qualify as seasonal.

R
Python

Show the code

for (admin_unit in admin_groups) {
  # Get data for this location
  unit_data <- filtered_data |> dplyr::filter(admin_group == admin_unit)
  
  for (i in 1:nrow(blocks)) {
    block <- blocks[i, ]
    
    # Convert year-month to comparable format
    unit_data_ym <- unit_data[[year_column]] * 12 + unit_data$Month
    
    # Extract 4-month window data
    start_4m_ym <- block$start_4m_year * 12 + block$start_4m_month
    end_4m_ym <- block$end_4m_year * 12 + block$end_4m_month
    data_4m <- unit_data |> 
      dplyr::filter(unit_data_ym >= start_4m_ym & unit_data_ym <= end_4m_ym)
    total_4m <- sum(data_4m[[value_column]], na.rm = TRUE)

    # Extract 12-month window data
    start_12m_ym <- block$start_12m_year * 12 + block$start_12m_month
    end_12m_ym <- block$end_12m_year * 12 + block$end_12m_month
    data_12m <- unit_data |> 
      dplyr::filter(unit_data_ym >= start_12m_ym & unit_data_ym <= end_12m_ym)
    total_12m <- sum(data_12m[[value_column]], na.rm = TRUE)

    # Calculate seasonality percentage
    percent_seasonality <- ifelse(total_12m > 0, (total_4m / total_12m) * 100, 0)
    is_seasonal <- as.numeric(percent_seasonality >= seasonality_threshold)

    # Build result row
    result_row <- data.frame(
      Block = i,
      DateRange = block$date_range,
      Total_4M = total_4m,
      Total_12M = total_12m,
      Percent_Seasonality = round(percent_seasonality, 2),
      Seasonal = is_seasonal,
      stringsAsFactors = FALSE
    )

    # Add administrative unit columns
    if (length(admin_columns) > 1) {
      admin_parts <- strsplit(admin_unit, " \\| ")[[1]]
      for (j in seq_along(admin_columns)) {
        result_row[[admin_columns[j]]] <- 
          ifelse(j <= length(admin_parts), admin_parts[j], NA)
      }
    } else {
      result_row[[admin_columns[1]]] <- admin_unit
    }

    detailed_results <- rbind(detailed_results, result_row)
  }
}

To adapt the code:

Do not change anything in the code above

Step 5.3 Preview results

This code displays a preview of the detailed seasonality results and optionally saves the full dataset to an Excel file for further analysis. Each row in the preview represents one time block at one location, showing the block number, date range, rainfall totals for both the 4-month and 12-month periods, the calculated seasonality percentage, whether it meets the seasonal threshold (1 = yes, 0 = no), and the corresponding geographic unit. Uncommenting the specified line allows saving the complete results for detailed review and quality assurance.

R
Python

# Save detailed results (uncomment to use)
# writexl::write_xlsx(detailed_results, detailed_output)

# Display preview
knitr::kable(head(detailed_results), caption = "Preview of Block-Level Results")

Output

Preview of Block-Level Results
Block	DateRange	Total_4M	Total_12M	Percent_Seasonality	Seasonal	FIRST_DNAM	FIRST_CHIE
1	Jan 2015-Apr 2015	93.2413	1067.608	8.73	0	KAILAHUN	DEA
2	Feb 2015-May 2015	201.4312	1069.191	18.84	0	KAILAHUN	DEA
3	Mar 2015-Jun 2015	315.0543	1064.095	29.61	0	KAILAHUN	DEA
4	Apr 2015-Jul 2015	507.7793	1071.512	47.39	0	KAILAHUN	DEA
5	May 2015-Aug 2015	648.5728	1061.437	61.10	1	KAILAHUN	DEA
6	Jun 2015-Sep 2015	698.8333	1033.029	67.65	1	KAILAHUN	DEA

To adapt the code:

Do not change anything in the code above

Step 6: Generate yearly summary

This step aggregates the block-level results to determine whether each location showed seasonal behavior in each year of the analysis. A year is classified as seasonal if it contains at least one 4-month block that meets the 60% threshold. This year-by-year summary is essential for the final classification because a location is only considered “Seasonal overall” if it shows seasonal patterns consistently across all analyzed years. The code groups the detailed results by location and year, counts how many seasonal blocks occurred in each year, and determines whether each year qualifies as seasonal.

Step 6.1 Extract year from date ranges

This code extracts the starting year from each date range label (e.g., “Jan 2015–Apr 2015”) to enable year-by-year seasonality analysis. It parses each label by splitting at the dash, taking the first part, trimming whitespace, and extracting the last four characters representing the year. The extracted year is then stored in a new column called StartYear, which is used to group seasonal blocks by year in subsequent steps.

R
Python

# Extract start year from DateRange column
detailed_results$StartYear <- sapply(detailed_results$DateRange, function(x) {
  parts <- strsplit(x, "-")[[1]]
  first_part <- trimws(parts[1])
  as.numeric(substr(first_part, nchar(first_part) - 3, nchar(first_part)))
})

To adapt the code:

Do not change anything in the code above

Step 6.2 Summarize by location and year

This code groups the block-level results by location and year, then summarizes them to determine annual seasonality status. For each location-year combination, it counts how many of the twelve 4-month blocks met the seasonal threshold (SeasonalCount), records the total number of blocks tested (always 12), and creates a binary indicator showing whether at least one block was seasonal (1 = yes, 0 = no). This indicator applies the WHO rule that a year is considered seasonal if it has at least one seasonal peak, regardless of how many seasonal blocks it contains.

R
Python

# Count seasonal blocks per location per year
yearly_summary <- detailed_results |>
  dplyr::group_by(dplyr::across(dplyr::all_of(admin_columns)), StartYear) |>
  dplyr::summarise(
    Year = dplyr::first(StartYear),
    SeasonalCount = sum(Seasonal, na.rm = TRUE),
    total_blocks_in_year = 12,
    at_least_one_seasonal_block = as.numeric(SeasonalCount > 0),
    .groups = 'drop'
  )

To adapt the code:

Do not change anything in the code above

Step 6.3 Add readable year labels

This code adds descriptive year labels that illustrate typical seasonal blocks for easier interpretation and reporting. For each year, it generates a readable label showing example date ranges of the evaluated blocks (e.g., for 2019: “Jan 2019–Apr 2019, Dec 2019–Mar 2020”), highlighting that a year can have one seasonal peak regardless of how many seasonal blocks it contains. After creating these labels, the code reorders columns to prioritize key information and sorts the results by year and location for clear, organized review.

R
Python

# Add descriptive year period labels
yearly_summary <- yearly_summary |>
  dplyr::mutate(
    year_period = paste0(
      "(Jan ", Year, "-Apr ", Year, ", Dec ", Year, "-Mar ", Year + 1, ")"
    )
  ) |>
  dplyr::select(
    Year, 
    dplyr::all_of(admin_columns), 
    year_period, 
    total_blocks_in_year, 
    at_least_one_seasonal_block
  ) |>
  dplyr::arrange(Year, dplyr::across(dplyr::all_of(admin_columns)))

To adapt the code:

Do not change anything in the code above

Step 6.4 Preview yearly summary

The code below displays the last ten rows of the yearly summary to show recent years, and optionally saves the complete yearly summary to Excel.

R
Python

# Save yearly summary (uncomment to use)
# writexl::write_xlsx(yearly_summary, yearly_output)

# Display last 10 rows
knitr::kable(
  tail(yearly_summary, 10), 
  caption = "Yearly Seasonality Summary (Last 10 rows)"
)

Output

Yearly Seasonality Summary (Last 10 rows)
Year	FIRST_DNAM	FIRST_CHIE	year_period	total_blocks_in_year	at_least_one_seasonal_block
2022	WESTERN RURAL	WATERLOO RURAL	(Jan 2022-Apr 2022, Dec 2022-Mar 2023)	12	1
2022	WESTERN RURAL	YORK RURAL	(Jan 2022-Apr 2022, Dec 2022-Mar 2023)	12	1
2022	WESTERN URBAN	CENTRAL I	(Jan 2022-Apr 2022, Dec 2022-Mar 2023)	12	1
2022	WESTERN URBAN	CENTRAL II	(Jan 2022-Apr 2022, Dec 2022-Mar 2023)	12	0
2022	WESTERN URBAN	EAST I	(Jan 2022-Apr 2022, Dec 2022-Mar 2023)	12	0
2022	WESTERN URBAN	EAST II	(Jan 2022-Apr 2022, Dec 2022-Mar 2023)	12	0
2022	WESTERN URBAN	EAST III	(Jan 2022-Apr 2022, Dec 2022-Mar 2023)	12	1
2022	WESTERN URBAN	WEST I	(Jan 2022-Apr 2022, Dec 2022-Mar 2023)	12	0
2022	WESTERN URBAN	WEST II	(Jan 2022-Apr 2022, Dec 2022-Mar 2023)	12	0
2022	WESTERN URBAN	WEST III	(Jan 2022-Apr 2022, Dec 2022-Mar 2023)	12	1

To adapt the code:

Do not change anything in the code above

Step 7: Location-level seasonality classification

This step produces the final classification by determining whether each location shows consistent seasonal patterns across all analyzed years. According to WHO guidance, a location is classified as “Seasonal” only if every year in the analysis contains at least one seasonal block. If even a single year fails to show a seasonal pattern, the location is classified as “Not Seasonal” because the inconsistency makes it unsuitable for time-bound interventions like SMC. The code aggregates the yearly results, counts how many years were seasonal at each location, and applies the classification rule.

Step 7.1 Aggregate across years

The code below groups the yearly summary by location and counts how many years showed seasonal patterns at each location, along with the total number of years analyzed.

R
Python

# Count how many years each location was seasonal
location_summary <- yearly_summary |>
  dplyr::group_by(dplyr::across(dplyr::all_of(admin_columns))) |>
  dplyr::summarise(
    SeasonalYears = sum(at_least_one_seasonal_block, na.rm = TRUE),
    TotalYears = dplyr::n(),
    .groups = 'drop'
  )

To adapt the code:

Do not change anything in the code above

Step 7.2 Apply classification rule

The code below implements the WHO classification rule: a location is “Seasonal” if and only if every analyzed year showed at least one seasonal block. If any year failed to show seasonality, the location is classified as “Not Seasonal”.

R
Python

# Classify: Seasonal = all years seasonal, otherwise Not Seasonal
location_summary <- location_summary |>
  dplyr::mutate(
    Seasonality = ifelse(
      SeasonalYears == TotalYears, 
      "Seasonal", 
      "Not Seasonal"
    )
  ) |>
  dplyr::arrange(dplyr::across(dplyr::all_of(admin_columns)))

To adapt the code:

Lines 5-8: Modify classification logic if you want different criteria

Step 7.3 Preview and save classification

The code below displays the first ten locations from the final classification and optionally saves the complete classification to Excel for program planning.

R
Python

# Save location classification (uncomment to use)
# writexl::write_xlsx(location_summary, location_output)

# Display results
knitr::kable(
  head(location_summary, 10), 
  caption = "Location Seasonality Classification"
)

Output

Location Seasonality Classification
FIRST_DNAM	FIRST_CHIE	SeasonalYears	TotalYears	Seasonality
BO	BADJIA	8	8	Seasonal
BO	BAGBO	8	8	Seasonal
BO	BAGBWE(BAGBE)	8	8	Seasonal
BO	BO TOWN	8	8	Seasonal
BO	BOAMA	8	8	Seasonal
BO	BONGOR	8	8	Seasonal
BO	BUMPE NGAO	8	8	Seasonal
BO	GBO	8	8	Seasonal
BO	JAIAMA	8	8	Seasonal
BO	KAKUA	8	8	Seasonal

To adapt the code:

Line 3: Change the file path when sacing the file

The output table shows each location with its seasonal year count, total years analyzed, and final classification. For example, a location showing “SeasonalYears = 8, TotalYears = 8, Seasonality = Seasonal” had seasonal patterns in all eight analyzed years and is therefore eligible for seasonal interventions. Uncommenting line 2 saves this classification to Excel for sharing result with the SNT team.

Step 8: Merge with spatial data for visualization

This step prepares the seasonality results for mapping by joining them with geographic boundary data. Maps are essential for communicating seasonality patterns to decision-makers and identifying spatial clusters of seasonal or non-seasonal areas. The code loads a shapefile containing the geographic boundaries of administrative units, reads the seasonality classification results, and merges them based on matching administrative unit names. This creates a spatial dataset where each geographic boundary is tagged with its seasonality classification, enabling the creation of informative maps in the following steps.

R
Python

# Load spatial data
shapefile_path <- here::here("english/data_r/shapefiles", "Chiefdom2021.shp")
spatial_data <- sf::st_read(shapefile_path)

Reading layer `Chiefdom2021' from data source 
  `C:\Users\User\snt-code-library\english\data_r\shapefiles\Chiefdom2021.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 208 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -13.30736 ymin: 6.923379 xmax: -10.27056 ymax: 10.00989
CRS:           NA

# Load seasonality results if not in environment
seasonality_path <- here::here(
  "english/library/stratification/other", 
  "final_SMC_eligibility.xlsx"
)
category_data <- readxl::read_excel(seasonality_path)

# Merge spatial and seasonality data
merged <- spatial_data |>
  dplyr::left_join(category_data, by = c("FIRST_DNAM", "FIRST_CHIE"))

To adapt the code:

Line 3: Update shapefile path to your administrative boundary file
Lines 6-9: Update path to your seasonality results file
Line 13: Adjust join columns to match your admin unit names in both datasets

Step 9: Map years with seasonal peaks

This step creates a map showing how many years each location exhibited seasonal rainfall patterns. This visualization is useful for understanding the strength and consistency of seasonal signals across different areas. Locations that were seasonal in many years have strong, reliable patterns suitable for intervention planning, while locations with fewer seasonal years may have more variable rainfall patterns. The code prepares the data for mapping, defines a color scheme where different numbers of seasonal years receive different colors, and creates a map with a legend showing how many locations fall into each category.

R
Python

Code

# Prepare data for mapping
merged$category <- as.character(merged$SeasonalYears)
merged$category[is.na(merged$category)] <- "0"

# Define color palette
category_colors <- c(
  "0" = "gray80",   # No seasonal years
  "1" = "#4E79A7",  "2" = "#F28E2B", "3" = "#E15759",
  "4" = "#76B7B2",  "5" = "#59A14F", "6" = "#EDC948",
  "7" = "#4E79A7",  "8" = "#AF7AA1"
)

# Convert to factor with proper ordering
merged$category <- factor(merged$category, levels = as.character(0:8))

# Count locations in each category
cat_counts <- table(merged$category)
cat_counts <- cat_counts[cat_counts > 0]

# Create legend labels with counts
legend_labels <- paste0(names(cat_counts), " (", cat_counts, ")")
names(legend_labels) <- names(cat_counts)

# Filter colors to categories that exist
category_colors_filtered <- category_colors[names(category_colors) %in% names(cat_counts)]

# Create map
p <- ggplot(merged) +
  geom_sf(aes(fill = category), color = "white", size = 0.2) +
  scale_fill_manual(
    values = category_colors_filtered,
    labels = legend_labels,
    name = "Years with\nSeasonal Peak"
  ) +
  theme_void() +
  theme(
    legend.position = "right",
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold")
  ) +
  labs(title = "Number of Years with Seasonal Rainfall Patterns")

print(p)

Output

To adapt the code:

Lines 11-15: Modify color palette if desired
Line 43: Change map title as needed

The map above shows how many years each chiefdom in Sierra Leone experienced seasonal rainfall patterns based on the 60 percent threshold. Areas in deep purple had seasonal peaks in all eight years, indicating strong and consistent seasonality suitable for SMC implementation. Lighter colors such as blue, yellow, and red represent locations with fewer seasonal years, showing more variable rainfall patterns. The legend summarizes how many chiefdoms fall into each category, helping identify areas with stable versus inconsistent seasonality for program planning.

Step 10: Map final seasonality classification

This step generates the final seasonality classification map, distinguishing areas as Seasonal or Not Seasonal based on the consistency of rainfall patterns. The process involves preparing the dataset, defining a two-color scheme (blue for Seasonal and light pink for Not Seasonal), and creating district boundaries by dissolving chiefdom polygons for clearer geographic reference. The seasonality variable is converted to an ordered factor for proper legend arrangement, and the number of locations in each category is counted to label the legend (e.g., “Seasonal (89)”).

R
Python

Code

# Handle missing values
merged$Seasonality[is.na(merged$Seasonality)] <- "No data"

# Define colors
colors <- c(
  "Seasonal" = "#47B5FF",
  "Not Seasonal" = "#FFB3BA",
  "No data" = "white"
)

# Create district boundaries for overlay
boundaries <- merged |>
  dplyr::group_by(FIRST_DNAM) |>
  dplyr::summarise(geometry = sf::st_union(geometry))

# Convert to factor
merged$Seasonality <- factor(
  merged$Seasonality, 
  levels = c("Seasonal", "Not Seasonal", "No data")
)

# Count categories
cat_counts <- table(merged$Seasonality)
cat_counts <- cat_counts[cat_counts > 0]

# Create legend labels with counts
legend_labels <- paste0(names(cat_counts), " (", cat_counts, ")")
names(legend_labels) <- names(cat_counts)

# Filter colors
colors_filtered <- colors[names(colors) %in% names(cat_counts)]

# Create final map
p <- ggplot() +
  geom_sf(data = merged, aes(fill = Seasonality), 
          color = "gray", linewidth = 0.3) +
  geom_sf(data = boundaries, fill = NA, 
          color = "black", linewidth = 0.8) +
  scale_fill_manual(
    values = colors_filtered,
    labels = legend_labels,
    name = "Classification"
  ) +
  labs(title = "Malaria Seasonality Classification") +
  theme_void() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    legend.position = "right",
    legend.background = element_rect(fill = "white", color = "black")
  )

print(p)

Output

To adapt the code:

Lines 6-9: Modify colors for seasonal and non-seasonal categories to match your organization’s style
Lines 13-15: Change grouping variable (FIRST_DNAM) if you want boundaries at a different admin level
Line 44: Update map title as needed

The map above shows how chiefdoms in Sierra Leone were classified based on rainfall consistency. Areas in blue are Seasonal, meaning they consistently had at least 60 percent of annual rainfall within a four month period across all analyzed years. These 179 chiefdoms have stable and predictable rainfall patterns that make them suitable for time bound malaria interventions such as Seasonal Malaria Chemoprevention. Areas in pink are Not Seasonal, representing 29 chiefdoms with irregular or inconsistent rainfall patterns. Most parts of Sierra Leone are classified as Seasonal, while a few areas in the east and south show variability that may require further review or different intervention timing.

Summary

This section explains how to identify and classify malaria seasonality using rainfall data based on the WHO four-month rule. It applies a rolling window approach to detect periods when at least 60 percent of annual rainfall occurs within any four consecutive months and determines whether this pattern is consistent across multiple years. The analysis produces yearly and location-level classifications of “Seasonal” and “Not Seasonal,” supported by clear maps and summaries. These outputs help programs align intervention timing, such as SMC and IRS campaigns, with predictable rainfall and transmission cycles.

Full Code

Find the full code for seasonality stratification below.

R
Python

# ============================
# Step 1: Import packages and data
# ============================
if (!requireNamespace("pacman", quietly = TRUE)) install.packages("pacman")

pacman::p_load(
  readxl, dplyr, openxlsx, lubridate, ggplot2, readr, stringr, 
  here, tidyr, gridExtra, knitr, writexl, sf
)

# File path
file_path <- here::here("english/data_r/modeled", "chirps_data_2015_2023_lastest.xls")

# Load your data
data <- readxl::read_excel(file_path)


# ============================
# Step 2: Configure analysis parameters
# ============================
year_column <- "Year"
month_column <- "Month"
value_column <- "mean_rain"
admin_columns <- c("FIRST_DNAM", "FIRST_CHIE")

analysis_start_year <- 2015
analysis_start_month <- 1
seasonality_threshold <- 60

#detailed_output <- "detailed_seasonality_results.xlsx"
#yearly_output <- "yearly_analysis_summary.xlsx"
#location_output <- "location_seasonality_summary.xlsx"


# ============================
# Step 3: Prepare data for analysis
# ============================
required_cols <- c(year_column, month_column, value_column, admin_columns)
missing_cols <- required_cols[!required_cols %in% colnames(data)]
if (length(missing_cols) > 0) {
  stop(paste("Missing columns:", paste(missing_cols, collapse = ", ")))
}

filtered_data <- data |>
  dplyr::filter(
    !is.na(!!sym(year_column)) & 
    !is.na(!!sym(month_column)) & 
    !!sym(year_column) >= analysis_start_year
  ) |>
  dplyr::mutate(Month = as.numeric(!!sym(month_column)))

if (length(admin_columns) == 1) {
  filtered_data$admin_group <- filtered_data[[admin_columns[1]]]
} else {
  filtered_data <- filtered_data |>
    dplyr::mutate(
      admin_group = paste(!!!syms(admin_columns), sep = " | ")
    )
}

available_years <- sort(unique(filtered_data[[year_column]]))
data_span_years <- length(available_years)

if (data_span_years < 6) {
  stop(paste(
    "Insufficient data: Analysis requires at least 6 years of data, but only", 
    data_span_years, "years found."
  ))
}

num_complete_years <- data_span_years - 1
total_num_blocks <- num_complete_years * 12


# ============================
# Step 4: Generate rolling time blocks
# ============================
month_names <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                 "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

blocks <- data.frame()
current_year <- analysis_start_year
current_month <- analysis_start_month

for (i in 1:total_num_blocks) {
  # 4-month period
  start_4m_year <- current_year
  start_4m_month <- current_month
  end_4m_year <- current_year
  end_4m_month <- current_month + 3
  
  if (end_4m_month > 12) {
    end_4m_year <- end_4m_year + 1
    end_4m_month <- end_4m_month - 12
  }

  # 12-month period  
  start_12m_year <- current_year
  start_12m_month <- current_month
  end_12m_year <- current_year
  end_12m_month <- current_month + 11
  
  if (end_12m_month > 12) {
    end_12m_year <- end_12m_year + 1
    end_12m_month <- end_12m_month - 12
  }

  # Create date range label
  date_range <- paste0(
    month_names[start_4m_month], " ", start_4m_year, "-",
    month_names[end_4m_month], " ", end_4m_year
  )
  
  # Store block information
  blocks <- rbind(blocks, data.frame(
    block_number = i,
    start_4m_year = start_4m_year,
    start_4m_month = start_4m_month,
    end_4m_year = end_4m_year,
    end_4m_month = end_4m_month,
    start_12m_year = start_12m_year,
    start_12m_month = start_12m_month,
    end_12m_year = end_12m_year,
    end_12m_month = end_12m_month,
    date_range = date_range
  ))

  # Move to next month
  current_month <- current_month + 1
  if (current_month > 12) {
    current_month <- 1
    current_year <- current_year + 1
  }
}


# ============================
# Step 5: Calculate seasonality for each block
# ============================
detailed_results <- data.frame()
admin_groups <- unique(filtered_data$admin_group)

for (admin_unit in admin_groups) {
  unit_data <- filtered_data |> dplyr::filter(admin_group == admin_unit)
  
  for (i in 1:nrow(blocks)) {
    block <- blocks[i, ]
    unit_data_ym <- unit_data[[year_column]] * 12 + unit_data$Month
    
    # 4-month window
    start_4m_ym <- block$start_4m_year * 12 + block$start_4m_month
    end_4m_ym <- block$end_4m_year * 12 + block$end_4m_month
    data_4m <- unit_data |> dplyr::filter(unit_data_ym >= start_4m_ym & unit_data_ym <= end_4m_ym)
    total_4m <- sum(data_4m[[value_column]], na.rm = TRUE)

    # 12-month window
    start_12m_ym <- block$start_12m_year * 12 + block$start_12m_month
    end_12m_ym <- block$end_12m_year * 12 + block$end_12m_month
    data_12m <- unit_data |> dplyr::filter(unit_data_ym >= start_12m_ym & unit_data_ym <= end_12m_ym)
    total_12m <- sum(data_12m[[value_column]], na.rm = TRUE)

    # Percentage & flag
    percent_seasonality <- ifelse(total_12m > 0, (total_4m / total_12m) * 100, 0)
    is_seasonal <- as.numeric(percent_seasonality >= seasonality_threshold)

    # Build result row
    result_row <- data.frame(
      Block = i, DateRange = block$date_range,
      Total_4M = total_4m, Total_12M = total_12m,
      Percent_Seasonality = round(percent_seasonality, 2),
      Seasonal = is_seasonal,
      stringsAsFactors = FALSE
    )

    if (length(admin_columns) > 1) {
      admin_parts <- strsplit(admin_unit, " \\| ")[[1]]
      for (j in seq_along(admin_columns)) {
        result_row[[admin_columns[j]]] <- ifelse(j <= length(admin_parts), admin_parts[j], NA)
      }
    } else {
      result_row[[admin_columns[1]]] <- admin_unit
    }

    detailed_results <- rbind(detailed_results, result_row)
  }
}

# Preview results
knitr::kable(head(detailed_results), caption = "Preview of Detailed Block Results")


# ============================
# Step 6: Generate yearly summary
# ============================
detailed_results$StartYear <- sapply(detailed_results$DateRange, function(x) {
  parts <- strsplit(x, "-")[[1]]
  first_part <- trimws(parts[1])
  as.numeric(substr(first_part, nchar(first_part) - 3, nchar(first_part)))
})

yearly_summary <- detailed_results |>
  dplyr::group_by(dplyr::across(dplyr::all_of(admin_columns)), StartYear) |>
  dplyr::summarise(
    Year = dplyr::first(StartYear),
    SeasonalCount = sum(Seasonal, na.rm = TRUE),
    total_blocks_in_year = 12,
    at_least_one_seasonal_block = as.numeric(SeasonalCount > 0),
    .groups = 'drop'
  ) |>
  dplyr::mutate(
    year_period = paste0(
      "(Jan ", Year, "-Apr ", Year, ", Dec ", Year, "-Mar ", Year + 1, ")"
    )
  ) |>
  dplyr::select(
    Year, dplyr::all_of(admin_columns),
    year_period, total_blocks_in_year, at_least_one_seasonal_block
  ) |>
  dplyr::arrange(Year, dplyr::across(dplyr::all_of(admin_columns)))

# Preview summary
knitr::kable(tail(yearly_summary, 10), caption = "Yearly Seasonality Summary (Last 10 rows)")


# ============================
# Step 7: Location-level seasonality classification
# ============================
location_summary <- yearly_summary |>
  dplyr::group_by(dplyr::across(dplyr::all_of(admin_columns))) |>
  dplyr::summarise(
    SeasonalYears = sum(at_least_one_seasonal_block, na.rm = TRUE),
    TotalYears = dplyr::n(),
    .groups = 'drop'
  ) |>
  dplyr::mutate(
    Seasonality = ifelse(SeasonalYears == TotalYears, "Seasonal", "Not Seasonal")
  ) |>
  dplyr::arrange(dplyr::across(dplyr::all_of(admin_columns)))

# Preview classification
knitr::kable(head(location_summary, 10), caption = "Location Seasonality Classification")


# ============================
# Step 8: Merge data for visualization
# ============================
file_path <- here::here("english/data_r/shapefiles", "Chiefdom2021.shp")
file_path_2 <- here::here("english/library/stratification/other", "final_SMC_eligibility.xlsx")

spatial_data <- sf::st_read(file_path)
category_data <- readxl::read_excel(file_path_2)

merged <- spatial_data |>
  dplyr::left_join(category_data, by = c("FIRST_DNAM", "FIRST_CHIE"))


# ============================
# Step 9: Map for years of seasonal peak
# ============================
merged$category <- as.character(merged$SeasonalYears)
merged$category[is.na(merged$category)] <- "0"

category_colors <- c(
  "0" = "gray80",
  "1" = "#4E79A7", "2" = "#F28E2B", "3" = "#E15759",
  "4" = "#76B7B2", "5" = "#59A14F", "6" = "#EDC948",
  "7" = "#4E79A7", "8" = "#AF7AA1"
)

merged$category <- factor(merged$category, levels = as.character(0:8))

cat_counts <- table(merged$category)
cat_counts <- cat_counts[cat_counts > 0]

legend_labels <- paste0(names(cat_counts), " (", cat_counts, ")")
names(legend_labels) <- names(cat_counts)

category_colors_filtered <- category_colors[names(category_colors) %in% names(cat_counts)]

p <- ggplot(merged) +
  geom_sf(aes(fill = category), color = "white", size = 0.2) +
  scale_fill_manual(values = category_colors_filtered, labels = legend_labels, name = "Category") +
  theme_minimal() + theme_void() +
  theme(
    legend.position = "right",
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    panel.grid = element_blank()
  ) +
  labs(title = "Map by Category (0-8)")

print(p)

### Seasonality plot

# Replace NA with "No data" if needed
merged$Seasonality[base::is.na(merged$Seasonality)] <- "No data"

# Assign colors
colors <- c(
  "Seasonal" = "#47B5FF",      # Blue
  "Not Seasonal" = "#FFB3BA",  # Light pink
  "No data" = "white"
)

# Boundaries for FIRST_DNAM 
boundaries <- merged |>
  dplyr::group_by(FIRST_DNAM) |>
  dplyr::summarise(geometry = sf::st_union(geometry))

# Convert to factor
merged$Seasonality <- base::factor(
  merged$Seasonality, 
  levels = c("Seasonal", "Not Seasonal", "No data")
)

# Count categories (excluding zero counts) 
cat_counts <- base::table(merged$Seasonality)
cat_counts <- cat_counts[cat_counts > 0]

# Create labels with counts
legend_labels <- base::paste0(base::names(cat_counts), " (", cat_counts, ")")
base::names(legend_labels) <- base::names(cat_counts)

# Filter colors to only include categories that exist
colors_filtered <- colors[base::names(colors) %in% base::names(cat_counts)]

# Plot
p <- ggplot2::ggplot() +
  ggplot2::geom_sf(data = merged, ggplot2::aes(fill = Seasonality), 
                   color = "gray", linewidth = 0.3) +
  ggplot2::geom_sf(data = boundaries, fill = NA, 
                   color = "black", linewidth = 0.8) +
  ggplot2::scale_fill_manual(
    values = colors_filtered,
    labels = legend_labels,
    name = "Seasonality"
  ) +
  ggplot2::labs(title = "Malaria Seasonality Classification") +
  ggplot2::theme_void() +
  ggplot2::theme(
    plot.title = ggplot2::element_text(hjust = 0.5, size = 14, face = "bold"),
    legend.position = "right",
    legend.background = ggplot2::element_rect(fill = "white", color = "black"),
    legend.key = ggplot2::element_rect(color = "black")
  )

# Display plot
base::print(p)