Dev Site — You are viewing the development build. Go to Main Site

  • English
  • Français
  1. 2 Assemblage et gestion des données
  2. 2.3 Données de cas de routine (DHIS2)
  3. Final database
  • Bibliothèque de code pour l'adaptation infranationale
    Version française
  • 1 Pour commencer
    • 1.1 À propos et comment nous contacter
    • 1.2 Pour tous
    • 1.3 Pour l’équipe SNT
    • 1.4 Pour les analystes
    • 1.5 Produire des résultats de haute qualité
  • 2 Assemblage et gestion des données
    • 2.1 Utilisation des shapefiles
      • Aperçu des données spatiales
      • Examiner les données du fichier shapefile
      • Shapefile management and customization
      • Merge shapefile with excel
    • 2.2 Formations sanitaires
      • Health facility active/inactive status
      • Health facility coordinates
      • Master facility lists
    • 2.3 Données de cas de routine (DHIS2)
      • Health facility reporting rate
      • Outlier detection methods
      • Imputation of missing data
      • Final database
      • Data extraction from DHIS2
      • Import dataset
      • Outlier correction
      • Quality control/checks
    • 2.4 Données du stock
      • lmis
    • 2.5 Données démographiques
      • Données démographiques nationales
      • Raster de population WorldPop
    • 2.6 Enquêtes nationales auprès des ménages
      • DHS Data Overview and Preparation
      • All-Cause Child Mortality
      • Extraction of ITN ownership, access, and usage
      • Extracion of prevalence data
      • Calculation of treatment-seeking data
    • 2.7 Données entomologiques
    • 2.8 Données climatiques et environnementales
      • Extraction de données climatiques et environnementales à partir de données raster
    • 2.9 Données modélisées
      • Generating spatial modeled estimates
      • Travailler avec les estimations modélisées géospatiales
      • Modeled Estimates of Entomological Indicators
      • Mortality estimates from IHME
  • 3 Stratification
    • 3.1 Stratification épidémiologique
    • 3.2 Stratification des déterminants de la transmission du paludisme
  • 4 Revue des interventions passées
    • 4.1 Prise en charge des cas
    • 4.2 Interventions de routine
    • 4.3 Interventions de campagne
    • 4.4 Autres interventions
  • 5 Ciblage des interventions
  • 6 Analyse rétrospective
  • 7 Microstratification urbaine

On this page

  • Full script
  1. 2 Assemblage et gestion des données
  2. 2.3 Données de cas de routine (DHIS2)
  3. Final database

Final database

Lorem ipsum dolor sit amet, tempus dolor pharetra ac non sed dis eget non egestas. Luctus, amet sed egestas lectus nec lacus. In pulvinar leo. Sit, hendrerit duis turpis ut velit habitasse in. Metus, hac vulputate nascetur lacus non id amet enim ullamcorper lacinia efficitur tortor. Ligula metus nulla nec conubia ut tincidunt, eget curae sit nam sed. Mauris diam diam nibh suspendisse tempus, venenatis. Pellentesque nisl aliquam sed ac, non. Donec penatibus, varius tellus praesent aenean phasellus, imperdiet pretium, tincidunt fringilla posuere. Vitae tincidunt curae id turpis lacinia imperdiet. Netus rhoncus. Ligula vulputate dictum sed ut maximus. Mi vulputate velit. Mi tortor arcu est mi dis elementum et.

Full script

  • R
  • Python
# Install and load required packages
if (!require("readxl")) install.packages("readxl")
if (!require("dplyr")) install.packages("dplyr")
if (!require("writexl")) install.packages("writexl")

library(readxl)
library(dplyr)
library(writexl)

merge_all_results <- function() {
  # Define the merge keys
  merge_keys <- c("adm1", "adm2", "adm3", "hf", "hf_uid", "year", "month")

  # Create a list of file URLs
  file_urls <- c(
    "https://raw.githubusercontent.com/ahadi-analytics/snt-data-files/58a0d28bec84fa4c00b69fd3edc680b5318ce372/Excel%20files/allout_results.xlsx",
    "https://raw.githubusercontent.com/ahadi-analytics/snt-data-files/58a0d28bec84fa4c00b69fd3edc680b5318ce372/Excel%20files/susp_results.xlsx",
    "https://raw.githubusercontent.com/ahadi-analytics/snt-data-files/58a0d28bec84fa4c00b69fd3edc680b5318ce372/Excel%20files/test_results.xlsx",
    "https://raw.githubusercontent.com/ahadi-analytics/snt-data-files/58a0d28bec84fa4c00b69fd3edc680b5318ce372/Excel%20files/conf_results.xlsx",
    "https://raw.githubusercontent.com/ahadi-analytics/snt-data-files/58a0d28bec84fa4c00b69fd3edc680b5318ce372/Excel%20files/maltreat_results.xlsx",
    "https://raw.githubusercontent.com/ahadi-analytics/snt-data-files/58a0d28bec84fa4c00b69fd3edc680b5318ce372/Excel%20files/pres_results.xlsx",
    "https://raw.githubusercontent.com/ahadi-analytics/snt-data-files/58a0d28bec84fa4c00b69fd3edc680b5318ce372/Excel%20files/maladm_results.xlsx",
    "https://raw.githubusercontent.com/ahadi-analytics/snt-data-files/58a0d28bec84fa4c00b69fd3edc680b5318ce372/Excel%20files/maldth_results.xlsx"
  )

  # Initialize list to store dataframes
  df_list <- list()

  # Load each Excel file into separate DataFrames
  for (i in seq_along(file_urls)) {
    tryCatch({
      temp <- tempfile(fileext = ".xlsx")
      download.file(url = file_urls[i], destfile = temp, mode = "wb")
      df_list[[i]] <- read_excel(temp)
      unlink(temp)
      cat(sprintf("Loaded file %d successfully.\n", i))
    }, error = function(e) {
      cat(sprintf("Error loading file %d: %s\n", i, e$message))
    })
  }

  # Check if any dataframes were loaded
  if (length(df_list) == 0) {
    stop("No files were loaded successfully")
  }

  # Start with the first dataframe
  merged_df <- df_list[[1]]

  # Merge remaining dataframes
  if (length(df_list) > 1) {
    for (i in 2:length(df_list)) {
      merged_df <- full_join(merged_df, df_list[[i]], by = merge_keys)
      cat(sprintf("Merged file %d into the dataset.\n", i))
    }
  }

  # Sort by the merge keys
  merged_df <- merged_df |> arrange(across(all_of(merge_keys)))

  # Save the merged DataFrame to an Excel file
  write_xlsx(merged_df, "clean_routine_data.xlsx")
  cat("Merged data saved to 'clean_routine_data.xlsx'\n")

  return(merged_df)
}

# Perform the merge
merged_df <- merge_all_results()
df <- merged_df

head(df, 13)



# Function to save selected columns to a new Excel file
save_selected_columns_to_excel <- function(input_file, selected_columns, output_file) {
  tryCatch({
    # Read the complete dataset
    df <- read_excel(input_file)

    # Select only the specified columns that exist in the dataset
    existing_columns <- intersect(selected_columns, colnames(df))
    if (length(existing_columns) == 0) {
      stop("None of the selected columns exist in the dataset.")
    }

    # Create a DataFrame with only the selected columns
    selected_df <- df |> select(all_of(existing_columns))

    # Save to Excel
    write_xlsx(selected_df, output_file)
    cat(sprintf("Selected columns saved to '%s' successfully!\n", output_file))
    cat("Columns saved: ", paste(existing_columns, collapse = ", "), "\n")
  }, error = function(e) {
    cat("Error:", e$message, "\n")
  })
}

# Function to rename columns in an Excel file and save to a new file
rename_columns_in_excel <- function(file_path, rename_mapping, output_file) {
  tryCatch({
    # Read the dataset from the existing file
    df <- read_excel(file_path)

    # Rename the columns
    colnames(df) <- dplyr::recode(colnames(df), !!!rename_mapping)

    # Save the renamed DataFrame to a new Excel file
    write_xlsx(df, output_file)
    cat(sprintf("Renamed columns saved to '%s' successfully!\n", output_file))
    cat("Columns renamed: ", paste(names(rename_mapping), "->", rename_mapping, collapse = ", "), "\n")
  }, error = function(e) {
    cat("Error:", e$message, "\n")
  })
}

# Specify input file, selected columns, and intermediate file
input_file <- "clean_routine_data.xlsx"
selected_columns <- c(
  "adm1", "adm2", "adm3", "hf", "hfuid", "year", "month",
  "allout_corrected_winsorised", "susp_corrected_winsorised", "test_corrected_winsorised",
  "conf_corrected_winsorised", "maltreat_corrected_winsorised", "pres_corrected_winsorised",
  "maladm_corrected_winsorised", "maldth_corrected_winsorised"
)
intermediate_file <- "intermediate_selected_data.xlsx"

# Save selected columns to an intermediate file
save_selected_columns_to_excel(input_file, selected_columns, intermediate_file)

# Specify rename mapping and final output file
rename_mapping <- c(
  "allout_corrected_winsorised" = "allout",
  "susp_corrected_winsorised" = "susp",
  "test_corrected_winsorised" = "test",
  "conf_corrected_winsorised" = "conf",
  "maltreat_corrected_winsorised" = "maltreat",
  "pres_corrected_winsorised" = "pres",
  "maladm_corrected_winsorised" = "maladm",
  "maldth_corrected_winsorised" = "maldth"
)

final_output_file <- "clean_malaria_routine_data_final.xlsx"

# Rename columns and save to the final file
rename_columns_in_excel(intermediate_file, rename_mapping, final_output_file)

Clean database

 

©2025 Applied Health Analytics for Delivery and Innovation. All rights reserved