Routine data extraction

Overview

One of the key pillars of any SNT exercise is the review of the routine malaria data collected by the country’s national surveillance system. This section lists the data elements most commonly collected by countries’ Health Management Information Systems (HMIS) and considerations for their preparation and use.

In many countries, the HMIS uses the District Health Information System version 2 (DHIS2) platform to host the national database for routine health data. Furthermore, an increasing number of national malaria programs (NMPs) are establishing national malaria data repositories (NMDRs) linked to the HMIS so that routine data are easily accessible in a structured and organized fashion for data analysis.

The platform from which routine data should be extracted for SNT analysis (DHIS2, NMDR, or another platform) will be determined by the SNT team during the data collection and management step. We should identify a person with access to and knowledge of the platform as the focal person for downloading data.

During the SNT process, it will be key to establish clear data access mechanisms and ensure that the NMP and the Ministry of Health have full control of the individuals and partners accessing the data, as well as full access to all the information used.

Given that the data extraction process requires important governance discussions outside of the scope of this library, this page does not include instructions on how to perform the extraction. Instead, we list the key data elements that in our experience have been important or helpful for SNT analyses, such that they can be identified for extraction. Additional data elements may also be relevant for the SNT, as every country’s context is different. Disaggregated information is valuable, although countries disaggregate their data differently, such as by age or sex.

In the code library, we use the column names from the Sierra Leone DHIS2 as a worked example of how data elements are named and disaggregated in one country.

For the broader methodological framing, including why each indicator category matters, how it feeds the SNT decision process, and how data assembly sits within the wider 10-step SNT workflow, see WHO’s Manual for subnational tailoring of malaria interventions. In particular, Annex 2 (Proposed data checklist) of that manual lists the data elements every SNT exercise is expected to assemble; this page operationalizes that checklist by translating each item into the column-level detail an extraction request needs. AHADI’s SNT Roadmap Template provides a complementary planning, timeline, and data-checklist tool for the full SNT process.

Important: Consult with SNT team

Each country’s data is different!

Data element names, definitions, and disaggregations differ across countries. Although the conventions from the Sierra Leone DHIS2 are provided below, these are only examples, and the country will very likely use a somewhat different system.

Ask the SNT team for a data dictionary specific to the country whose data we are working with. This will help us better understand and analyze routine surveillance data. Expect to work closely with the NMP data manager to understand data element names, ensure that all necessary data fields are extracted, and confirm that subsequent calculations are performed correctly.

Once the data manager delivers the extraction, work through the Quality checks on the country’s submission section at the end of this page before importing the data into the analysis workflow.

Note: Objectives
  • Understand which data elements from routine surveillance could be relevant for SNT
  • Gain general awareness of how elements may be disaggregated in routine reporting

Basic Data Elements

These core metadata columns provide the foundation for organizing and analyzing routine malaria surveillance data. While some elements may auto-populate from standard platform extractions, others often require explicit selection. Always verify which fields are included in extractions with the SNT team.

  1. Health Facility ID: unique identification number for each health facility

    If the health facility ID column was not extracted and temporary health facility IDs are needed for the analysis, the data preprocessing page provides an example of how to do that.

  2. Health Facility Name: name of the health facility reporting the data

    In Sierra Leone, this is found under the hf column of DHIS2.

  3. Health Facility Type: classification of the facility (e.g., Hospital, Health Center, Clinic)

    Some countries may include this information in DHIS2; otherwise, it is often included in the master facility list (MFL).

  4. Administrative Level 0 to X: hierarchy of geographic units in the country ranging from largest (adm0, or national level) to smallest, to which each health facility is assigned

    In Sierra Leone, administrative units from 0 to 4 are found under the columns adm0, adm1, adm2, adm3, and adm4.

  5. Reporting Period: time frame for which data is reported

    In Sierra Leone, the reporting period is found under the periodname column which includes both month and year, formatted as “January 2023” for example.
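
As a minimal sketch of how a reporting-period label of the form “January 2023” might be split into a year and a month, the snippet below uses Python's standard library. The function name `parse_periodname` is hypothetical, and the sketch assumes English month names; other countries' extractions may use a different language or format, and the DHIS2 data preprocessing page remains the reference for the library's own approach.

```python
from datetime import datetime


def parse_periodname(periodname: str) -> tuple[int, int]:
    """Convert a label such as 'January 2023' into a (year, month) pair."""
    # %B matches the full month name; this assumes English labels and the
    # default (C) locale, which is an assumption of this sketch.
    parsed = datetime.strptime(periodname.strip(), "%B %Y")
    return parsed.year, parsed.month


print(parse_periodname("January 2023"))  # (2023, 1)
```

Storing the period as numeric year and month makes it straightforward to sort records and to check for gaps in the time series.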

Tip: Where this data is used

The basic metadata columns are the keys that link the routine data to the spatial and master-list workflows elsewhere in the library:

  • Facility names and IDs feed Fuzzy matching of names across datasets, which links the routine extraction to the master facility list.
  • Facility coordinates (where DHIS2 also stores the org-unit longitude and latitude) are validated, cleaned, and mapped on Health facility coordinates and point data.
  • Admin levels (adm0–adm3 or higher) are harmonized with shapefiles on Shapefile management and customization and visualized on Basic shapefile use and visualization.
  • Reporting period is parsed, standardized, and aligned to a regular calendar on DHIS2 data preprocessing.

Epidemiological Data Elements

These columns contain epidemiological elements that report on disease burden, diagnosis and treatment, and health system utilization. Data elements on confirmed malaria cases form the basis of incidence calculations and stratification of malaria incidence. Data originate from standardized paper registers that health workers complete during service delivery. The main registers relevant for malaria include: the outpatient department (OPD) register, which records all outpatient visits; the laboratory register, which records malaria diagnostic testing; the inpatient (IPD) register, which records malaria hospitalizations; the dispensing register, which records commodities dispensed; and the ANC register, which records antenatal visits. The sections below outline the main data elements within each register relevant for malaria.

OPD register:

  1. All-Cause Outpatient Visits: total number of all-cause outpatient visits reported

    Show Sierra Leone example

    In Sierra Leone, all-cause outpatient visits are disaggregated by age. Two data elements need to be requested.

    • OPD (New and follow-up curative) 0-59m_X
    • OPD (New and follow-up curative) 5+y_X
  2. Suspected Malaria Cases: reported count of cases with fever symptoms

    Show Sierra Leone example

    In Sierra Leone, suspected cases are disaggregated by age and reporting method (health facility or community health worker).

    • Fever case - suspected Malaria 0-59m_X
    • Fever case - suspected Malaria 5-14y_X
    • Fever case - suspected Malaria 15+y_X
    • Fever case in community (Suspected Malaria) 0-59m_X
    • Fever case in community (Suspected Malaria) 5-14y_X
    • Fever case in community (Suspected Malaria) 15+y_X
  3. Presumed Malaria Cases: the reported number of clinically diagnosed malaria cases (based on symptoms like fever) without confirmatory testing

    Show Sierra Leone example

    In Sierra Leone, presumed cases are not reported, but they may be reported in the country under analysis. If they are not reported, presumed cases can also be calculated; a few common options for the calculation are shown on the data preprocessing page.

  4. Confirmed Malaria Cases: reported number of malaria cases diagnosed through confirmatory testing

  5. Treated Malaria Cases: count of malaria cases reported as treated

    Show Sierra Leone example

    In Sierra Leone, treated cases are disaggregated by treatment window, age, and treatment administrator (community health worker or health facility). Malaria in pregnant women is also reported separately. The SNT team should be consulted regarding whether treated cases in pregnant women should be added to the number treated for adults, or if they are already included in those counts.

    • Malaria treated in community with ACT <24 hours 0-59m_X
    • Malaria treated in community with ACT >24 hours 0-59m_X
    • Malaria treated in community with ACT <24 hours 5-14y_X
    • Malaria treated in community with ACT >24 hours 5-14y_X
    • Malaria treated in community with ACT <24 hours 15+y_X
    • Malaria treated in community with ACT >24 hours 15+y_X
    • Malaria treated with ACT <24 hours 0-59m_X
    • Malaria treated with ACT >24 hours 0-59m_X
    • Malaria treated with ACT <24 hours 5-14y_X
    • Malaria treated with ACT >24 hours 5-14y_X
    • Malaria treated with ACT <24 hours 15+y_X
    • Malaria treated with ACT >24 hours 15+y_X
    • Malaria in 1st trimester treated
    • Malaria in 2nd or 3rd trimester treated

Lab register:

  1. Malaria Tests Conducted: reported number of malaria tests conducted

    Show Sierra Leone example

    In Sierra Leone, tested cases are not directly reported, but they may be reported in the country under analysis. Sierra Leone reports positive and negative test results separately, which can be summed to calculate total tested cases as shown on the data preprocessing page.

  2. Positive Malaria Tests: reported number of positive malaria tests (confirmed malaria cases)

    Show Sierra Leone example

    In Sierra Leone, positive test results are disaggregated by age, test type, and test administrator (community health worker or health facility).

    • Fever case in community tested for Malaria (RDT) - Positive 0-59m_X
    • Fever case in community tested for Malaria (RDT) - Positive 5-14y_X
    • Fever case in community tested for Malaria (RDT) - Positive 15+y_X
    • Fever case tested for Malaria (Microscopy) - Positive 0-59m_X
    • Fever case tested for Malaria (Microscopy) - Positive 5-14y_X
    • Fever case tested for Malaria (Microscopy) - Positive 15+y_X
    • Fever case tested for Malaria (RDT) - Positive 0-59m_X
    • Fever case tested for Malaria (RDT) - Positive 5-14y_X
    • Fever case tested for Malaria (RDT) - Positive 15+y_X
  3. Negative Malaria Tests: reported number of negative malaria tests

    Show Sierra Leone example

    In Sierra Leone, negative test results are disaggregated by age, test type, and test administrator (community health worker or health facility).

    • Fever case in community tested for Malaria (RDT) - Negative 0-59m_X
    • Fever case in community tested for Malaria (RDT) - Negative 5-14y_X
    • Fever case in community tested for Malaria (RDT) - Negative 15+y_X
    • Fever case tested for Malaria (Microscopy) - Negative 0-59m_X
    • Fever case tested for Malaria (Microscopy) - Negative 5-14y_X
    • Fever case tested for Malaria (Microscopy) - Negative 15+y_X
    • Fever case tested for Malaria (RDT) - Negative 0-59m_X
    • Fever case tested for Malaria (RDT) - Negative 5-14y_X
    • Fever case tested for Malaria (RDT) - Negative 15+y_X
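
Where only positive and negative results are reported, the total number of tests can be derived by summing them. The sketch below illustrates the idea on a single facility-month record held as a dictionary. The function name `total_tested`, the example values, and the choice to skip missing values are assumptions made for illustration (column names are shown without the `_X` suffix); the data preprocessing page describes the approach the library actually uses.

```python
def total_tested(row: dict) -> float:
    """Sum every positive and negative malaria test column in one record."""
    total = 0.0
    for column, value in row.items():
        is_test_result = "tested for Malaria" in column
        has_outcome = "- Positive" in column or "- Negative" in column
        # Missing values are skipped here; whether that is appropriate
        # should be agreed with the SNT team.
        if is_test_result and has_outcome and value is not None:
            total += value
    return total


# Hypothetical facility-month record
example_row = {
    "Fever case tested for Malaria (RDT) - Positive 0-59m": 12,
    "Fever case tested for Malaria (RDT) - Negative 0-59m": 30,
    "Fever case tested for Malaria (Microscopy) - Positive 0-59m": 3,
    "Fever case tested for Malaria (Microscopy) - Negative 0-59m": 5,
    "Fever case - suspected Malaria 0-59m": 55,  # not a test result, ignored
}
print(total_tested(example_row))  # 50.0
```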

IPD register:

  1. Severe Malaria Cases: reported number of confirmed malaria cases meeting severe malaria criteria

    Show Sierra Leone example

    In Sierra Leone, there is no specific reporting of severe malaria cases.

  2. All-Cause Hospital Admissions: reported total all-cause inpatient admissions

    Show Sierra Leone example

    In Sierra Leone, all-cause hospital admissions are disaggregated by age, and are either total general admissions (for children under 5) or the sum of admissions from specific conditions (other age groups):

    • All-Cause Admission U5yr: sum of
      • Admission - Child 1–59 months
      • Admission - Stabilisation Centre
    • All-Cause Admission 05–14yr: sum of
      • Admission - Child with malaria 5–14 years
      • Admission - Child with diarrhoea
      • Admission - Child with pneumonia
    • All-Cause Admission 15yr+: sum of
      • Admission - Malaria 15+ years
      • Admission - Maternity
      • Admission - Medical
      • Admission - Psychiatric
      • Admission - Surgical
      • Admission - TB
  3. Malaria Hospital Admissions: reported count of patients hospitalized with malaria

    Show Sierra Leone example

    In Sierra Leone, malaria admissions are disaggregated by age.

    • Admission - Child with malaria 0-59 months_X
    • Admission - Child with malaria 5-14 years_X
    • Admission - Malaria 15+ years_X

Death Register:

  1. All-Cause Deaths: reported total of all-cause patient deaths

    Show Sierra Leone example

    In Sierra Leone, all-cause deaths are disaggregated by age, and are the sum of deaths from specific conditions:

    • All-Cause Deaths - U5yr: sum of
      • Child death - Cause unspecified 01-59m
      • Child death - Diarrhoea 01-59m
      • Child death - HIV 01-59m
      • Child death - Malaria 01-59m
      • Child death - Malnutrition 01-59m
      • Child death - Other specified causes 01-59m
      • Child death - Pneumonia 01-59m
      • Child death - Trauma 01-59m
      • Separation - Child 1-59 months Death
    • All-Cause Deaths - 05 - 14yr: sum of
      • Child death - Cause unspecified 05-09y
      • Child death - Cause unspecified 10-14y
      • Child death - Diarrhoea 05-09y
      • Child death - Diarrhoea 10-14y
      • Child death - HIV 05-09y
      • Child death - HIV 10-14y
      • Child death - Malaria 05-09y
      • Child death - Malaria 10-14y
      • Child death - Malnutrition 05-09y
      • Child death - Malnutrition 10-14y
      • Child death - Other specified causes 05-09y
      • Child death - Other specified causes 10-14y
      • Child death - Pneumonia 05-09y
      • Child death - Pneumonia 10-14y
      • Child death - Trauma 05-09y
      • Child death - Trauma 10-14y
      • Separation - Child with malaria 5-14 years Death
      • Separation - Child with diarrhoea Death
      • Separation - Child with pneumonia Death
    • All-Cause Deaths - 15yr+: sum of
      • Death malaria 15+ years Female
      • Death malaria 15+ years Male
      • Death other 15+ yrs Male
      • Death other 15+ yrs Female
      • Separation - Medical Death
      • Separation - Surgical Death
  2. Malaria Deaths: reported deaths attributed to malaria

    Show Sierra Leone example

    In Sierra Leone, malaria deaths are disaggregated by age, sex, and inpatient records (separation). Whether only hospital deaths or also community deaths should be included in SNT analysis should be discussed with the SNT team.

    • Deaths in Community
    • Child death - Malaria 1-59m_X
    • Child death - Malaria 10-14y_X
    • Child death - Malaria 5-9y_X
    • Death malaria 15+ years Female
    • Death malaria 15+ years Male
    • Separation - Child with malaria 0-59 months_X Death
    • Separation - Child with malaria 5-14 years_X Death
    • Separation - Malaria 15+ years_X Death

Important: Consult with SNT team

Sometimes health facilities act as sentinel sites for recording deaths that occur in the community. Whether or not this is happening should be confirmed with the NMP surveillance focal point. If facilities are also collecting community death information, there will be many more facilities reporting malaria deaths than reporting inpatients, and the specificity of this variable will be low: there will be many reported malaria deaths without confirmation of malaria as the cause of death.

ANC Register:

  1. Total women attended during first visit: reported number of women attending their first ANC visit, at which routine malaria testing is conducted for all women

  2. Total women tested: Reported number of pregnant women that were tested for malaria

  3. Total women that tested positive: Reported number of pregnant women that tested positive
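
One common use of the last two counts is the test positivity rate (TPR) among women tested at ANC, that is, positives divided by tested. The sketch below is illustrative only: the function name `anc_test_positivity` and the example numbers are hypothetical, and returning `None` for facility-months with no tests is an assumption of this example.

```python
from typing import Optional


def anc_test_positivity(tested: float, positive: float) -> Optional[float]:
    """Return positives / tested, or None when no women were tested."""
    if tested <= 0:
        return None
    return positive / tested


# Hypothetical facility-month: 200 women tested, 30 positive
print(anc_test_positivity(tested=200, positive=30))  # 0.15
```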

Other relevant data elements:

  1. Anemia outpatients, admissions and deaths: reported outpatients, hospital admissions, and deaths associated with anemia. While data quality may not be high, anemia is an important burden outcome for malaria and should be tracked and analyzed when possible.

    Show Sierra Leone example

    In Sierra Leone, there is no specific reporting of anemia.

Tip: Where this data is used

The epidemiological elements flow through the full routine-data preprocessing pipeline before being consumed by the stratification and trend analyses:

  • Preprocess and quality-control the routine data: DHIS2 data preprocessing, determining active and inactive status, health facility reporting rate, missing data detection, outlier detection and outlier correction, imputation, data coherency checks, and assembly of the final analysis-ready database.
  • Stratify and analyze trends: Crude incidence, incidence stratification, and trend analysis.

Intervention Data Elements

Both malaria-specific and non-malaria interventions from routine data can be informative for SNT. Routine intervention data is also available in DHIS2. Note that campaign data such as for SMC or mass ITN campaigns is usually collected in a separate system and will need to be accessed and managed separately for SNT, unless countries already have an NMDR in place where all information can be accessible.

Routine intervention data can be extracted, managed, and analyzed by adapting the code provided in this library. Some routine intervention data that may be relevant for the SNT include:

Malaria Interventions

  1. Routine ITN Distribution: distribution of insecticide-treated nets through health facilities and outreach programs. These routine distributions could happen at antenatal care (ANC) visits, with vaccination in infancy through the Expanded Programme on Immunization (EPI), through school-based distribution, or other mechanisms. We will need to know which distribution systems are relevant to the SNT context, and which ones report into DHIS2, to ensure that all the necessary data elements are extracted to appropriately calculate population and operational coverages and other relevant indicators.

    Show Sierra Leone example

    In Sierra Leone, ITN distribution is disaggregated by setting (facility or outreach campaign) and age. For example, ITNs given with the 3rd dose of immunization with the pentavalent vaccine, and ITNs given at ANC visit, are reported in these columns:

    • LLITN given at Pentavalent 3rd dose In_Facility, 0-11m
    • LLITN given at Pentavalent 3rd dose In_Facility, 12-59m
    • LLITN given at Pentavalent 3rd dose Outreach, 0-11m
    • LLITN given at Pentavalent 3rd dose Outreach, 12-59m
    • Antenatal client given LLITN In_Facility
    • Antenatal client given LLITN Outreach
  2. Intermittent Preventive Treatment in Pregnancy (IPTp): reported number of pregnant women receiving doses of sulfadoxine-pyrimethamine for malaria prevention

    Show Sierra Leone example

    In Sierra Leone, IPTp treatments are disaggregated by visit number and setting (facility, community, or outreach). Up to three doses are recorded. Note that the number of nationally recommended doses of IPTp can vary by country.

    • Antenatal client IPTp 1st dose in community
    • Antenatal client IPTp 2nd dose in community
    • Antenatal client IPTp 3rd dose in community
    • Antenatal client IPT 1st dose In_Facility
    • Antenatal client IPT 1st dose Outreach
    • Antenatal client IPT 2nd dose In_Facility
    • Antenatal client IPT 2nd dose Outreach
    • Antenatal client IPT 3rd dose In_Facility
    • Antenatal client IPT 3rd dose Outreach
  3. Perennial Malaria Chemoprevention (PMC): preventive sulfadoxine-pyrimethamine doses given periodically to children in the first one or two years of life.

    Show Sierra Leone example

    In Sierra Leone, PMC is called IPTi, and the schedule includes 3 doses during routine vaccination visits. Routine IPTi doses are disaggregated by dose, setting, and infant age.

    • IPTi 1st dose given In_Facility, 0-11m
    • IPTi 1st dose given In_Facility, 12-59m
    • IPTi 1st dose given Outreach, 0-11m
    • IPTi 1st dose given Outreach, 12-59m
    • IPTi 2nd dose given In_Facility, 0-11m
    • IPTi 2nd dose given In_Facility, 12-59m
    • IPTi 2nd dose given Outreach, 0-11m
    • IPTi 2nd dose given Outreach, 12-59m
    • IPTi 3rd dose given In_Facility, 0-11m
    • IPTi 3rd dose given In_Facility, 12-59m
    • IPTi 3rd dose given Outreach, 0-11m
    • IPTi 3rd dose given Outreach, 12-59m
  4. Malaria Vaccine: administration of malaria vaccines (RTS,S/AS01 or R21). The schedule for both vaccines includes a 3-dose priming series, which may be followed by one or more boosters.

    Show Sierra Leone example

    In Sierra Leone, malaria vaccine doses are disaggregated by age, setting, and dose.

    • Malaria 1st dose In_Facility, 0-11m
    • Malaria 1st dose In_Facility, 12-59m
    • Malaria 1st dose Outreach, 0-11m
    • Malaria 1st dose Outreach, 12-59m
    • Malaria 2nd dose In_Facility, 0-11m
    • Malaria 2nd dose In_Facility, 12-59m
    • Malaria 2nd dose Outreach, 0-11m
    • Malaria 2nd dose Outreach, 12-59m
    • Malaria 3rd dose In_Facility, 0-11m
    • Malaria 3rd dose In_Facility, 12-59m
    • Malaria 3rd dose Outreach, 0-11m
    • Malaria 3rd dose Outreach, 12-59m
    • Malaria 4th dose In_Facility, 0-11m
    • Malaria 4th dose In_Facility, 12-59m
    • Malaria 4th dose Outreach, 0-11m
    • Malaria 4th dose Outreach, 12-59m

Data Elements to Inform Coverage of Routine Interventions

  1. ANC (Antenatal Care) Visits: routine antenatal care attendance tracking for maternal health monitoring. ANC visits are important to understand for several aspects, including:

    1. understanding operational coverage of IPTp,
    2. understanding access to care, through the lens of pregnant women, including the timing of first and follow-up visits and dropout rates through time, and
    3. if pregnant women at ANC, or a specific ANC visit, are universally tested for malaria, the test positivity rate (TPR) can be used to monitor trends in malaria transmission.

    Show Sierra Leone example

    In Sierra Leone, antenatal care is disaggregated by visit number, trimester, and setting, up to 8 visits.

    • Antenatal client 1st visit In_Facility
    • Antenatal client 1st visit Outreach
    • Antenatal client 1st visit under 12 weeks In_Facility
    • Antenatal client 1st visit under 12 weeks Outreach
    • Antenatal client 4th visit In_Facility
    • Antenatal client 4th visit Outreach
    • Antenatal client 8th visit In_Facility
    • Antenatal client 8th visit Outreach

Important: Consult with SNT team

It is important to understand certain data practices around ANC:

  • Is there any information on the average gestational age of the women who attend ANC1? This is needed to decide which ANC visit should be used to calculate the coverage of IPTp1, 2, and 3. If ANC1 is attended by women before week 12, which is not common in many parts of Africa but still possible, then these women will not be eligible for IPTp until they reach an ANC visit in the 2nd trimester.

  • Is the ANC visit number tied to the timing of the pregnancy, or does it follow each woman through time regardless of gestational age? For example, for a woman who goes to her first ANC visit in the 3rd trimester, will her visit be counted as ANC1 or as the ANC visit number corresponding to her gestational age?

  2. Routine childhood immunizations other than malaria: vaccination data (e.g., pentavalent, measles, and polio vaccines) counting the number of children vaccinated at the touchpoints where an ITN, PMC, or the malaria vaccine is, should be, or could be delivered. This information helps measure operational coverage for different interventions and also helps assess how accessible and how strong the immunization system is for the target community population.

    Show Sierra Leone example

    In Sierra Leone, other routine immunizations include pentavalent vaccines.

    • Pentavalent 1st dose In_Facility, 0-11m
    • Pentavalent 1st dose In_Facility, 12-59m
    • Pentavalent 1st dose Outreach, 0-11m
    • Pentavalent 1st dose Outreach, 12-59m
    • Pentavalent 2nd dose In_Facility, 0-11m
    • Pentavalent 2nd dose In_Facility, 12-59m
    • Pentavalent 2nd dose Outreach, 0-11m
    • Pentavalent 2nd dose Outreach, 12-59m
    • Pentavalent 3rd dose In_Facility, 0-11m
    • Pentavalent 3rd dose In_Facility, 12-59m
    • Pentavalent 3rd dose Outreach, 0-11m
    • Pentavalent 3rd dose Outreach, 12-59m
  3. Eligible populations: number of children eligible for each vaccination touchpoint, or expected number of pregnant women. These population denominators are important for understanding effective coverage of vaccinations, ANC, and IPTp.

    Show Sierra Leone example

    In Sierra Leone, eligible EPI populations are reported in DHIS2. While the expected number of pregnant women is also reported, this variable is not regularly updated, and therefore in practice the NMP instead estimates the number of pregnant women as 4.4% of the total population.

    • EPI_12-23 months
    • EPI_12-59 months
    • EPI_9 months - 14 yrs
    • EPI_HPV target
    • EPI_Live births
    • EPI_Non Pregnant Women
    • EPI_Population total
    • EPI_Pregnant Women
    • EPI_Surviving Infants
    • EPI_Under 15 years
    • EPI_Under 5 years
    • EPI_Women of child bearing age
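
To illustrate how a population-based denominator of this kind might be used, the sketch below estimates the expected number of pregnant women from total population using the 4.4% share cited for Sierra Leone, then divides first ANC visits by that estimate. The function names, the example population, and the visit count are hypothetical, and the share will differ by country; confirm the denominator definition with the SNT team before using it.

```python
from typing import Optional

# Share of the total population assumed to be pregnant in a year; 4.4% is the
# Sierra Leone NMP figure quoted above and will differ in other countries.
PREGNANCY_SHARE = 0.044


def expected_pregnant_women(total_population: float,
                            share: float = PREGNANCY_SHARE) -> float:
    """Estimate the expected number of pregnant women from total population."""
    return total_population * share


def coverage(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None if the denominator is not positive."""
    if denominator <= 0:
        return None
    return numerator / denominator


# Hypothetical district: 50,000 people and 1,650 first ANC visits in the year
expected = expected_pregnant_women(50_000)
anc1_coverage = coverage(1_650, expected)
print(round(expected))          # 2200
print(round(anc1_coverage, 2))  # 0.75
```
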

Tip: Where this data is used

The intervention data elements are the inputs for the situation-analysis chapter on past interventions:

  • Routine interventions: coverage of ITN, IPTp, PMC, and the malaria vaccine through routine delivery channels.
  • ITN campaigns: campaign-based ITN distribution analysis.
  • Other interventions: SMC, IRS, and other vector-control interventions whose routine reporting (where available) is catalogued above.
  • Case management quality: uncomplicated and severe malaria case-management performance, which uses the treatment-by-age and severity elements above.

Stock Data Elements

Stock data tracks the availability of essential malaria commodities.

In SNT, stock information can be used to identify facilities to target for performance improvement, interpret epidemiological data such as when testing rate or treatment rate is unexpectedly low, and inform analysis of risk factors associated with malaria burden, among other uses. Therefore, during SNT we should extract and review all available stock data.

If there is data on absolute stock numbers, these can be compared to the number of treatments given and/or tests conducted for the same period of time, to evaluate coherency across datasets and potentially explain certain incoherencies.
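
A comparison of this kind could be sketched as follows. This is only an illustration: the field names (`opening_stock`, `received`, `treatments_given`), the definition of available stock as opening stock plus receipts, and the example records are all assumptions, since countries report stock differently.

```python
def flag_stock_incoherence(records: list) -> list:
    """Return (facility, period) pairs where treatments exceed available stock."""
    flagged = []
    for record in records:
        # Assumed definition: stock available in the period is the opening
        # balance plus quantities received during the period.
        available = record["opening_stock"] + record["received"]
        if record["treatments_given"] > available:
            flagged.append((record["facility"], record["period"]))
    return flagged


# Hypothetical facility-month records
records = [
    {"facility": "HF A", "period": "2023-01", "opening_stock": 100,
     "received": 50, "treatments_given": 120},
    {"facility": "HF B", "period": "2023-01", "opening_stock": 20,
     "received": 0, "treatments_given": 45},
]
print(flag_stock_incoherence(records))  # [('HF B', '2023-01')]
```

Flagged facility-months are candidates for follow-up rather than confirmed errors, since commodities may be borrowed from other facilities or recorded in a different period.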

If there are stockouts, it is important to understand the definition of stockouts and the ways that they are reported. Ask if there is any information that will allow understanding why the stockouts took place: for example, there may be archived reports or bulletins on stockout events.

Stock information is reported through the Logistics Management Information System (LMIS), but health facilities may also report into the HMIS the number of days each month with a stockout and the number of months for which stock is available. Consult the SNT team to determine where stockout indicators can be sourced for analysis.

Sierra Leone example

In Sierra Leone, stockout data for antimalarials are reported by type, and for ACTs, also by dosages (for example, pediatric and adult). This example only shows stockout data for RDTs and antimalarials, but during SNT it is also advised to review stockouts for other routine commodities (ITNs, malaria vaccine, etc.) and absolute available stock if possible.

  • Malaria Rapid Diagnostic Test Kit - Stockout
  • Artemether 20mg/ml, Inj - Stockout
  • Artesunate 50mg, Suppository - Stockout
  • Artesunate 60mg/ml, Inj, Vial - Stockout
  • Sulphadoxine & Pyrimethamine 500mg & 25mg, Tab - Stockout
  • Artemether & Lumefantrine (ACT) 20mg & 120mg, 6 Tabs - Stockout
  • Artemether & Lumefantrine (ACT) 20mg & 120mg, 12 Tabs - Stockout
  • Artemether & Lumefantrine (ACT) 20mg & 120mg, 18 Tabs - Stockout
  • Artemether & Lumefantrine (ACT) 20mg & 120mg, 24 Tabs - Stockout
  • Artesunate 20mg/ml, Inj - Stockout
Tip: Where this data is used

Stock data informs both stand-alone supply-chain analyses and the case-management quality work:

  • LMIS data: preparing and structuring the stock and logistics data for analysis.
  • Stockouts: analyzing stockout patterns and their impact on case-management performance.

Quality Checks on the Country’s Submission

The catalogue above describes what to ask for. Once an extraction is delivered by the data manager, the next gate is to confirm that what came back matches what was requested and is fit for analysis. Running a structured set of checks before any data is imported saves significant rework later: corrupted files, missing months, renamed indicators, and admin units that do not match the shapefile all compound through every downstream step if they are not caught early.

The checks below are grouped into five areas, each presented in the same what it looks like / why it matters / how to detect / how to resolve format used in the Common issues with point coordinate data section. For every flag, the goal is to log the issue for the SNT team rather than silently fix it, because most resolutions are conversations with the data manager rather than automated transformations, and a clean record of changes is necessary for the SNT methods notes.

Tip: When to run these checks

Run the checks immediately after the extraction is received, before any preprocessing or merging. Code-runnable implementations of several of the checks below appear on the Routine data preprocessing page; this page focuses on what to look for and why.

Coverage of Years

Routine surveillance is most useful when the time series is complete and consistent. Gaps in years, in months within a year, or in reporting frequency all change the interpretation of any trend the SNT team draws from the data.

All required years present

  • What it looks like: the indicators requested for 2020–2025 only contain 2021–2024.
  • Why it matters: missing years break any longitudinal trend and force either a re-extraction or a restricted analysis window.
  • How to detect: list the unique years in each indicator’s reporting-period column and compare against the requested range.
  • How to resolve: confirm with the data manager whether the gap is a true reporting gap or an extraction oversight. If the years exist in DHIS2, request a re-extraction.
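
The year comparison described above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming period strings begin with a four-digit year (as DHIS2 monthly `YYYYMM` and quarterly `YYYYQn` identifiers do); the function name and sample periods are hypothetical.

```python
def missing_years(periods, first_year, last_year):
    """Return the requested years that never appear in the reporting periods."""
    # Assumption: every period string starts with a four-digit year
    present = {int(p[:4]) for p in periods}
    return [y for y in range(first_year, last_year + 1) if y not in present]


# Hypothetical reporting periods for one indicator
periods = ["202101", "202112", "202206", "2023Q1", "202412"]
gap = missing_years(periods, 2020, 2025)
print(gap)
```

Running the same check per indicator, rather than once for the whole file, shows whether a gap affects the full extraction or only some elements.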

All months or quarters within each year

  • What it looks like: 2023 only contains January through August.
  • Why it matters: partial years bias any annual aggregate, especially for malaria where peak transmission sits late in the year and a truncated series will systematically under-count cases.
  • How to detect: for each year, count the distinct reporting periods and compare against the expected 12 months or 4 quarters.
  • How to resolve: for the most recent year, partial coverage is often explained by reporting lag and should be documented as such. For older years, request the missing periods.
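
A minimal sketch of the within-year count, assuming monthly data with `YYYYMM` period strings; quarterly data would need a different parser. The function name and the sample periods are illustrative only.

```python
from collections import defaultdict


def incomplete_years(periods):
    """Map each incomplete year to the list of month numbers it is missing."""
    seen = defaultdict(set)
    for p in periods:
        # Assumption: monthly periods formatted as YYYYMM
        seen[int(p[:4])].add(int(p[4:6]))
    all_months = set(range(1, 13))
    return {
        year: sorted(all_months - months)
        for year, months in sorted(seen.items())
        if months != all_months
    }


# Hypothetical series: 2022 complete, 2023 January through August only
periods = [f"2022{m:02d}" for m in range(1, 13)]
periods += [f"2023{m:02d}" for m in range(1, 9)]

gaps = incomplete_years(periods)
print(gaps)
```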

Reporting frequency consistent across years

  • What it looks like: 2019–2021 are monthly and 2022–2024 are quarterly.
  • Why it matters: mixed frequencies cannot be analyzed together without re-aggregating everything to the coarsest frequency (here, quarterly), which discards the monthly detail.
  • How to detect: check the distinct reporting-period formats across the time series.
  • How to resolve: request the data at a single frequency (typically monthly) for the entire window. If that is not possible, document the change point and adjust analyses accordingly.
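
One way to check period formats is to classify each period string with a regular expression and list the frequencies found in each year. The sketch below assumes DHIS2-style identifiers (`YYYYMM` for months, `YYYYQn` for quarters, `YYYY` for years); an extraction that uses labels such as month names would need different patterns.

```python
import re
from collections import defaultdict

# Assumed DHIS2-style period identifiers
PERIOD_PATTERNS = {
    "monthly": re.compile(r"^\d{4}(0[1-9]|1[0-2])$"),
    "quarterly": re.compile(r"^\d{4}Q[1-4]$"),
    "yearly": re.compile(r"^\d{4}$"),
}


def frequency_by_year(periods):
    """Return {year: sorted list of reporting frequencies seen in that year}."""
    found = defaultdict(set)
    for p in periods:
        label = next(
            (name for name, pat in PERIOD_PATTERNS.items() if pat.match(p)),
            "unknown",
        )
        found[p[:4]].add(label)
    return {year: sorted(labels) for year, labels in sorted(found.items())}


# Hypothetical series that switches from monthly to quarterly
freq = frequency_by_year(["202101", "202102", "2022Q1", "2022Q2"])
mixed = len({f for labels in freq.values() for f in labels}) > 1
print(freq, mixed)
```

The first year in which the list of frequencies changes is the change point to document.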

Partial-year coverage flagged

  • What it looks like: the dataset ends in August of the latest year because reporting for September onward is incomplete.
  • Why it matters: trailing partial years are easy to miss when totals are computed, and produce a misleading drop in the final year.
  • How to detect: confirm that the last reporting period for each indicator matches the requested end date.
  • How to resolve: keep the partial year if recency is needed, but document it explicitly and exclude it from annual comparisons.

Coverage of Indicators

The catalogue earlier on this page lists the indicators an SNT analysis depends on. Once the extraction is in hand, the question is whether all of them are present, named consistently, and defined the same way across years and across reporting units.

All requested indicators present

  • What it looks like: the team requested 17 epidemiological elements; the extraction contains 14.
  • Why it matters: missing indicators force either a re-extraction or analytical workarounds (e.g. presumed cases computed from suspected minus tested).
  • How to detect: compare the column list of the extraction against the requested indicator list.
  • How to resolve: if the indicator exists in DHIS2 under a different name, document the rename. Otherwise request a re-extraction.
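
The column comparison can be sketched as two list differences: indicators requested but not delivered, and columns delivered but not requested (often renamed indicators). All column names below are hypothetical examples, not the names of any real extraction.

```python
def compare_indicators(requested, columns, metadata_columns):
    """Return (missing, unexpected) indicator names, ignoring metadata columns."""
    delivered = [c for c in columns if c not in metadata_columns]
    missing = [r for r in requested if r not in delivered]
    unexpected = [c for c in delivered if c not in requested]
    return missing, unexpected


# Hypothetical request and extraction header
requested = ["susp_u5", "test_u5", "conf_u5", "maltreat_u5"]
extracted_columns = ["adm1", "adm2", "facility", "period",
                     "susp_u5", "conf_u5", "Confirmed malaria 0-59m"]
metadata = {"adm1", "adm2", "facility", "period"}

missing, unexpected = compare_indicators(requested, extracted_columns, metadata)
print("missing:", missing)
print("unexpected:", unexpected)
```

An unexpected column that resembles a missing indicator is a candidate rename to confirm with the data manager.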

No duplicate indicators under different names

  • What it looks like: two columns that appear to count the same thing, for example Confirmed malaria 0-59m and Malaria confirmed cases 0-59 months.
  • Why it matters: double-counting inflates incidence and treatment rates, and using only one of the two silently drops cases reported under the other name.
  • How to detect: group the indicator names by catalogue category and look for near-duplicates within a group.
  • How to resolve: confirm with the data manager which is current. If both are in active use, understand the reason (parallel reporting streams, programmatic vs. research) before deciding which to keep.
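
A minimal sketch for surfacing near-duplicate names, using `difflib.SequenceMatcher` from the Python standard library on word-sorted, lowercased names so that word order does not matter. The 0.6 cutoff is an arbitrary starting point rather than a recommended value, and every candidate pair still needs human review.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def token_key(name):
    """Lowercase, split on non-alphanumerics, and sort tokens to ignore word order."""
    return " ".join(sorted(re.findall(r"[a-z0-9]+", name.lower())))


def near_duplicates(names, cutoff=0.6):
    """Return (name_a, name_b, score) for pairs whose similarity reaches the cutoff."""
    pairs = []
    for a, b in combinations(names, 2):
        score = SequenceMatcher(None, token_key(a), token_key(b)).ratio()
        if score >= cutoff:
            pairs.append((a, b, round(score, 2)))
    return pairs


indicator_names = [
    "Confirmed malaria 0-59m",
    "Malaria confirmed cases 0-59 months",
    "Suspected Malaria 5+y",
]
candidates = near_duplicates(indicator_names)
for a, b, score in candidates:
    print(f"{a!r} ~ {b!r} (similarity {score})")
```

Running this within each catalogue category, rather than across all indicators at once, keeps the number of pairs to review manageable.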

Indicator renames documented

  • What it looks like: Suspected Malaria 5+y becomes Suspected Malaria 5-14y plus Suspected Malaria 15+y from 2022 onward.
  • Why it matters: an unflagged rename produces a series that looks discontinuous when it is actually a definition change.
  • How to detect: check whether any indicator’s first non-zero year is later than the start of the time series, and whether any indicator’s last non-zero year is earlier than the end.
  • How to resolve: request the historical mapping from the data manager and document the change in the SNT methods notes.

Definitions consistent across years

  • What it looks like: the case definition for confirmed malaria changes from “any positive test” to “RDT-only” in 2022.
  • Why it matters: a definition change is invisible in the data but produces an artifactual step in the time series that is easily misread as an epidemiological signal.
  • How to detect: ask the data manager whether case definitions, drug regimens, or denominators changed during the period.
  • How to resolve: document the change point and decide whether to harmonize (restrict to one definition for all years) or split the analysis at the break point.

New indicators clearly documented

  • What it looks like: IPTi 1st dose appears for the first time in 2024 because PMC was rolled out that year.
  • Why it matters: a new indicator is not a data problem, but it must be flagged so that zero-valued earlier years are not interpreted as zero coverage.
  • How to detect: find indicators whose first non-zero year is after the start of the requested window.
  • How to resolve: mark the pre-introduction years as NA rather than zero in the cleaned dataset, and document the rollout year.
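
The detection and resolution steps above can be sketched together: find the first non-zero year, then replace earlier values with `None` (the Python stand-in for NA here). The annual totals are hypothetical, and the rollout year should be confirmed with the data manager before masking, since an early zero can also be a genuine zero.

```python
def first_nonzero_year(series):
    """Return the earliest year with a non-zero value, or None if there is none."""
    years = [year for year, value in sorted(series.items()) if value]
    return years[0] if years else None


def mask_pre_introduction(series):
    """Replace values before the first non-zero year with None (NA)."""
    start = first_nonzero_year(series)
    if start is None:
        # Never reported: leave unchanged and flag for the SNT team instead
        return dict(series)
    return {year: (None if year < start else value)
            for year, value in series.items()}


# Hypothetical annual totals for an indicator introduced in 2024
ipti_dose1 = {2020: 0, 2021: 0, 2022: 0, 2023: 0, 2024: 1250}

intro_year = first_nonzero_year(ipti_dose1)
masked = mask_pre_introduction(ipti_dose1)
print(intro_year, masked)
```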

Discontinued indicators flagged

  • What it looks like: Presumed Malaria was reported through 2020 and discontinued when universal testing was rolled out.
  • Why it matters: the same risk as a new indicator, in reverse: post-discontinuation zeros are not real zeros.
  • How to detect: find indicators whose last non-zero year is before the end of the requested window.
  • How to resolve: mark the post-discontinuation years as NA and document the reason.

Aggregation symmetric across age and setting

  • What it looks like: conf_u5 and conf_ov5 exist for confirmed cases, but treatment is reported as a single all-ages maltreat column.
  • Why it matters: asymmetric disaggregation prevents age-stratified analyses for indicators that should be comparable, for example a treatment-to-case ratio by age group.
  • How to detect: for each indicator, list the disaggregation categories (age, setting, dose) and check that they are consistent across related indicators.
  • How to resolve: request the missing disaggregations if they exist in DHIS2. If not, document which analyses are constrained to the all-ages level.

Dataset Structure

Even when the indicators are correct, the shape of the file can block analysis. The checks below catch the structural problems that turn a five-minute load into a half-day cleanup.

Tabular structure with rows as reporting units

  • What it looks like: each row is one health facility for one reporting period, and each column is an indicator or a metadata field.
  • Why it matters: any other structure (rows as indicators, multiple header rows, repeated key columns) requires reshaping before any analysis can begin.
  • How to detect: open the first sheet and confirm the first row is a single header.
  • How to resolve: request a flat tabular extract. If that is not possible, reshape on import and document the transformation.

Consistent structure across tabs and files

  • What it looks like: an Excel workbook with one tab per year, each using slightly different column names or column orders.
  • Why it matters: tab-by-tab differences silently break any rbind or pd.concat and produce wrong totals when columns are misaligned.
  • How to detect: compare the column names and column order across all tabs or files.
  • How to resolve: align column names and order before concatenation. Ideally, request a single multi-year file from the data manager.
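
A sketch of the header comparison, taking the first tab as the reference and reporting missing, extra, and reordered columns for every other tab. It assumes the header of each tab has already been read into a list; the tab names and column names are hypothetical.

```python
def column_drift(headers):
    """Compare every tab's header against the first tab's header."""
    tabs = list(headers)
    reference = headers[tabs[0]]
    report = {}
    for tab in tabs[1:]:
        cols = headers[tab]
        issues = {
            "missing": [c for c in reference if c not in cols],
            "extra": [c for c in cols if c not in reference],
            # Same names, different order
            "reordered": sorted(cols) == sorted(reference) and cols != reference,
        }
        if issues["missing"] or issues["extra"] or issues["reordered"]:
            report[tab] = issues
    return report


# Hypothetical headers, one list per workbook tab
headers = {
    "2020": ["adm1", "adm2", "facility", "period", "conf_u5", "conf_ov5"],
    "2021": ["adm1", "adm2", "facility", "period", "conf_ov5", "conf_u5"],
    "2022": ["adm1", "adm2", "facility", "period", "conf_u5", "Conf_ov5"],
}

drift = column_drift(headers)
for tab, issues in drift.items():
    print(tab, issues)
```

An empty report is the condition to meet before concatenating the tabs.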

Wide or long format clearly identified

  • What it looks like: wide format has one column per indicator (e.g. conf_u5, conf_ov5); long format has an indicator column and a value column with one row per indicator-value pair.
  • Why it matters: both formats are workable, but conflating them produces broken joins and silent NA explosions.
  • How to detect: check whether indicators are columns (wide) or values within a column (long).
  • How to resolve: pick one format for the cleaned dataset (long is usually easier for plotting, wide is usually easier for indicator-by-indicator QA) and pivot once on import.
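
To make the two layouts concrete, the sketch below pivots a small wide table into long format using plain Python dictionaries. The identifier columns and values are hypothetical; in practice the same reshape would usually be done with the pivoting function of whichever data-frame library the team uses.

```python
def wide_to_long(rows, id_columns):
    """Turn one row per facility-period into one row per indicator value."""
    long_rows = []
    for row in rows:
        ids = {col: row[col] for col in id_columns}
        for col, value in row.items():
            if col not in id_columns:
                long_rows.append({**ids, "indicator": col, "value": value})
    return long_rows


# Hypothetical wide extract: one column per indicator
wide = [
    {"facility": "Facility A", "period": "202301", "conf_u5": 12, "conf_ov5": 30},
    {"facility": "Facility B", "period": "202301", "conf_u5": 7, "conf_ov5": 15},
]

long_format = wide_to_long(wide, id_columns=["facility", "period"])
for row in long_format:
    print(row)
```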

No merged cells, subtotals, or formatting artifacts

  • What it looks like: an Excel sheet with a merged “Region: Eastern” cell spanning four district rows, plus a subtotal row at the bottom of each region.
  • Why it matters: merged cells produce NA in all but the top-left cell when read by R or Python, and subtotal rows double-count when summed.
  • How to detect: open the file in Excel and scan for merged cells, blank header rows, and bold-formatted total rows.
  • How to resolve: request a flat un-merged extract. If that is not possible, drop subtotal rows and forward-fill merged keys on import.
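
If a flat extract is not available, the fallback described above (forward-fill merged keys, drop subtotal rows) can be sketched as follows. The sketch assumes merged cells arrive as `None` and that subtotal rows are labeled "Total" or "Subtotal" in the district column; both assumptions, and all names and values, are hypothetical and should be checked against the actual file.

```python
def clean_merged_sheet(rows, key="region", label="district"):
    """Forward-fill the merged key column and drop subtotal rows."""
    cleaned, current = [], None
    for row in rows:
        if row[key]:
            current = row[key]  # top cell of a merged block
        text = str(row[label] or "").strip().lower()
        if text.startswith(("total", "subtotal")):
            continue  # skip subtotal rows so they are not double-counted
        cleaned.append({**row, key: current})
    return cleaned


# Hypothetical sheet as read from a workbook with merged region cells
raw = [
    {"region": "Eastern", "district": "District A", "conf": 120},
    {"region": None, "district": "District B", "conf": 200},
    {"region": None, "district": "District C", "conf": 90},
    {"region": None, "district": "Total", "conf": 410},
    {"region": "Northern", "district": "District D", "conf": 150},
]

flat = clean_merged_sheet(raw)
print(len(flat), sum(r["conf"] for r in flat))
```

The number of rows dropped and cells filled belongs in the SNT methods notes alongside the transformation itself.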

Geographic levels labeled clearly

  • What it looks like: the file has columns named adm0, adm1, adm2, and adm3 for each reporting row.
  • Why it matters: without explicit admin labels, we cannot tell whether a row is a facility, a chiefdom, a district, or a region, and any spatial join will fail.
  • How to detect: confirm that every reporting row carries the full admin hierarchy.
  • How to resolve: request the full admin hierarchy as separate columns. If only one admin level is present, ask the SNT team for a crosswalk file.

Year-separated or concatenated files documented

  • What it looks like: one file per year named dhis2_2020.xlsx, dhis2_2021.xlsx, and so on.
  • Why it matters: year-separated files are workable but require concatenation, and any column-structure drift across years surfaces only at the concatenation step.
  • How to detect: confirm the file naming convention and whether the years are split or concatenated.
  • How to resolve: concatenate on import after confirming the column structure is identical. Document the source files in the SNT methods notes.

Integrity of Data

Before any of the previous checks can run, the file has to open cleanly. The integrity checks below are quick, but skipping them means losing time later to silent encoding errors, locked sheets, and partial reads.

File opens without errors

  • What it looks like: read_excel() or read_csv() returns the expected number of rows and no warnings.
  • Why it matters: a corrupted file may open partially and silently truncate, producing a smaller-than-expected dataset that looks normal.
  • How to detect: confirm the row count read matches the row count visible in Excel.
  • How to resolve: request a re-export from the data manager. Do not attempt to recover a corrupted file by hand.

Workable file format

  • What it looks like: the extraction is delivered as .xlsx, .csv, or .dta, not as a scanned PDF or a screenshot.
  • Why it matters: non-tabular formats require OCR or manual re-entry, both of which introduce errors and are not reproducible.
  • How to detect: check the file extension and open the file.
  • How to resolve: request a CSV or XLSX from the source system. Do not work from PDFs unless there is no alternative, and document any OCR step.

No password protection or locked cells

  • What it looks like: Excel prompts for a password on open, or specific cells refuse edits after the file is opened.
  • Why it matters: password-protected files block scripted ingestion, and locked cells can silently propagate as NA through some readers.
  • How to detect: open the file in Excel and check for password prompts or grey locked cells.
  • How to resolve: request an unprotected version from the data manager.

Correct character encoding

  • What it looks like: facility names and admin names with accented characters display correctly (for example Bafatá, not Bafat? or BafatÃ¡).
  • Why it matters: encoding errors break every downstream name match and crosswalk, and propagate silently into the SNT report. This is a common issue in bilingual contexts where French accents and special characters are widespread.
  • How to detect: scan the facility-name and admin-name columns for replacement characters or mojibake (garbled text like Ã© where é should appear).
  • How to resolve: re-read with the correct encoding (UTF-8, Latin-1, or Windows-1252). If the source file is itself corrupted, request a re-export.
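
The scan for garbled names can be approximated with a simple marker search. The sketch below covers only the most common case, UTF-8 text that was decoded as Latin-1; the marker list is a heuristic (a legitimate name in capitals can contain "Ã"), so flagged names are candidates for review, not confirmed errors. Re-reading the source file with the correct encoding remains the preferred fix, and the repair function is shown only to illustrate what went wrong.

```python
# Heuristic markers: typical debris of UTF-8 read as Latin-1, plus U+FFFD
MOJIBAKE_MARKERS = ("Ã", "Â", "\ufffd")


def looks_garbled(text):
    """Return True if the text contains a typical mojibake marker."""
    return any(marker in text for marker in MOJIBAKE_MARKERS)


def repair_utf8_read_as_latin1(text):
    """Undo a UTF-8-as-Latin-1 misread; return the text unchanged if not applicable."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Reproduce the error: UTF-8 bytes decoded with the wrong codec
garbled = "Bafatá".encode("utf-8").decode("latin-1")
repaired = repair_utf8_read_as_latin1(garbled)
print(garbled, looks_garbled(garbled), repaired)
```

A name containing the replacement character (U+FFFD) cannot be repaired this way because the original bytes are already lost; that case calls for a re-export.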

Harmonization with Shapefiles and the MFL

The routine extraction only becomes useful when it can be linked to administrative boundaries (for mapping and aggregation) and to the master facility list (for facility-level analyses). The checks below catch the harmonization issues that block these joins; the workflows that resolve them are documented on the Master facility lists, Facility coordinates, and Shapefile management pages.

Admin levels align with the shapefile

  • What it looks like: the routine data reports at adm3 (chiefdom), and the shapefile also has an adm3 layer with matching unit counts.
  • Why it matters: if the admin level in the data does not exist in the shapefile, no spatial join is possible and aggregated maps cannot be produced.
  • How to detect: compare the number of distinct admin units at each level in the routine data against the shapefile.
  • How to resolve: request a shapefile at the matching admin level, or aggregate the routine data up to an admin level the shapefile covers.

MFL available as a crosswalk

  • What it looks like: an MFL is available that lists every health facility with its admin hierarchy and coordinates.
  • Why it matters: the MFL is the bridge between the routine data and the shapefile. Without it, facility-level analyses cannot be mapped.
  • How to detect: confirm that the MFL exists and covers the reporting period of the routine data.
  • How to resolve: request the latest MFL from the SNT team. If it is outdated, document which facilities are missing.

MFL current for the reporting period

  • What it looks like: the MFL was last updated in 2024 and covers all facilities reporting in 2020–2025.
  • Why it matters: an outdated MFL will not contain newly opened facilities, which then go unmapped. It may also still contain closed facilities that no longer report.
  • How to detect: compare the MFL facility list against the unique facilities in the routine data.
  • How to resolve: request an updated MFL from the SNT team. For facilities in the routine data but not the MFL, use fuzzy matching (see Fuzzy matching of names across datasets) and flag every match for review.

All facility names matched

  • What it looks like: every facility name in the routine data has a corresponding entry in the MFL.
  • Why it matters: unmatched facilities cannot be assigned to an admin unit or linked to coordinates, and they silently drop out of any spatial aggregate.
  • How to detect: join the routine data to the MFL on facility name and count the unmatched rows.
  • How to resolve: use fuzzy matching to resolve typographic differences. Track and review every unmatched facility with the SNT team rather than dropping it silently.
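
Before any fuzzy matching, an exact-match pass shows how large the problem is. The sketch below counts routine-data rows whose facility name has no exact entry in the MFL; the facility names and the `facility` field name are hypothetical.

```python
from collections import Counter


def unmatched_report(routine_rows, mfl_names, name_field="facility"):
    """Return {facility name: number of routine rows} for names absent from the MFL."""
    known = set(mfl_names)
    counts = Counter(
        row[name_field] for row in routine_rows if row[name_field] not in known
    )
    return dict(counts)


# Hypothetical MFL names and routine-data rows
mfl_names = ["Facility A CHC", "Facility B MCHP"]
routine_rows = [
    {"facility": "Facility A CHC", "period": "202301"},
    {"facility": "Facility B M.C.H.P", "period": "202301"},
    {"facility": "Facility B M.C.H.P", "period": "202302"},
    {"facility": "Facility C CHP", "period": "202301"},
]

unmatched = unmatched_report(routine_rows, mfl_names)
print(unmatched)
```

The row counts help prioritize review: a name that accounts for many rows matters more than one that appears once.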

Admin unit names consistent across datasets

  • What it looks like: the district Western Area Urban is spelled the same way in the routine data, the MFL, and the shapefile.
  • Why it matters: spelling, casing, or punctuation differences (Western Area - Urban vs. Western Area Urban) break every join.
  • How to detect: compare the unique admin-unit name lists across the three sources.
  • How to resolve: normalize names (lowercase, trim whitespace, standardize punctuation) before joining. For differences that survive normalization, build a manual crosswalk and store it alongside the data.
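
The normalization step can be sketched with the standard library. The rules below (Unicode normalization, lowercasing, punctuation replaced by spaces, whitespace collapsed) are one reasonable choice, not a prescribed standard, and the function names are hypothetical.

```python
import re
import unicodedata


def normalize_name(name):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", name).lower()
    text = re.sub(r"[^\w\s]", " ", text)  # punctuation becomes a space
    return " ".join(text.split())


def names_missing_from(reference, other):
    """Return names in `other` with no normalized match in `reference`."""
    ref = {normalize_name(n) for n in reference}
    return sorted(n for n in other if normalize_name(n) not in ref)


shapefile_names = ["Western Area Urban", "Bo"]
routine_names = ["Western Area - Urban", "BO ", "Bonthe"]

print(normalize_name("Western Area - Urban"))
needs_crosswalk = names_missing_from(shapefile_names, routine_names)
print(needs_crosswalk)
```

Names that still fail to match after normalization are the ones that go into the manual crosswalk.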

Coordinates valid and inside admin boundaries

  • What it looks like: every facility’s longitude and latitude fall inside the country’s adm0 boundary and inside the admin unit they are assigned to.
  • Why it matters: out-of-boundary coordinates indicate flipped longitude and latitude, low precision, or data entry errors. If used unflagged, they distort coverage analyses.
  • How to detect: run the coordinate quality checks documented on the Facility coordinates page.
  • How to resolve: flag invalid coordinates for the SNT team rather than dropping them. The resolution is typically a conversation with the data manager.
Important: What to send back to the SNT team if a check fails

When any of the checks above flag a problem, send the SNT team a short note containing:

  • The dataset and reporting period affected
  • The specific check that failed and a one-line description of the symptom
  • The number of rows, facilities, or indicators concerned
  • A proposed resolution (re-extract, document and proceed, or exclude from analysis)

A consistent format makes the back-and-forth with the data manager faster, and ensures every flag is tracked and resolved before the analysis begins.

 

©2026 Applied Health Analytics for Delivery and Innovation. All rights reserved