plot_relationship.py

assign_age_group(age_range, categories)

Assign an age group based on a given age range and predefined category bounds. The function first looks for a category whose bounds fully contain the range (category min ≤ range min and range max ≤ category max). If none does, it returns the first category that the range strictly overlaps. If no category passes either test, the function returns None.

Parameters:
  • age_range (tuple) –

    A tuple containing the minimum and maximum age of the range (e.g., (0.5, 1)).

  • categories (dict) –

    A dictionary of age categories, where each key is a category name, and the value is another dictionary with ‘min’ and ‘max’ keys defining the age bounds of the category.

Returns:
  • str or None: The name of the category the age range fits into, or None if no match is found.

Source code in plotter\plot_helper.py
def assign_age_group(age_range, categories):
    """
    Assign an age group based on a given age range and predefined category bounds.
    This function checks which predefined age category a given age range falls into by comparing
    the minimum and maximum age values with the bounds of each category. If the range overlaps
    or fits entirely within a category, the corresponding category is returned. If no match is found,
    the function returns None.

    Args:
        age_range (tuple): A tuple containing the minimum and maximum age of the range (e.g., (0.5, 1)).
        categories (dict): A dictionary of age categories, where each key is a category name, and
                           the value is another dictionary with 'min' and 'max' keys defining the
                           age bounds of the category.

    Returns:
        str or None: The name of the category the age range fits into, or None if no match is found.
    """
    min_age, max_age = age_range
    for category, bounds in categories.items():
        if float(min_age) >= bounds["min"] and float(max_age) <= bounds["max"]:
            return category
    # Handle overlapping or inclusive ranges
    for category, bounds in categories.items():
        if float(min_age) < bounds["max"] and float(max_age) > bounds["min"]:
            return category
    return None  # If no category fits
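
The two passes are easiest to see on a small example. The sketch below repeats the function in compact form so that it runs on its own; the category names and bounds are hypothetical and not taken from the project.

```python
def assign_age_group(age_range, categories):
    """Compact copy of the function documented above, so the example runs alone."""
    min_age, max_age = (float(v) for v in age_range)
    # Pass 1: a category that fully contains the range wins.
    for category, bounds in categories.items():
        if min_age >= bounds["min"] and max_age <= bounds["max"]:
            return category
    # Pass 2: otherwise the first category (in dict order) that overlaps.
    for category, bounds in categories.items():
        if min_age < bounds["max"] and max_age > bounds["min"]:
            return category
    return None


# Hypothetical category bounds, in years.
categories = {
    "under5": {"min": 0, "max": 5},
    "school_age": {"min": 5, "max": 15},
    "adult": {"min": 15, "max": 100},
}

contained = assign_age_group((0.5, 1), categories)    # lies inside "under5"
straddling = assign_age_group((4, 6), categories)     # overlaps two categories
outside = assign_age_group((100, 120), categories)    # overlaps none
print(contained, straddling, outside)
```

Because the overlap pass returns the first match, the result for a straddling range depends on the order of the keys in `categories`.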

clean_fname(fname, sweepvar=None, unique_groups=None, facet_var=None, unique_facets=None)

Clean and modify a given filename by replacing placeholder variable names with actual values. When the sweep variable or the facet variable has exactly one unique, non-integer value, its name in the filename is replaced by that value. Any remaining ‘modelname’ is then changed to ‘model’.

Parameters:
  • fname (str) –

    The filename to be cleaned and modified.

  • sweepvar (str, default: None ) –

    The name of the sweep variable in the filename. Defaults to None.

  • unique_groups (list, default: None ) –

    A list of unique group names (e.g., model names) to replace the sweep variable placeholder. Defaults to None.

  • facet_var (str, default: None ) –

    The name of the facet variable in the filename. Defaults to None.

  • unique_facets (list, default: None ) –

    A list of unique facet names to replace the facet variable placeholder. Defaults to None.

Returns:
  • str

    The cleaned and modified filename.

Source code in plotter\plot_helper.py
def clean_fname(fname, sweepvar=None, unique_groups=None, facet_var=None, unique_facets=None):
    """
    Clean and modify a given filename by replacing placeholder variables with actual values.
    This function replaces placeholder values in the provided filename based on the provided
    parameters. Specifically, it replaces instances of the sweep variable and facet variable
    with values from the unique groups and facets, if applicable, and changes 'modelname' to 'model'.

    Args:
        fname (str): The filename to be cleaned and modified.
        sweepvar (str, optional): The name of the sweep variable in the filename. Defaults to None.
        unique_groups (list, optional): A list of unique group names (e.g., model names) to replace
            the sweep variable placeholder. Defaults to None.
        facet_var (str, optional): The name of the facet variable in the filename. Defaults to None.
        unique_facets (list, optional): A list of unique facet names to replace the facet variable
            placeholder. Defaults to None.

    Returns:
        str: The cleaned and modified filename.
    """
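    # Substitute a placeholder only when both its name and its values are given
    # and exactly one non-integer value exists for it
    if facet_var is not None and unique_facets is not None:
        if len(unique_facets) == 1 and not isinstance(unique_facets[0], int):
            fname = fname.replace(facet_var, str(unique_facets[0]))
    if sweepvar is not None and unique_groups is not None:
        if len(unique_groups) == 1 and not isinstance(unique_groups[0], int):
            fname = fname.replace(sweepvar, str(unique_groups[0]))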
    fname = fname.replace('modelname', 'model')
    return fname
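
As an illustration of when substitution happens, the sketch below uses a compact copy of the function (with the placeholder name and its values both required) and a filename shaped like the ones the plotting functions build. The facet values ‘perennial’ and ‘seasonal’ are made up for the example.

```python
def clean_fname(fname, sweepvar=None, unique_groups=None, facet_var=None, unique_facets=None):
    """Compact copy of the function documented above, so the example runs alone."""
    if facet_var is not None and unique_facets is not None:
        if len(unique_facets) == 1 and not isinstance(unique_facets[0], int):
            fname = fname.replace(facet_var, str(unique_facets[0]))
    if sweepvar is not None and unique_groups is not None:
        if len(unique_groups) == 1 and not isinstance(unique_groups[0], int):
            fname = fname.replace(sweepvar, str(unique_groups[0]))
    return fname.replace("modelname", "model")


template = "simulatedEIR_prevalence_2to10_0-5_modelname_seasonality"

# One unique value per variable: both placeholders are replaced.
single = clean_fname(template, "modelname", ["EMOD"], "seasonality", ["perennial"])

# Several unique values: placeholders stay, only 'modelname' becomes 'model'.
several = clean_fname(template, "modelname", ["EMOD", "OpenMalaria"],
                      "seasonality", ["perennial", "seasonal"])
print(single)
print(several)
```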

color_selector(i, s)

Select a color index based on the model name.

This function returns a color index based on the specified model name. If the model name is recognized, a predefined index is returned; otherwise, the input index is returned.

Parameters:
  • i (int) –

    The default index to return if the model name is not recognized.

  • s (str) –

    The name of the model. Recognized values are ‘EMOD’, ‘malariasimulation’, and ‘OpenMalaria’.

Returns:
  • int

    The color index corresponding to the model name.

Source code in plotter\plot_helper.py
def color_selector(i, s):
    """
    Select a color index based on the model name.

    This function returns a color index based on the specified model name.
    If the model name is recognized, a predefined index is returned;
    otherwise, the input index is returned.

    Args:
        i (int): The default index to return if the model name is not recognized.
        s (str): The name of the model. Possible values include:
            - 'EMOD'
            - 'malariasimulation'
            - 'OpenMalaria'

    Returns:
        int: The color index corresponding to the model name.
    """

    if s == 'EMOD':
        return 0
    elif s == 'malariasimulation':
        return 1
    elif s == 'OpenMalaria':
        return 2
    else:
        return i
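
One consequence of the fallback is that an unrecognized name can receive an index already reserved for a recognized model. The sketch below shows this with a compact, dictionary-based copy of the function; the label ‘my_model’ is hypothetical.

```python
def color_selector(i, s):
    """Compact copy of the function documented above, so the example runs alone."""
    fixed = {"EMOD": 0, "malariasimulation": 1, "OpenMalaria": 2}
    return fixed.get(s, i)


# Hypothetical group labels; 'my_model' is not one of the recognized names.
labels = ["OpenMalaria", "EMOD", "my_model"]
indices = [color_selector(i, s) for i, s in enumerate(labels)]
print(indices)  # 'my_model' falls back to its position, 2, which OpenMalaria also uses
```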

convert_to_date(x)

Convert a number of days since January 1, 2005, to a date.

This function takes an integer representing the number of days since January 1, 2005, and returns the corresponding date.

Parameters:
  • x (int) –

    The number of days since January 1, 2005.

Returns:
  • date

    A datetime.date object representing the corresponding date.

Source code in plotter\plot_helper.py
def convert_to_date(x):
    """
    Convert a number of days since January 1, 2005, to a date.

    This function takes an integer representing the number of days
    since January 1, 2005, and returns the corresponding date.

    Args:
        x (int): The number of days since January 1, 2005.

    Returns:
        date: A datetime.date object representing the corresponding date.
    """

    import datetime
    return datetime.date(2005, 1, 1) + datetime.timedelta(days=x)
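
A quick check of the offset arithmetic, using a copy of the function so that the snippet runs on its own:

```python
import datetime


def convert_to_date(x):
    """Copy of the function documented above."""
    return datetime.date(2005, 1, 1) + datetime.timedelta(days=x)


first_day = convert_to_date(0)       # day 0 is the reference date itself
february = convert_to_date(31)       # January has 31 days
one_year_on = convert_to_date(365)   # 2005 is not a leap year
print(first_day, february, one_year_on)
```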

custom_sort_key(age_group)

Custom sort key function for sorting age groups.

This function extracts the lower bound of an age group represented as a string in the format ‘X-Y’ and returns it as an integer. It is primarily used for sorting age groups in ascending order based on their lower bounds.

Parameters:
  • age_group (str) –

    The age group string in the format ‘X-Y’, where X is the lower bound and Y is the upper bound.

Returns:
  • int

    The lower bound of the age group as an integer.

Source code in plotter\plot_helper.py
def custom_sort_key(age_group):
    """
    Custom sort key function for sorting age groups.

    This function extracts the lower bound of an age group represented as
    a string in the format 'X-Y' and returns it as an integer. It is
    primarily used for sorting age groups in ascending order based on
    their lower bounds.

    Args:
        age_group (str): The age group string in the format 'X-Y',
                         where X is the lower bound and Y is the upper bound.

    Returns:
        int: The lower bound of the age group as an integer.
    """

    return int(age_group.split('-')[0])
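
The key matters because plain string sorting orders ‘15-100’ before ‘5-15’. The sketch below compares the two orderings and also shows that a fractional lower bound such as ‘0.5-1’ cannot be converted by `int`; the age group labels are illustrative.

```python
def custom_sort_key(age_group):
    """Copy of the function documented above."""
    return int(age_group.split('-')[0])


age_groups = ["5-15", "15-100", "0-5"]
plain = sorted(age_groups)                          # lexicographic order
numeric = sorted(age_groups, key=custom_sort_key)   # order by lower bound

# A fractional lower bound is not accepted by int().
try:
    custom_sort_key("0.5-1")
    fractional_ok = True
except ValueError:
    fractional_ok = False
print(plain, numeric, fractional_ok)
```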

eir_to_outcome(fdir, df, sweepvar='cm_clinical', facet_var='seasonality', eir_val='simulatedEIR', channel='prevalence_2to10', agegrp='0-5')

Generate line plots of a requested outcome variable against EIR (Entomological Inoculation Rate), with one colored line per value of the sweep variable and one panel per value of the facet variable.

This function creates line plots where the x-axis represents the EIR (on a symlog scale) and the y-axis represents the outcome variable. Each line shows the group mean, and a shaded band spans the 2.5th to 97.5th percentiles of the outcome.

Parameters:
  • fdir (str) –

    Directory where the generated plot will be saved.

  • df (DataFrame) –

    DataFrame that includes combined model results.

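  • sweepvar (str, default: 'cm_clinical' ) –

    Variable used to group the data into separate colored lines within each panel. Default is ‘cm_clinical’.

  • facet_var (str, default: 'seasonality' ) –

    Variable used to split the plot into panels. Default is ‘seasonality’.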

  • eir_val (str, default: 'simulatedEIR' ) –

    The EIR (Entomological Inoculation Rate) value to be used for plotting. Default is ‘simulatedEIR’.

  • channel (str, default: 'prevalence_2to10' ) –

    The outcome variable to compare to EIR. Default is ‘prevalence_2to10’.

  • agegrp (str, default: '0-5' ) –

    Limits the resulting graphs to the selected age group. Default is ‘0-5’.

Returns:
  • None

    The function saves the generated plots to disk.

Source code in plotter\plot_relationship.py
def eir_to_outcome(fdir, df, sweepvar='cm_clinical', facet_var='seasonality', eir_val='simulatedEIR',
                   channel='prevalence_2to10', agegrp='0-5'):
    """
    Generate line plots for EIR (Entomological Inoculation Rate) and a requested
    outcome variable, with models represented as colors and sweep variables as panels.

    This function creates line plots where the x-axis represents the EIR and the
    y-axis represents an outcome variable, with different models indicated by color
    and organized into panels based on specified facets.

    Args:
        fdir (str): Directory where the generated plot will be saved.
        df (pd.DataFrame): DataFrame that includes combined model results.
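        sweepvar (str, optional): Variable used to group the data into separate
            colored lines within each panel. Default is 'cm_clinical'.
        facet_var (str, optional): Variable used to split the plot into panels.
            Default is 'seasonality'.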
        eir_val (str, optional): The EIR (Entomological Inoculation Rate) value to
            be used for plotting. Default is 'simulatedEIR'.
        channel (str, optional): The outcome variable to compare to EIR. Default is
            'prevalence_2to10'.
        agegrp (str, optional): Limits the resulting graphs to the selected age group.
            Default is '0-5'.

    Returns:
        None: The function saves the generated plots to disk.
    """

    figure_vars = [eir_val, channel] + [sweepvar, facet_var]
    df, caption_txt = subset_dataframe_for_plot(df, figure_vars, agegrp)
    color_palette = sns.color_palette('colorblind', max(len(df[sweepvar].unique()), 4))

    firstPlot = True
    unique_facets = sorted_list(df[facet_var])
    unique_groups = sorted_list(df[sweepvar])

    nx = max(1,len(unique_groups))
    ny = max(1,len(unique_facets))
    f = 1
    fig = plt.figure(figsize=(10 * nx, 10 * ny))

    for fi in unique_facets:
        ax = fig.add_subplot(ny, nx, f)
        ax.set_title(fi)
        if len(unique_facets) == 1:
            ax.set_title('')

        f += 1
        fdf = df[(df[facet_var] == fi)]

        for i, (s, sdf) in enumerate(fdf.groupby([sweepvar])):
            s = s[0]
            color_key = color_selector(i, s)
            xmean, ymean = get_x_y(sdf, 'target_output_values', eir_val, channel)
            merge_df = pd.merge(left=xmean, right=ymean, on='target_output_values')
            merge_df.sort_values(by=eir_val, inplace=True)
            ax.plot(merge_df[eir_val], merge_df[channel], '-', linewidth=0.8, label=f"{s}",
                    color=color_palette[color_key])
            ax.fill_between(merge_df[eir_val], merge_df[f'{channel}_min'], merge_df[f'{channel}_max'], alpha=0.1,
                            color=color_palette[color_key])

        if firstPlot:
            lg = ax.legend(loc='upper left', bbox_to_anchor=(0, 1))
            firstPlot = False

        ax.set_xlim(0.1, 1000)
        ax.set_xscale('symlog')
        ax.set_xlabel(f'{eir_val.replace("EIR", " annual EIR")}', fontsize=14)
        ax.set_ylabel(get_label(channel))

    fname = f'{eir_val}_{channel}_{agegrp}_{sweepvar}_{facet_var}'
    fname = clean_fname(fname, sweepvar, unique_groups, facet_var, unique_facets)
    fig.savefig(os.path.join(fdir, f'{fname}.png'), bbox_extra_artists=(lg,), bbox_inches='tight')
    plt.close()

get_label(channel)

Retrieve the label for a given outcome.

This function returns a formatted string representing the y-axis label based on the specified channel name. The labels correspond to specific epidemiological measures. If the channel is not recognized, the function simply returns the input channel name as-is.

Parameters:
  • channel (str) –

    The name of the channel for which to retrieve the label. Possible values include (but are not limited to):

      - ‘prevalence_2to10’: Represents $\it{Pf}$PR$_{2-10}$ (%) prevalence.
      - ‘prevalence’: Represents $\it{Pf}$PR (%) prevalence.
      - ‘clinical_incidence’: Represents clinical incidence (per person per year).
      - ‘severe_incidence’: Represents severe incidence (per person per year).
      - ‘simulatedEIR’: Represents simulated entomological inoculation rate (EIR).
      - ‘n_total_mos_pop’: Represents the total female mosquito population.

Returns:
  • str

    The corresponding y-axis label for the channel if recognized. If the channel is not recognized, the channel name itself is returned.

Source code in plotter\plot_helper.py
def get_label(channel):
    r"""
    Retrieve the label for a given outcome.
    This function returns a formatted string representing the y-axis label
    based on the specified channel name. The labels correspond to specific
    epidemiological measures. If the channel is not recognized, the function
    simply returns the input channel name as-is.

    Args:
        channel (str): The name of the channel for which to retrieve the label.
            Possible values include (but are not limited to):
            - 'prevalence_2to10': Represents $\it{Pf}$PR$_{2-10}$ (%) prevalence.
            - 'prevalence': Represents $\it{Pf}$PR (%) prevalence.
            - 'clinical_incidence': Represents clinical incidence (per person per year).
            - 'severe_incidence': Represents severe incidence (per person per year).
            - 'simulatedEIR': Represents simulated entomological inoculation rate (EIR).
            - 'n_total_mos_pop': Represents the total female mosquito population.

    Returns:
        str: The corresponding y-axis label for the channel if recognized.
        If the channel is not recognized, the channel name itself is returned.
    """

    channel_labels = {'ageGroup': 'Age group',
                      # Raw strings keep the mathtext backslash from being read as an escape sequence
                      'prevalence_2to10': r'$\it{Pf}$PR$_{2-10}$',   # (%) if %, then pfpr outcomes need to be *100
                      'prevalence': r'$\it{Pf}$PR',    # (%)
                      'clinical_incidence': 'Clinical incidence (pppy)',
                      'severe_incidence': 'Severe incidence (pppy)',
                      'simulatedEIR': 'simulated EIR',
                      'n_total_mos_pop': 'Total female mosquito population'
                      }

    return channel_labels.get(channel, channel)
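
The lookup-with-fallback behavior can be sketched with a shortened label table. The channel name ‘my_custom_channel’ is hypothetical, and the table below holds only two of the entries listed above.

```python
def get_label(channel):
    """Shortened copy of the function documented above (two labels only)."""
    channel_labels = {
        # Raw string so that '\i' in the mathtext is kept literally
        "prevalence_2to10": r"$\it{Pf}$PR$_{2-10}$",
        "clinical_incidence": "Clinical incidence (pppy)",
    }
    # dict.get returns the second argument when the key is missing
    return channel_labels.get(channel, channel)


known = get_label("clinical_incidence")
mathtext = get_label("prevalence_2to10")
fallback = get_label("my_custom_channel")
print(known, mathtext, fallback)
```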

get_output_df(wdir, modelname, yr=False, mth=False, daily=False, custom_name=None, save_combined=False)

Load and combine data from the model output files.

This function reads model output files from a specified working directory and combines the data into a single DataFrame. It supports different data formats based on the specified parameters for yearly, monthly, or daily data.

Parameters:
  • wdir (str) –

    Working directory where the data files are located.

  • modelname (str or list of str) –

    Name of models for which result CSVs should be loaded (case sensitive).

  • yr (bool, default: False ) –

    Set to True if the data files have yearly data. Defaults to False.

  • mth (bool, default: False ) –

    Set to True if the data files have monthly data. Defaults to False.

  • daily (bool, default: False ) –

    Set to True if the data files have daily timestep data. Defaults to False. If both mth and daily are True, only daily will be processed.

  • custom_name (str, default: None ) –

    Custom filename to use instead of the default based on the time period. Defaults to None.

  • save_combined (bool, default: False ) –

    Set to True to save the combined DataFrame to a CSV file. Defaults to False.

Returns:
  • tuple

    A tuple containing:

      - df (DataFrame): Combined DataFrame containing the data for the models listed in modelname. It is empty if no matching files are found.
      - wdir (str): The working directory, as passed in.

Raises:
  • ValueError

    If an invalid modelname value is specified.

Source code in plotter\plot_helper.py
def get_output_df(wdir, modelname, yr=False, mth=False, daily=False, custom_name=None,
                  save_combined=False):
    """
    Load and combine data from the model output files.

    This function reads model output files from a specified working directory
    and combines the data into a single DataFrame. It supports different data
    formats based on the specified parameters for yearly, monthly, or daily
    data.

    Args:
        wdir (str): Working directory where the data files are located.
        modelname (str or list of str): Name of models for which result CSVs
                                         should be loaded (case sensitive).
        yr (bool, optional): Set to True if the data files have yearly data.
                             Defaults to False.
        mth (bool, optional): Set to True if the data files have monthly data.
                             Defaults to False.
        daily (bool, optional): Set to True if the data files have daily timestep
                                data. Defaults to False. If both mth and daily
                                are True, only daily will be processed.
        custom_name (str, optional): Custom filename to use instead of the default
                                      based on the time period. Defaults to None.
        save_combined (bool, optional): Set to True to save the combined DataFrame
                                         to a CSV file. Defaults to False.

    Returns:
        tuple: A tuple containing:
            - df (DataFrame): Combined DataFrame containing the combined data
                              for the models listed in modelname.
            - wdir (str): Updated working directory (if applicable).

    Raises:
        ValueError: If an invalid modelname value is specified.
    """

    cols_to_keep = None  # default read all
    fname = 'mmmpy_timeavrg.csv'
    if yr:
        fname = 'mmmpy_yr.csv'
    if mth:
        fname = 'mmmpy_mth.csv'
    if daily:
        fname = 'mmmpy_daily.csv'
        # cols_to_keep = ['index', 'timestep', 'ageGroup', 'simulatedEIR', 'prevalence_2to10', 'prevalence',
        #                'clinical_incidence', 'severe_incidence', 'seed']
    if custom_name:
        fname = f'{custom_name}.csv'

    file_paths = [os.path.join(wdir, fname)]

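    # Accept a single model name as well as a list of names
    if isinstance(modelname, str):
        modelname = [modelname]

    for model in modelname:
        file_paths.append(os.path.join(wdir, model, fname))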

    existing_files = [path for path in file_paths if os.path.isfile(path)]

    if not existing_files:
        return pd.DataFrame(), wdir

    if os.path.isfile(os.path.join(wdir, fname)):
        df = pd.read_csv((os.path.join(wdir, fname)), low_memory=False)
    else:

        dfs = []
        for model in modelname:
            model_path = os.path.join(wdir, model, fname)
            try:
                if os.path.isfile(model_path):
                    df = pd.read_csv(model_path, usecols=cols_to_keep)
                    df['modelname'] = model
                    if model == 'EMOD':
                        df['seed'] = df['seed'] + 1
                    dfs.append(df)
                else:
                    print(f"File not found for {model}: {model_path}")
            except Exception as e:
                print(f"Error reading {model_path}: {e}")

        if not dfs:
            return pd.DataFrame(), wdir

        df = pd.concat(dfs, ignore_index=True)

        if 'ageGroup' in df.columns:
            try:
                age_grps = sorted(list(df['ageGroup'].unique()), key=custom_sort_key)
            except (ValueError, AttributeError):  # labels that are not 'X-Y' strings with an integer lower bound
                age_grps = list(df['ageGroup'].unique())
            df['ageGroup'] = df['ageGroup'].astype('category')
            df['ageGroup'] = df['ageGroup'].cat.reorder_categories(age_grps)

        warning_df = df[df['simulatedEIR'] == 0]
        # Exclude simulations where EIR was 0 or less: they yield no outcome measures, which crashes the plotting
        if len(warning_df) > 0 and daily is False:
            print('Warning: some simulations had a simulated EIR of 0 and were removed')
            df = df[df['simulatedEIR'] > 0]
            df = df[df['simulatedEIR'].notnull()]

        if not daily and save_combined:
            df.to_csv(os.path.join(wdir, f'{fname}'), index=False)
    return df, wdir
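
The order in which the flags are checked determines which file name is used when several are set. The sketch below isolates that logic in a hypothetical helper, `resolve_output_fname`, which is not part of the module; it mirrors the sequence of `if` statements at the top of the function.

```python
def resolve_output_fname(yr=False, mth=False, daily=False, custom_name=None):
    """Hypothetical helper mirroring the file name selection in get_output_df."""
    fname = "mmmpy_timeavrg.csv"      # default: time-averaged output
    if yr:
        fname = "mmmpy_yr.csv"
    if mth:
        fname = "mmmpy_mth.csv"       # overrides yr
    if daily:
        fname = "mmmpy_daily.csv"     # overrides yr and mth
    if custom_name:
        fname = f"{custom_name}.csv"  # overrides everything else
    return fname


default_name = resolve_output_fname()
monthly_wins = resolve_output_fname(yr=True, mth=True)
daily_wins = resolve_output_fname(mth=True, daily=True)
custom_wins = resolve_output_fname(daily=True, custom_name="run1")
print(default_name, monthly_wins, daily_wins, custom_wins)
```

Later checks override earlier ones, so the effective precedence is custom_name, then daily, then mth, then yr.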

get_x_y(df, grpvar, x_channel, y_channel)

Calculate x-axis and y-axis values for each plot.

This function groups the input DataFrame by a specified variable and calculates the mean values for the specified x and y channels. It also computes the 2.5th and 97.5th percentiles of the y values within each group, which bound the central 95% of the values (min and max).

Parameters:
  • df (DataFrame) –

    The DataFrame used to group and calculate x and y values.

  • grpvar (str) –

    The variable in the DataFrame used to group the x and y values.

  • x_channel (str) –

    The variable serving as the x-axis in the graph.

  • y_channel (str) –

    The variable serving as the y-axis in the graph.

Returns:
  • tuple

    A tuple containing:

      - xmean (DataFrame): A DataFrame containing the group means for the x-axis.
      - ymean (DataFrame): A DataFrame containing the group means for the y-axis, plus the 2.5th and 97.5th percentiles in the columns ‘<y_channel>_min’ and ‘<y_channel>_max’.

Source code in plotter\plot_helper.py
def get_x_y(df, grpvar, x_channel, y_channel):
    """
    Calculate x-axis and y-axis values for each plot.

    This function groups the input DataFrame by a specified variable and
    calculates the mean values for the specified x and y channels. It also
    computes the 95% confidence interval for the y values.

    Args:
        df (DataFrame): The DataFrame used to group and calculate x and y values.
        grpvar (str): The variable in the DataFrame used to group the x and y values.
        x_channel (str): The variable serving as the x-axis in the graph.
        y_channel (str): The variable serving as the y-axis in the graph.

    Returns:
        tuple: A tuple containing:
            - xmean (DataFrame): A DataFrame containing values for the x-axis.
            - ymean (DataFrame): A DataFrame containing values for the y-axis,
                                 including the 95% confidence interval (min and max).
    """

    # .mean() gives the same result as .agg(np.mean) without the deprecation warning in recent pandas
    xmean = df.groupby(grpvar)[x_channel].mean().reset_index()
    ymean = df.groupby(grpvar)[y_channel].mean().reset_index()
    p_df = pd.DataFrame(columns=[grpvar, f'{y_channel}_min', f'{y_channel}_max'])
    for i, row in ymean.iterrows():
        p = df[df[grpvar] == row[grpvar]]
        pmin = np.nanpercentile(p[y_channel], 2.5, axis=0)
        pmax = np.nanpercentile(p[y_channel], 97.5, axis=0)
        new_row = pd.DataFrame([{grpvar: row[grpvar], f'{y_channel}_min': pmin, f'{y_channel}_max': pmax}])
        if not new_row.empty and not new_row.isna().all(axis=None):
            p_df = pd.concat([p_df, new_row], axis=0, ignore_index=True)
    ymean = pd.merge(left=ymean, right=p_df, on=grpvar)
    return xmean, ymean
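
To make the band concrete, the sketch below computes the same per-group summary by hand using only the standard library. The `percentile` helper is a stand-in written for this example and assumes the default linear interpolation of `np.nanpercentile`; the data values are invented.

```python
import math
from collections import defaultdict
from statistics import fmean


def percentile(values, q):
    """Linear-interpolation percentile of the non-NaN values (q in 0..100)."""
    data = sorted(v for v in values if not math.isnan(v))
    rank = (len(data) - 1) * q / 100
    lower = math.floor(rank)
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (data[upper] - data[lower]) * (rank - lower)


# Invented (target_output_values, prevalence) pairs, e.g. one row per seed.
rows = [(1, 0.10), (1, 0.20), (1, 0.30), (1, 0.40), (1, 0.50), (10, 0.60), (10, 0.80)]

groups = defaultdict(list)
for target, prevalence in rows:
    groups[target].append(prevalence)

summary = {
    target: {
        "mean": fmean(values),
        "min": percentile(values, 2.5),    # lower edge of the shaded band
        "max": percentile(values, 97.5),   # upper edge of the shaded band
    }
    for target, values in groups.items()
}
print(summary)
```

With only a handful of values per group, the band is close to the observed minimum and maximum.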

input_to_simulated_eir(fdir, df, sweepvar='cm_clinical', facet_var='seasonality')

Generate line plots comparing input EIR to simulated EIR.

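This function creates line plots where the x-axis represents the input EIR values and the y-axis represents the simulated annual EIR. Different models are represented by different lines on the plot, and the plots are organized into panels based on specified facets. At least one of sweepvar and facet_var must be ‘modelname’; otherwise the function prints a message and returns without plotting. The default values do not satisfy this condition, so one of the two must be set explicitly.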

Parameters:
  • fdir (str) –

    Directory where the generated plot will be saved.

  • df (DataFrame) –

    DataFrame that includes combined model results.

  • sweepvar (str, default: 'cm_clinical' ) –

    Variable to group the data and create multiple lines on the plot. Default is ‘cm_clinical’.

  • facet_var (str, default: 'seasonality' ) –

    Variable to create multiple panels on the plot. Default is ‘seasonality’.

Returns:
  • None

    The function saves the generated plots to disk.

Source code in plotter\plot_relationship.py
def input_to_simulated_eir(fdir, df, sweepvar='cm_clinical', facet_var='seasonality'):
    """
    Generate line plots comparing input EIR to simulated EIR.

    This function creates line plots where the x-axis represents the input EIR values
    and the y-axis represents the simulated annual EIR. Different models are represented
    by different lines on the plot, and the plots are organized into panels based on
    specified facets.

    Args:
        fdir (str): Directory where the generated plot will be saved.
        df (pd.DataFrame): DataFrame that includes combined model results.
        sweepvar (str, optional): Variable to group the data and create multiple lines
            on the plot. Default is 'cm_clinical'.
        facet_var (str, optional): Variable to create multiple panels on the plot.
            Default is 'seasonality'.

    Returns:
        None: The function saves the generated plots to disk.
    """

    if sweepvar != 'modelname' and facet_var != 'modelname':
        print('Either sweepvar or facet_var should be modelname')
        return

    xyvars = ['simulatedEIR', 'target_output_values']
    figure_vars = xyvars + [sweepvar, facet_var]
    df, caption_txt = subset_dataframe_for_plot(df, figure_vars)
    unique_facets = sorted_list(df[facet_var])
    unique_groups = sorted_list(df[sweepvar])

    num_colors = len(unique_groups)
    color_palette = sns.color_palette('colorblind', max(num_colors, 4))
    firstPlot = True
    nx = 1
    ny = max(1, len(unique_facets))
    f = 1
    fig = plt.figure(figsize=(10 * nx, 6 * ny))
    for fi in unique_facets:
        fdf = df[df[facet_var] == fi]

        ax = fig.add_subplot(ny, nx, f)
        ax.set_title(fi)
        f += 1
        for i, (s, sdf) in enumerate(fdf.groupby([sweepvar])):
            s = s[0]
            # plot mean, min and max of seeds and age groups
            color_key = color_selector(i, s)
            if sweepvar == 'modelname':
                xchannel = f'transmission_intensity_{s}'
            else:
                xchannel = f'transmission_intensity_{fi}'

            xmean, ymean = get_x_y(sdf, 'target_output_values', xchannel, 'simulatedEIR')
            merge_df = pd.merge(left=xmean, right=ymean, on='target_output_values')
            merge_df.sort_values(by=xchannel, inplace=True)
            ax.plot(merge_df[xchannel], merge_df['simulatedEIR'], '-', linewidth=0.8, label=f"{s}",
                    color=color_palette[color_key])
            ax.fill_between(merge_df[xchannel], merge_df['simulatedEIR_min'],
                            merge_df['simulatedEIR_max'], alpha=0.1, color=color_palette[color_key])
        if firstPlot:
            lg = ax.legend(loc='upper left', bbox_to_anchor=(0, 1))
            firstPlot = False
        ax.set_ylim(0.1, 10000)
        ax.set_xlim(0.1, 10000)
        ax.set_yscale('symlog')
        ax.set_xscale('symlog')
        # Label every panel; plt.xlabel/plt.ylabel would only label the last axes
        ax.set_xlabel('Input EIR')
        ax.set_ylabel('Simulated annual EIR')

    fname = f'input_to_simulated_eir_{sweepvar}_{facet_var}'
    fname = clean_fname(fname, sweepvar, unique_groups, facet_var, unique_facets)

    fig.savefig(os.path.join(fdir, f'{fname}.png'), bbox_extra_artists=(lg,), bbox_inches='tight')
    plt.close()
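
The choice of x-axis column depends on which of the two variables holds the model name. The sketch below isolates that rule in a hypothetical helper, `pick_xchannel`, which is not part of the module; the facet value ‘perennial’ is invented.

```python
def pick_xchannel(sweepvar, facet_var, group_value, facet_value):
    """Hypothetical helper mirroring the x-channel selection in input_to_simulated_eir."""
    if sweepvar != "modelname" and facet_var != "modelname":
        return None  # the plotting function prints a message and returns here
    # The column suffix is the model name, taken from whichever variable carries it
    model = group_value if sweepvar == "modelname" else facet_value
    return f"transmission_intensity_{model}"


by_line = pick_xchannel("modelname", "seasonality", "EMOD", "perennial")
by_panel = pick_xchannel("cm_clinical", "modelname", 0.45, "OpenMalaria")
with_defaults = pick_xchannel("cm_clinical", "seasonality", 0.45, "perennial")
print(by_line, by_panel, with_defaults)
```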

load_exp(wdir)

Load experiment setup and scenario data into an Exp object.

Parameters:
  • wdir (str) –

    The working directory containing ‘exp_setup_df.csv’, ‘scenarios.csv’, and optionally ‘exp.obj’.

Returns:
  • Exp

    An object with attributes set from ‘exp.obj’, or dynamically built from ‘exp_setup_df.csv’ and ‘scenarios.csv’.

Source code in plotter\plot_helper.py
def load_exp(wdir):
    """
    Load experiment setup and scenario data into an Exp object.

    Parameters:
    wdir (str): The working directory containing 'exp_setup_df.csv', 'scenarios.csv',
                and optionally 'exp.obj'.

    Returns:
    Exp: An object with attributes set from 'exp.obj', or dynamically built
         from 'exp_setup_df.csv' and 'scenarios.csv'.
    """
    try:
        # Attempt to load the Exp object from a pickle file
        with open(os.path.join(wdir, "exp.obj"), "rb") as file:
            exp = pickle.load(file)
    except (FileNotFoundError, pickle.UnpicklingError) as e:
        # If the pickle file doesn't exist or is corrupted, build the object from CSV files
        class Exp:
            pass

        exp_setup_file = os.path.join(wdir, 'exp_setup_df.csv')
        scen_file = os.path.join(wdir, 'scenarios.csv')

        # Check if the required CSV files exist
        if not os.path.exists(exp_setup_file) or not os.path.exists(scen_file):
            raise FileNotFoundError("Required files 'exp_setup_df.csv' and 'scenarios.csv' are missing." )

        # Load data from CSV files
        exp_setup_df = pd.read_csv(exp_setup_file)
        scen_df = pd.read_csv(scen_file)

        # Create an instance of Exp
        exp = Exp()

        # Set attributes from exp_setup_df
        for _, row in exp_setup_df.iterrows():
            setattr(exp, row["parameter"], row["Value"])

        # Set attributes from scen_df
        for col in scen_df.columns:
            setattr(exp, col, scen_df[col].values)

    return exp
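
The CSV fallback can be tried in isolation. The following is a minimal sketch, assuming a setup table with `parameter` and `Value` columns as in the source. It uses the standard-library `csv` module and `types.SimpleNamespace` in place of pandas and the local `Exp` class; the helper name `build_exp_from_csv` and all sample values are hypothetical.

```python
import csv
import io
from types import SimpleNamespace


def build_exp_from_csv(setup_file, scen_file):
    """Hypothetical stand-in for the CSV branch of load_exp (not the project's code)."""
    exp = SimpleNamespace()

    # One attribute per row of the setup table: parameter -> Value
    for row in csv.DictReader(setup_file):
        setattr(exp, row["parameter"], row["Value"])

    # One attribute per column of the scenario table, holding all of its values
    scen_rows = list(csv.DictReader(scen_file))
    if scen_rows:
        for col in scen_rows[0].keys():
            setattr(exp, col, [r[col] for r in scen_rows])

    return exp


# In-memory files with made-up content stand in for exp_setup_df.csv and scenarios.csv
setup_csv = io.StringIO("parameter,Value\nnum_seeds,5\nsim_years,20\n")
scen_csv = io.StringIO("seasonality,cm_clinical\nseasonal,0.45\nperennial,0.45\n")

exp = build_exp_from_csv(setup_csv, scen_csv)
print(exp.num_seeds, exp.seasonality)
```

Unlike `pd.read_csv`, the `csv` module does not infer types, so every value in this sketch is a string.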

parse_args()

Parses command-line arguments for simulation specifications.

This function uses the argparse library to handle command-line inputs required for running simulation experiments. It defines required and optional arguments, including the job directory and model names.

Returns:
  • argparse.Namespace: An object containing the parsed command-line arguments.

Command Line Arguments

  • -d/--directory (str): The job directory where the exp.obj file is located. This argument is required.
  • -m/--modelname (str): One or more model names to compare. This argument is optional and defaults to [‘EMOD’, ‘OpenMalaria’, ‘malariasimulation’].

Source code in plotter\plot_helper.py
def parse_args():
    """
    Parses command-line arguments for simulation specifications.

    This function uses the argparse library to handle command-line inputs
    required for running simulation experiments. It defines required and optional
    arguments, including the job directory and model names.

    Returns:
        argparse.Namespace: An object containing the parsed command-line arguments.

    Command Line Arguments:
        -d/--directory (str): The job directory where the exp.obj file is located. This argument is required.
        -m/--modelname (str): One or more model names to compare. This argument is optional
                              and defaults to ['EMOD', 'OpenMalaria', 'malariasimulation'].
    """

    description = "Simulation specifications"
    parser = argparse.ArgumentParser(description=description)

    parser.add_argument(
        "-d",
        "--directory",
        type=str,
        required=True,
        help="Job Directory where exp.obj is located",
    )
    parser.add_argument(
        "-m",
        "--modelname",
        nargs='+',
        type=str,
        required=False,
        help="Name of models to compare",
        default=['EMOD', 'OpenMalaria', 'malariasimulation']
    )

    return parser.parse_args()
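
To see how the two options behave, the parser can be rebuilt and fed an explicit argument list instead of `sys.argv`. This is a small sketch; `build_parser` is a hypothetical wrapper and `/path/to/job` is a placeholder path.

```python
import argparse


def build_parser():
    """Rebuild the two documented options so they can be parsed from a list."""
    parser = argparse.ArgumentParser(description="Simulation specifications")
    parser.add_argument("-d", "--directory", type=str, required=True,
                        help="Job Directory where exp.obj is located")
    parser.add_argument("-m", "--modelname", nargs="+", type=str, required=False,
                        default=["EMOD", "OpenMalaria", "malariasimulation"],
                        help="Name of models to compare")
    return parser


parser = build_parser()

# Only the required directory is given: modelname falls back to the default list
args_default = parser.parse_args(["-d", "/path/to/job"])
print(args_default.modelname)

# nargs='+' collects every value after -m into a single list
args_two = parser.parse_args(["-d", "/path/to/job", "-m", "EMOD", "OpenMalaria"])
print(args_two.modelname)
```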

prevalence2to10_to_outcome(fdir, df, sweepvar='modelname', facet_var='seasonality', channel='clinical_incidence', agegrps=None)

Generate line plots for PfPR2to10 and either clinical or severe incidence, grouped by the specified sweep variable and faceted by another variable.

This function creates a series of line plots where the x-axis represents PfPR2to10 (parasite prevalence in 2 to 10 year olds) and the y-axis represents the selected channel, for example clinical or severe incidence. Each line corresponds to one value of sweepvar (by default, one model), and one panel is drawn per age group for each value of facet_var (e.g., seasonality).

Parameters:
  • fdir (str) –

    Directory where the generated plot will be saved.

  • df (DataFrame) –

    DataFrame containing the model results for plotting.

  • sweepvar (str, default: 'modelname' ) –

    Variable used to group the data into separate lines within each panel. Default is ‘modelname’.

  • facet_var (str, default: 'seasonality' ) –

    Variable used to create subplots based on its unique values. Default is ‘seasonality’.

  • channel (str, default: 'clinical_incidence' ) –

    Variable representing the y-axis data to be plotted (e.g., ‘clinical_incidence’).

  • agegrps (list of str, default: None ) –

    Specific age groups to filter the DataFrame before plotting. Defaults to None, which means all age groups will be used.

Returns:
  • None

    The function saves the generated plots to disk.

Raises:
  • ValueError

    Raised by subset_dataframe_for_plot if ‘modelname’ is not among the plotted variables and the DataFrame contains results for more than one model.

Source code in plotter\plot_relationship.py
def prevalence2to10_to_outcome(fdir, df, sweepvar='modelname', facet_var='seasonality', channel='clinical_incidence',
                               agegrps=None):
    """
    Generate line plots for PfPR2to10 and either clinical or severe incidence,
    grouped by the specified sweep variable and faceted by another variable.

    This function creates a series of line plots where the x-axis represents
    the prevalence of PfPR2to10, and the y-axis represents either clinical or
    severe incidence. Each line corresponds to a model, and the plots can be
    faceted by a specified variable (e.g., seasonality).

    Args:
        fdir (str): Directory where the generated plot will be saved.
        df (pd.DataFrame): DataFrame containing the model results for plotting.
        sweepvar (str, optional): Variable used to group the data into separate lines
            within each panel. Default is 'modelname'.
        facet_var (str, optional): Variable used to create subplots based on its unique
            values. Default is 'seasonality'.
        channel (str): Variable representing the y-axis data to be plotted (e.g.,
            'clinical_incidence').
        agegrps (list of str, optional): Specific age groups to filter the DataFrame
            before plotting. Defaults to None, which means all age groups will be used.

    Returns:
        None: The function saves the generated plots to disk.

    Raises:
        ValueError: Raised by subset_dataframe_for_plot if 'modelname' is not among the
            plotted variables and the DataFrame contains results for more than one model.
    """

    figure_vars = ['prevalence_2to10', channel] + [sweepvar, facet_var]
    df, caption_txt = subset_dataframe_for_plot(df, figure_vars, agegrps)

    unique_facets = sorted_list(df[facet_var])
    unique_groups = sorted_list(df[sweepvar])
    num_colors = len(unique_groups)
    color_palette = sns.color_palette('colorblind', max(num_colors, 4))

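    nx = max(1, len(df['ageGroup'].unique()))  # one column per age group
    ny = max(1, len(unique_facets))  # one row per facet, so the subplot index stays within the grid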
    f = 1
    firstPlot = True
    fig = plt.figure(figsize=(10 * nx, 10 * ny))

    if agegrps is None:
        agegrps = df['ageGroup'].unique()

    # for each age group, plot x = prevalence 2_to_10, y = incidences (or prevalence)
    for fi in unique_facets:
        fdf = df[df[facet_var] == fi]

        for agegrp in agegrps:
            ax = fig.add_subplot(ny, nx, f)
            ax.set_title(f'{fi}, Age group: {agegrp}')
            if len(unique_facets) == 1:
                ax.set_title(f'Age group: {agegrp}')
            f += 1
            adf = fdf[fdf.ageGroup == agegrp]

            for mi, grp in enumerate(unique_groups):
                pdf = adf[adf[sweepvar] == grp]
                color_key = color_selector(mi, grp)
                if not pd.isnull(pdf['prevalence_2to10']).all():
                    # plot mean, min and max of seeds
                    xmean, ymean = get_x_y(pdf, 'target_output_values', 'prevalence_2to10', channel)
                    merge_df = pd.merge(left=xmean, right=ymean, on='target_output_values')
                    if channel == 'simulatedEIR':
                        merge_df.sort_values(by='target_output_values', inplace=True)
                    else:
                        merge_df.sort_values(by='prevalence_2to10', inplace=True)
                    ax.plot(merge_df['prevalence_2to10'], merge_df[channel], label=grp,
                            color=color_palette[color_key])
                    ax.fill_between(merge_df['prevalence_2to10'], merge_df[f'{channel}_min'],
                                    merge_df[f'{channel}_max'], alpha=0.1, color=color_palette[color_key])
            if channel == "prevalence":
                y_lim = 1
            else:
                y_lim = max(adf[channel]) * 1.1
            ax.set_ylim(0, y_lim)
            ax.set_xlim(0, 1)
            ax.set_xticks([0, 0.2, 0.4, 0.6, 0.8, 1])
            ax.set_ylabel(get_label(channel))
            ax.set_xlabel(r'$\it{Pf}$PR$_{2-10}$ (%)')  # raw string avoids the invalid '\i' escape
            if firstPlot:
                lg = ax.legend(loc='upper left', bbox_to_anchor=(0.5, 1.2))
                firstPlot = False

    fname = f'pfpr2to10_to_{channel}_{sweepvar}_{facet_var}'
    fname = clean_fname(fname, sweepvar, unique_groups, facet_var, unique_facets)

    fig.savefig(os.path.join(fdir, f'{fname}.png'), bbox_extra_artists=(lg,), bbox_inches='tight')
    plt.close()
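
The line and shaded band drawn above rely on a per-target summary across seeds with the columns `channel`, `{channel}_min` and `{channel}_max`. The helper `get_x_y` is not shown in this section, so the following is only a plain-Python sketch of that kind of aggregation under that assumption; `summarize_over_seeds` and the sample records are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarize_over_seeds(records, key, value):
    """Hypothetical helper: mean, min and max of `value` for each `key` across seeds."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec[key]].append(rec[value])
    return {
        k: {value: mean(v), f"{value}_min": min(v), f"{value}_max": max(v)}
        for k, v in sorted(grouped.items())
    }


# Made-up values: two seeds for each of two target_output_values
records = [
    {"target_output_values": 1, "seed": 0, "clinical_incidence": 0.5},
    {"target_output_values": 1, "seed": 1, "clinical_incidence": 1.5},
    {"target_output_values": 10, "seed": 0, "clinical_incidence": 2.0},
    {"target_output_values": 10, "seed": 1, "clinical_incidence": 4.0},
]

summary = summarize_over_seeds(records, "target_output_values", "clinical_incidence")
print(summary[1])
```

The mean would feed `ax.plot` and the min and max would feed `ax.fill_between`.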

prevalence2to10_to_outcome_by_age_and_model(fdir, df, channel='clinical_incidence', season=None)

Generate grid plots of PfPR2to10 prevalence vs. clinical or severe incidence by age and model.

This function creates a grid of line plots where
  • The x-axis represents the prevalence of PfPR2to10.
  • The y-axis represents either clinical or severe incidence (as specified by the channel argument).
  • Rows correspond to different age groups.
  • Columns correspond to different models.

The function aggregates data for each model and age group, computes the mean and the min-max range across seeds, and plots the mean line with a shaded min-max band.

Parameters:
  • fdir (str) –

    Path to the directory where the resulting plot will be saved.

  • df (DataFrame) –

    DataFrame containing the input data. Must include columns for model name, age group, prevalence, and incidence metrics.

  • channel (str, default: 'clinical_incidence' ) –

    The y-axis variable to plot (e.g., ‘clinical_incidence’, ‘severe_incidence’). Defaults to ‘clinical_incidence’.

  • season (str, default: None ) –

    Seasonality condition to filter the data. If None, the first unique seasonality value in the dataset will be used. Defaults to None.

Returns:
  • None

    The function generates and saves a plot as a PNG file in the specified directory.

Notes
  • Age groups are categorized into infants and young children (0-5 years), older children (5-15 years), and adults (>15 years).
  • Models are distinguished using unique colors.
  • The function requires helper functions for selecting colors (color_selector) and for data aggregation (get_x_y).
Source code in plotter\plot_relationship.py
def prevalence2to10_to_outcome_by_age_and_model(fdir, df, channel='clinical_incidence', season=None):
    """
    Generate grid plots of PfPR2to10 prevalence vs. clinical or severe incidence by age and model.

    This function creates a grid of line plots where:
      - The x-axis represents the prevalence of PfPR2to10.
      - The y-axis represents either clinical or severe incidence (as specified by the `channel` argument).
      - Rows correspond to different age groups.
      - Columns correspond to different models.

    The function aggregates data for each model and age group, computes the mean and the
    min-max range across seeds, and plots the mean line with a shaded min-max band.

    Args:
        fdir (str): Path to the directory where the resulting plot will be saved.
        df (pd.DataFrame): DataFrame containing the input data. Must include columns for model name,
            age group, prevalence, and incidence metrics.
        channel (str): The y-axis variable to plot (e.g., 'clinical_incidence', 'severe_incidence').
            Defaults to 'clinical_incidence'.
        season (str, optional): Seasonality condition to filter the data. If None, the first unique
            seasonality value in the dataset will be used. Defaults to None.

    Returns:
        None: The function generates and saves a plot as a PNG file in the specified directory.

    Notes:
        - Age groups are categorized into infants and young children (0-5 years), older children
          (5-15 years), and adults (>15 years).
        - Models are distinguished using unique colors.
        - The function requires helper functions for selecting colors (`color_selector`) and for
          data aggregation (`get_x_y`).
    """

    unique_groups = sorted_list(df['modelname'])

    num_colors = len(unique_groups)
    color_palette = sns.color_palette('colorblind', max(num_colors, 4))

    # Define the age categories
    age_categories = {
        "young_child": {"min": 0, "max": 5},
        "older_child": {"min": 5, "max": 15},
        "adult": {"min": 15, "max": float('inf')}
    }
    ages_dict = {
        "young_child": [],
        "older_child": [],
        "adult": []
    }
    ages_labels = ['Infants and young\nchildren (0-5 yrs)', 'Older children\n(5-15 yrs)', 'Adults\n(>15 yrs)']

    # Assign each age range to the appropriate category
    age_ranges = [[start, end] for start, end in (item.split('-') for item in list(df['ageGroup'].unique()))]
    for age_range in age_ranges:
        category = assign_age_group(age_range, age_categories)
        if category:
            age_range_str = f"{age_range[0]}-{age_range[1]}"
            ages_dict[category].append(age_range_str)
        else:
            print(f"Age range {age_range} does not fit any category.")

    mean_vars = [channel, 'prevalence_2to10']

    nx = max(1, len(unique_groups))  # one column per model ('modelname' was undefined here)
    ny = max(1, len(ages_dict))  # one row per age category
    f = 1
    fig = plt.figure(figsize=(10 * nx, 6 * ny))

    # for each age group, plot x = prevalence 2_to_10, y = incidences (or prevalence)
    if season is None:
        season = df.seasonality.unique()[0]
    sdf = df[df.seasonality == season]

    for a, ages in enumerate(ages_dict.values()):
        adf = sdf[sdf['ageGroup'].isin(ages)]

        for mi, grp in enumerate(unique_groups):
            color_key = color_selector(mi, grp)
            ax = fig.add_subplot(ny, nx, f)
            ax.title.set_text(f'{grp}')

            f += 1
            mdf = adf[adf['modelname'] == grp]
            # mdf = mdf.groupby(['seed', 'target_output_values'])[mean_vars].agg(np.mean).reset_index()
            # Population weighted mean for each target_output_values (relevant for incidence)
            mdf = mdf.groupby(['seed', 'target_output_values'])[mean_vars + ['nHost']].apply(
                lambda x: pd.Series({col: np.average(x[col], weights=x['nHost']) for col in mean_vars})).reset_index()

            if not pd.isnull(mdf['prevalence_2to10']).all():
                # plot mean, min and max of seeds
                xmean, ymean = get_x_y(mdf, 'target_output_values', 'prevalence_2to10', channel)
                merge_df = pd.merge(left=xmean, right=ymean, on='target_output_values')
                merge_df.sort_values(by='prevalence_2to10', inplace=True)
                ax.plot(merge_df['prevalence_2to10'], merge_df[channel], label=grp,
                        color=color_palette[color_key])
                ax.fill_between(merge_df['prevalence_2to10'], merge_df[f'{channel}_min'],
                                merge_df[f'{channel}_max'], alpha=0.1, color=color_palette[color_key])
            if channel == "prevalence":
                y_lim = 1
            else:
                y_lim = max(adf[channel]) * 1.01
            ax.set_ylim(0, y_lim)
            ax.set_xlim(0, 1)
            ax.set_xticks([0, 0.20, 0.40, 0.60, 0.80, 1])
            ax.set_ylabel(f'{ages_labels[a]}\n,{get_label(channel)}')
            ax.set_xlabel(r'$\it{Pf}$PR$_{2-10}$')  # raw string avoids the invalid '\i' escape

    fname = f'pfpr2to10_to_{channel}_age_model_{season}'
    fig.savefig(os.path.join(fdir, f'{fname}.png'), bbox_inches='tight')
    plt.close()
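
The population-weighted mean used when pooling several age bins into one category can be checked by hand. Below is a small sketch in plain Python that gives the same result as `np.average(values, weights=weights)`; the function name `weighted_mean` and the numbers are made up for illustration.

```python
def weighted_mean(values, weights):
    """Population-weighted mean of `values`, using `weights` as population sizes."""
    total = sum(weights)
    if total == 0:
        raise ZeroDivisionError("weights sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


# Made-up example: incidence in two age bins with very different population sizes
incidence = [2.0, 1.0]
n_host = [100, 300]

weighted = weighted_mean(incidence, n_host)
unweighted = sum(incidence) / len(incidence)
print(weighted, unweighted)
```

The larger bin pulls the pooled value toward its own incidence, which a plain mean over bins would ignore.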

subset_dataframe_for_plot(df, figure_vars, agegrps=None, filter_target=True)

Filter the input DataFrame for plotting based on specified criteria.

This function filters the DataFrame according to the provided figure variables, optional age groups, and other selection criteria to prepare the data for visualization. It also returns a string summarizing the filtering applied.

Parameters:
  • df (DataFrame) –

    The input DataFrame containing simulation results.

  • figure_vars (list of str) –

    List of variables used for plotting, which influences the filtering process.

  • agegrps (str or list of str, default: None ) –

    Specific age group(s) to filter by. If provided, only the data for these age groups will be retained. Defaults to None, meaning no filtering by age group will occur.

  • filter_target (bool, default: True ) –

    If True, and none of ‘simulatedEIR’, ‘prevalence_2to10’, or ‘target_output_values’ is present in figure_vars, the function keeps only the rows with the maximum ‘target_output_values’. Defaults to True.

Returns:
  • tuple

    A tuple containing: - pd.DataFrame: The filtered DataFrame. - str: A summary string describing the filtering that was applied.

Raises:
  • ValueError

    If ‘modelname’ is not included in figure_vars and there are multiple unique model names in the DataFrame.

Source code in plotter\plot_helper.py
def subset_dataframe_for_plot(df, figure_vars, agegrps=None, filter_target=True):
    """
    Filter the input DataFrame for plotting based on specified criteria.

    This function filters the DataFrame according to the provided figure variables,
    optional age groups, and other selection criteria to prepare the data for
    visualization. It also returns a string summarizing the filtering applied.

    Args:
        df (pd.DataFrame): The input DataFrame containing simulation results.
        figure_vars (list of str): List of variables used for plotting, which influences
            the filtering process.
        agegrps (str or list of str, optional): Specific age group(s) to filter by.
            If provided, only the data for these age groups will be retained.
            Defaults to None, meaning no filtering by age group will occur.
        filter_target (bool, optional): If True, and none of 'simulatedEIR',
            'prevalence_2to10', or 'target_output_values' is present in `figure_vars`,
            keep only the rows with the maximum 'target_output_values'. Defaults to True.

    Returns:
        tuple: A tuple containing:
            - pd.DataFrame: The filtered DataFrame.
            - str: A summary string describing the filtering that was applied.

    Raises:
        ValueError: If 'modelname' is not included in `figure_vars` and there
        are multiple unique model names in the DataFrame.
    """

    txt = 'Filtered dataset by: '

    if agegrps is not None:
        if isinstance(agegrps, list):
            df = df[df['ageGroup'].isin(agegrps)]
            txt += f'ageGroup in {agegrps}, '
        else:
            df = df[df['ageGroup'] == agegrps]
            txt += f'ageGroup {agegrps}, '


    if 'cm_clinical' not in figure_vars:
        selected_cm = df['cm_clinical'].min()
        df = df[df['cm_clinical'] == selected_cm]
        txt += f'cm_clinical {selected_cm}, '

    if 'seasonality' not in figure_vars:
        selected_season = 'seasonal' if 'seasonal' in df['seasonality'].unique() else df['seasonality'].unique()[0]
        df = df[df['seasonality'] == selected_season]
        txt += f'seasonality {selected_season}, '

    if filter_target:
        if not any(var in figure_vars for var in ['simulatedEIR', 'prevalence_2to10', 'target_output_values']):
            selected_output = df['target_output_values'].max()
            df = df[df['target_output_values'] == selected_output]
            txt += f'target_output_values {selected_output}, '

    if 'modelname' not in figure_vars and df['modelname'].nunique() > 1:
        raise ValueError('modelname needs to be specified in plot if results were combined for more than 1 model')

    # Remove trailing comma and space if any filtering has been done
    if txt.endswith(', '):
        txt = txt[:-2]

    return df, txt
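
The default choices made when a variable is absent from `figure_vars` can be illustrated without pandas. This is a plain-Python analog on a list of dicts, covering only the `cm_clinical` and `seasonality` steps; `subset_rows` and the sample rows are hypothetical, and the age-group, target and model-name checks are left out.

```python
def subset_rows(rows, figure_vars):
    """Hypothetical analog of two filtering steps in subset_dataframe_for_plot."""
    txt = "Filtered dataset by: "

    if "cm_clinical" not in figure_vars:
        # Keep only the lowest case-management value present in the data
        selected_cm = min(r["cm_clinical"] for r in rows)
        rows = [r for r in rows if r["cm_clinical"] == selected_cm]
        txt += f"cm_clinical {selected_cm}, "

    if "seasonality" not in figure_vars:
        # Prefer 'seasonal' when available, otherwise the first value seen
        seasons = [r["seasonality"] for r in rows]
        selected_season = "seasonal" if "seasonal" in seasons else seasons[0]
        rows = [r for r in rows if r["seasonality"] == selected_season]
        txt += f"seasonality {selected_season}, "

    # Remove the trailing separator left by the last filter
    if txt.endswith(", "):
        txt = txt[:-2]
    return rows, txt


# Made-up rows for a single model
rows = [
    {"modelname": "EMOD", "seasonality": "perennial", "cm_clinical": 0.0},
    {"modelname": "EMOD", "seasonality": "seasonal", "cm_clinical": 0.0},
    {"modelname": "EMOD", "seasonality": "seasonal", "cm_clinical": 0.45},
]

kept_a, txt_a = subset_rows(rows, ["modelname", "seasonality"])
kept_b, txt_b = subset_rows(rows, ["modelname"])
print(txt_a)
print(txt_b)
```

Because the filters run in sequence, the seasonality default is chosen from the rows that survive the `cm_clinical` filter, as in the source.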