plot_relationship.py

color_selector(i, s)

Select a color index based on the model name.

This function returns a color index based on the specified model name. If the model name is recognized, a predefined index is returned; otherwise, the input index is returned.

Parameters:
  • i (int) –

    The default index to return if the model name is not recognized.

  • s (str) –

    The name of the model. Recognized values are ‘EMOD’ (index 0), ‘malariasimulation’ (index 1), and ‘OpenMalaria’ (index 2); any other value returns i.

Returns:
  • int

    The color index corresponding to the model name.

Source code in plotter\plot_helper.py
def color_selector(i, s):
    """
    Select a color index based on the model name.

    This function returns a color index based on the specified model name.
    If the model name is recognized, a predefined index is returned;
    otherwise, the input index is returned.

    Args:
        i (int): The default index to return if the model name is not recognized.
        s (str): The name of the model. Possible values include:
            - 'EMOD'
            - 'malariasimulation'
            - 'OpenMalaria'

    Returns:
        int: The color index corresponding to the model name.
    """

    if s == 'EMOD':
        return 0
    elif s == 'malariasimulation':
        return 1
    elif s == 'OpenMalaria':
        return 2
    else:
        return i
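A minimal usage sketch, with the function reproduced so the snippet runs on its own. Recognized model names map to fixed palette positions, so a model keeps the same color no matter the order in which groups are iterated; any other value (the sweep value 'high_cm' below is hypothetical) falls back to the loop index i.

```python
def color_selector(i, s):
    # Fixed palette positions for the three known models
    if s == 'EMOD':
        return 0
    elif s == 'malariasimulation':
        return 1
    elif s == 'OpenMalaria':
        return 2
    else:
        # Unknown names fall back to the caller's loop index
        return i

# Models keep their color even when iterated in a different order
for i, name in enumerate(['OpenMalaria', 'EMOD', 'malariasimulation']):
    print(name, color_selector(i, name))  # OpenMalaria 2, EMOD 0, malariasimulation 1

# A non-model sweep value uses its position in the loop
print(color_selector(3, 'high_cm'))  # 3
```

Note that a non-model value at position 0, 1 or 2 shares its color with EMOD, malariasimulation or OpenMalaria respectively.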

convert_to_date(x)

Convert a number of days since January 1, 2005, to a date.

This function takes an integer representing the number of days since January 1, 2005, and returns the corresponding date.

Parameters:
  • x (int) –

    The number of days since January 1, 2005.

Returns:
  • date

    A datetime.date object representing the corresponding date.

Source code in plotter\plot_helper.py
def convert_to_date(x):
    """
    Convert a number of days since January 1, 2005, to a date.

    This function takes an integer representing the number of days
    since January 1, 2005, and returns the corresponding date.

    Args:
        x (int): The number of days since January 1, 2005.

    Returns:
        date: A datetime.date object representing the corresponding date.
    """

    import datetime
    return datetime.date(2005, 1, 1) + datetime.timedelta(days=x)
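A quick check of the day-offset arithmetic, with the function reproduced so the snippet is self-contained. Day 0 is January 1, 2005, and because 2005 is not a leap year, day 59 falls on March 1 and day 365 on January 1, 2006.

```python
import datetime

def convert_to_date(x):
    # Day 0 corresponds to January 1, 2005
    return datetime.date(2005, 1, 1) + datetime.timedelta(days=x)

print(convert_to_date(0))    # 2005-01-01
print(convert_to_date(59))   # 2005-03-01
print(convert_to_date(365))  # 2006-01-01
```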

custom_sort_key(age_group)

Custom sort key function for sorting age groups.

This function extracts the lower bound of an age group represented as a string in the format ‘X-Y’ and returns it as an integer. It is primarily used for sorting age groups in ascending order based on their lower bounds.

Parameters:
  • age_group (str) –

    The age group string in the format ‘X-Y’, where X is the lower bound and Y is the upper bound.

Returns:
  • int

    The lower bound of the age group as an integer.

Source code in plotter\plot_helper.py
def custom_sort_key(age_group):
    """
    Custom sort key function for sorting age groups.

    This function extracts the lower bound of an age group represented as
    a string in the format 'X-Y' and returns it as an integer. It is
    primarily used for sorting age groups in ascending order based on
    their lower bounds.

    Args:
        age_group (str): The age group string in the format 'X-Y',
                         where X is the lower bound and Y is the upper bound.

    Returns:
        int: The lower bound of the age group as an integer.
    """

    return int(age_group.split('-')[0])
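The reason a custom key is needed: plain string sorting compares character by character, so '10-15' sorts before '5-10'. Sorting on the integer lower bound restores numeric order. The age-group labels below are illustrative.

```python
def custom_sort_key(age_group):
    # '10-15' -> 10
    return int(age_group.split('-')[0])

groups = ['10-15', '0-5', '5-10']
print(sorted(groups))                       # ['0-5', '10-15', '5-10'] (string order)
print(sorted(groups, key=custom_sort_key))  # ['0-5', '5-10', '10-15'] (numeric order)
```

A label whose part before the first '-' is not an integer, such as '65+', makes int() raise ValueError; get_output_df catches this and keeps the age groups in their original order.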

eir_to_outcome(fdir, df, sweepvar='cm_clinical', facet_var='seasonality', eir_val='simulatedEIR', channel='prevalence_2to10', agegrp='0-5')

Generate line plots of EIR (Entomological Inoculation Rate) against a requested outcome variable, with one colored line per value of the sweep variable and one panel per value of the facet variable.

This function creates line plots where the x-axis represents the EIR (on a symmetric-log scale) and the y-axis represents the outcome variable. Each line shows the mean outcome per output target, with a shaded band spanning the 2.5th to 97.5th percentiles.

Parameters:
  • fdir (str) –

    Directory where the generated plot will be saved.

  • df (DataFrame) –

    DataFrame that includes combined model results.

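  • sweepvar (str, default: 'cm_clinical' ) –

    Variable used to group the data into separate lines (one color per value) within each panel. Default is ‘cm_clinical’.

  • facet_var (str, default: 'seasonality' ) –

    Variable used to create one panel per unique value. Default is ‘seasonality’.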

  • eir_val (str, default: 'simulatedEIR' ) –

    The EIR (Entomological Inoculation Rate) value to be used for plotting. Default is ‘simulatedEIR’.

  • channel (str, default: 'prevalence_2to10' ) –

    The outcome variable to compare to EIR. Default is ‘prevalence_2to10’.

  • agegrp (str, default: '0-5' ) –

    Limits the resulting graphs to the selected age group. Default is ‘0-5’.

Returns:
  • None

    The function saves the generated plots to disk.

Source code in plotter\plot_relationship.py
def eir_to_outcome(fdir, df, sweepvar='cm_clinical', facet_var='seasonality', eir_val='simulatedEIR',
                   channel='prevalence_2to10', agegrp='0-5'):
    """
    Generate line plots of EIR (Entomological Inoculation Rate) against a requested
    outcome variable, with one colored line per sweep-variable value and one panel
    per facet-variable value.

    This function creates line plots where the x-axis represents the EIR and the
    y-axis represents an outcome variable, with lines colored by the sweep variable
    and organized into panels based on the facet variable.

    Args:
        fdir (str): Directory where the generated plot will be saved.
        df (pd.DataFrame): DataFrame that includes combined model results.
        sweepvar (str, optional): Variable used to group the data into separate
            lines within each panel. Default is 'cm_clinical'.
        facet_var (str, optional): Variable used to create one panel per unique
            value. Default is 'seasonality'.
        eir_val (str, optional): The EIR (Entomological Inoculation Rate) value to
            be used for plotting. Default is 'simulatedEIR'.
        channel (str, optional): The outcome variable to compare to EIR. Default is
            'prevalence_2to10'.
        agegrp (str, optional): Limits the resulting graphs to the selected age group.
            Default is '0-5'.

    Returns:
        None: The function saves the generated plots to disk.
    """

    figure_vars = [eir_val, channel] + [sweepvar, facet_var]
    df, caption_txt = subset_dataframe_for_plot(df, figure_vars, agegrp)
    color_palette = sns.color_palette('colorblind', max(len(df[sweepvar].unique()), 4))

    firstPlot = True
    unique_facets = df[facet_var].unique()
    nx = len(df[sweepvar].unique())
    ny = len(unique_facets)
    f = 1
    fig = plt.figure(figsize=(10 * nx, 10 * ny))

    for fi in unique_facets:
        ax = fig.add_subplot(ny, nx, f)
        ax.set_title(fi)
        f += 1
        fdf = df[(df[facet_var] == fi)]

        # group by the bare column name so p is a scalar, not a 1-tuple (pandas >= 2.0)
        for i, (p, pdf) in enumerate(fdf.groupby(sweepvar)):
            color_key = color_selector(i, p)
            xmean, ymean = get_x_y(pdf, 'output_target', eir_val, channel)
            merge_df = pd.merge(left=xmean, right=ymean, on='output_target')
            merge_df.sort_values(by=eir_val, inplace=True)
            ax.plot(merge_df[eir_val], merge_df[channel], '-', linewidth=0.8, label=f"{p}",
                    color=color_palette[color_key])
            ax.fill_between(merge_df[eir_val], merge_df[f'{channel}_min'], merge_df[f'{channel}_max'], alpha=0.1,
                            color=color_palette[color_key])
        if firstPlot:
            lg = ax.legend(loc='upper left', bbox_to_anchor=(0, 1))
            firstPlot = False
        ax.set_xlim(0.1, 1000)
        ax.set_xscale('symlog')
        ax.set_xlabel(f'{eir_val.replace("EIR", " annual EIR")}', fontsize=14)
        ax.set_ylabel(get_ylab(channel))

    fname = f'{eir_val}_{channel}_{agegrp}_by_{sweepvar}_{facet_var}'
    if channel == 'prevalence_2to10':
        fname = f'{eir_val}_{channel}_by_{sweepvar}_{facet_var}'
    fname = fname.replace('modelname', 'model')

    fig.savefig(os.path.join(fdir, f'{fname}.png'), bbox_extra_artists=(lg,), bbox_inches='tight')
    plt.close()
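To locate the saved figure, it helps to know how its name is built. The sketch below is a hypothetical helper, eir_outcome_fname, that mirrors the naming logic at the end of eir_to_outcome; it is not part of the module.

```python
def eir_outcome_fname(eir_val='simulatedEIR', channel='prevalence_2to10', agegrp='0-5',
                      sweepvar='cm_clinical', facet_var='seasonality'):
    # Hypothetical helper mirroring eir_to_outcome's file naming
    fname = f'{eir_val}_{channel}_{agegrp}_by_{sweepvar}_{facet_var}'
    if channel == 'prevalence_2to10':
        # the age group is left out of the name for prevalence_2to10
        fname = f'{eir_val}_{channel}_by_{sweepvar}_{facet_var}'
    # 'modelname' is shortened to 'model' anywhere in the name
    return fname.replace('modelname', 'model') + '.png'

print(eir_outcome_fname())
# simulatedEIR_prevalence_2to10_by_cm_clinical_seasonality.png
print(eir_outcome_fname(channel='clinical_incidence', sweepvar='modelname'))
# simulatedEIR_clinical_incidence_0-5_by_model_seasonality.png
```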

get_output_df(wdir, modelname, yr=False, mth=False, daily=False, custom_name=None, save_combined=False)

Load and combine data from the model output files.

This function reads model output files from a specified working directory and combines the data into a single DataFrame. It supports different data formats based on the specified parameters for yearly, monthly, or daily data.

Parameters:
  • wdir (str) –

    Working directory where the data files are located.

  • modelname (str or list of str) –

    Name of models for which result CSVs should be loaded (case sensitive).

  • yr (bool, default: False ) –

    Set to True if the data files have yearly data. Defaults to False.

  • mth (bool, default: False ) –

    Set to True if the data files have monthly data. Defaults to False.

  • daily (bool, default: False ) –

    Set to True if the data files have daily timestep data. Defaults to False. If both mth and daily are True, only daily will be processed.

  • custom_name (str, default: None ) –

    Custom file name, without the .csv extension, to use instead of the default based on the time period. Takes precedence over yr, mth and daily. Defaults to None.

  • save_combined (bool, default: False ) –

    Set to True to save the combined DataFrame to a CSV file. Defaults to False.

Returns:
  • tuple

    A tuple containing:
      - df (DataFrame): Combined data for the models listed in modelname (empty if no output file is found).
      - wdir (str): Updated working directory (if applicable).

Raises:
  • ValueError

    If an invalid modelname value is specified.

Source code in plotter\plot_helper.py
def get_output_df(wdir, modelname, yr=False, mth=False, daily=False, custom_name=None,
                  save_combined=False):
    """
    Load and combine data from the model output files.

    This function reads model output files from a specified working directory
    and combines the data into a single DataFrame. It supports different data
    formats based on the specified parameters for yearly, monthly, or daily
    data.

    Args:
        wdir (str): Working directory where the data files are located.
        modelname (str or list of str): Name of models for which result CSVs
                                         should be loaded (case sensitive).
        yr (bool, optional): Set to True if the data files have yearly data.
                             Defaults to False.
        mth (bool, optional): Set to True if the data files have monthly data.
                             Defaults to False.
        daily (bool, optional): Set to True if the data files have daily timestep
                                data. Defaults to False. If both mth and daily
                                are True, only daily will be processed.
        custom_name (str, optional): Custom file name, without the .csv extension,
                                      to use instead of the default based on the
                                      time period. Defaults to None.
        save_combined (bool, optional): Set to True to save the combined DataFrame
                                         to a CSV file. Defaults to False.

    Returns:
        tuple: A tuple containing:
            - df (DataFrame): Combined DataFrame containing the combined data
                              for the models listed in modelname.
            - wdir (str): Updated working directory (if applicable).

    Raises:
        ValueError: If an invalid modelname value is specified.
    """

    cols_to_keep = None  # default read all
    fname = 'mmmpy_timeavrg.csv'
    if yr:
        fname = 'mmmpy_yr.csv'
    if mth:
        fname = 'mmmpy_mth.csv'
    if daily:
        fname = 'mmmpy_daily.csv'
        # cols_to_keep = ['index', 'timestep', 'ageGroup', 'simulatedEIR', 'prevalence_2to10', 'prevalence',
        #                'clinical_incidence', 'severe_incidence', 'seed']
    if custom_name:
        fname = f'{custom_name}.csv'

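    # accept a single model name as well as a list of names
    if isinstance(modelname, str):
        modelname = [modelname]

    file_paths = [os.path.join(wdir, fname)]

    for model in modelname:
        file_paths.append(os.path.join(wdir, model, fname))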

    existing_files = [path for path in file_paths if os.path.isfile(path)]

    if not existing_files:
        return pd.DataFrame(), wdir

    if os.path.isfile(os.path.join(wdir, fname)):
        df = pd.read_csv((os.path.join(wdir, fname)), low_memory=False)
    else:

        dfs = []
        for model in modelname:
            model_path = os.path.join(wdir, model, fname)
            try:
                if os.path.isfile(model_path):
                    df = pd.read_csv(model_path, usecols=cols_to_keep)
                    df['modelname'] = model
                    if model == 'EMOD':
                        df['seed'] = df['seed'] + 1
                    dfs.append(df)
                else:
                    print(f"File not found for {model}: {model_path}")
            except Exception as e:
                print(f"Error reading {model_path}: {e}")

        if not dfs:
            return pd.DataFrame(), wdir

        df = pd.concat(dfs, ignore_index=True)

        if 'ageGroup' in df.columns:
            try:
                age_grps = sorted(list(df['ageGroup'].unique()), key=custom_sort_key)
            except (ValueError, AttributeError):  # labels not in 'X-Y' form keep their original order
                age_grps = list(df['ageGroup'].unique())
            df['ageGroup'] = df['ageGroup'].astype('category')
            df['ageGroup'] = df['ageGroup'].cat.reorder_categories(age_grps)

        # Drop simulations whose simulated EIR is 0: they yield no outcome
        # measures, which crashes the downstream plotting.
        warning_df = df[df['simulatedEIR'] == 0]
        if len(warning_df) > 0 and not daily:
            print('Warning: some simulations had a simulated EIR of 0 and were removed')
            df = df[df['simulatedEIR'] > 0]
            df = df[df['simulatedEIR'].notnull()]

        if not daily and save_combined:
            df.to_csv(os.path.join(wdir, f'{fname}'), index=False)
    return df, wdir
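The file-selection rules above can be checked in isolation. resolve_output_fname and candidate_paths below are hypothetical helpers that mirror get_output_df's logic, assuming the layout it reads: an optional combined CSV directly in wdir, otherwise one CSV per model in a subfolder named after the model. The job directory 'job' is illustrative.

```python
import os

def resolve_output_fname(yr=False, mth=False, daily=False, custom_name=None):
    # Later checks win: custom_name > daily > mth > yr > time-averaged default
    fname = 'mmmpy_timeavrg.csv'
    if yr:
        fname = 'mmmpy_yr.csv'
    if mth:
        fname = 'mmmpy_mth.csv'
    if daily:
        fname = 'mmmpy_daily.csv'
    if custom_name:
        fname = f'{custom_name}.csv'
    return fname

def candidate_paths(wdir, modelname, fname):
    # Combined file in wdir first (used if present), then one file per model subfolder
    if isinstance(modelname, str):
        modelname = [modelname]
    return [os.path.join(wdir, fname)] + [os.path.join(wdir, m, fname) for m in modelname]

print(resolve_output_fname(mth=True, daily=True))  # mmmpy_daily.csv
print(candidate_paths('job', ['EMOD', 'OpenMalaria'], 'mmmpy_yr.csv'))
```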

get_x_y(df, grpvar, x_channel, y_channel)

Calculate x-axis and y-axis values for each plot.

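This function groups the input DataFrame by a specified variable and calculates the mean values for the specified x and y channels. It also computes the 2.5th and 97.5th percentiles of the y values in each group, i.e. the central 95% range of the values rather than a confidence interval of the mean.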

Parameters:
  • df (DataFrame) –

    The DataFrame used to group and calculate x and y values.

  • grpvar (str) –

    The variable in the DataFrame used to group the x and y values.

  • x_channel (str) –

    The variable serving as the x-axis in the graph.

  • y_channel (str) –

    The variable serving as the y-axis in the graph.

Returns:
  • tuple

    A tuple containing:
      - xmean (DataFrame): Mean x value for each group.
      - ymean (DataFrame): Mean y value for each group, plus the 2.5th and 97.5th percentiles in the columns <y_channel>_min and <y_channel>_max.

Source code in plotter\plot_helper.py
def get_x_y(df, grpvar, x_channel, y_channel):
    """
    Calculate x-axis and y-axis values for each plot.

    This function groups the input DataFrame by a specified variable and
    calculates the mean values for the specified x and y channels. It also
    computes the 2.5th and 97.5th percentiles of the y values in each group
    (the central 95% range of the values, not a confidence interval of the mean).

    Args:
        df (DataFrame): The DataFrame used to group and calculate x and y values.
        grpvar (str): The variable in the DataFrame used to group the x and y values.
        x_channel (str): The variable serving as the x-axis in the graph.
        y_channel (str): The variable serving as the y-axis in the graph.

    Returns:
        tuple: A tuple containing:
            - xmean (DataFrame): A DataFrame containing values for the x-axis.
            - ymean (DataFrame): A DataFrame containing values for the y-axis,
                                 plus the 2.5th and 97.5th percentiles (min and max).
    """

    grouped = df.groupby(grpvar)
    # .mean() replaces agg(np.mean), which pandas >= 2.1 flags with a FutureWarning
    xmean = grouped[x_channel].mean().reset_index()
    ymean = grouped[y_channel].mean().reset_index()
    # 2.5th and 97.5th percentiles of y within each group (NaNs ignored);
    # rows are collected in a list to avoid concatenating onto an empty frame
    rows = []
    for key, p in grouped:
        rows.append({grpvar: key,
                     f'{y_channel}_min': np.nanpercentile(p[y_channel], 2.5),
                     f'{y_channel}_max': np.nanpercentile(p[y_channel], 97.5)})
    p_df = pd.DataFrame(rows, columns=[grpvar, f'{y_channel}_min', f'{y_channel}_max'])
    ymean = pd.merge(left=ymean, right=p_df, on=grpvar)
    return xmean, ymean
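The same aggregation can be written without an explicit loop. This is a sketch, not the module's implementation: get_x_y_vectorized is a hypothetical name, and it relies on SeriesGroupBy.quantile defaulting to linear interpolation and skipping NaN, which matches np.nanpercentile's defaults. The toy DataFrame is illustrative data, not model output.

```python
import numpy as np
import pandas as pd

def get_x_y_vectorized(df, grpvar, x_channel, y_channel):
    grouped = df.groupby(grpvar)
    # Mean of the x channel per group
    xmean = grouped[x_channel].mean().reset_index()
    # Mean of y plus its 2.5th and 97.5th percentiles per group
    ymean = pd.concat(
        [grouped[y_channel].mean(),
         grouped[y_channel].quantile(0.025).rename(f'{y_channel}_min'),
         grouped[y_channel].quantile(0.975).rename(f'{y_channel}_max')],
        axis=1,
    ).reset_index()
    return xmean, ymean

df = pd.DataFrame({
    'output_target': [1, 1, 1, 2, 2, 2],
    'simulatedEIR': [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    'prevalence_2to10': [0.1, 0.2, 0.3, 0.5, 0.6, 0.7],
})
xmean, ymean = get_x_y_vectorized(df, 'output_target', 'simulatedEIR', 'prevalence_2to10')
print(ymean)
```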

get_ylab(channel)

Retrieve the y-axis label for a given channel.

This function returns a formatted string representing the y-axis label based on the specified channel name. The labels correspond to specific epidemiological measures.

Parameters:
  • channel (str) –

    The name of the channel for which to retrieve the label. Possible values include ‘prevalence_2to10’, ‘prevalence’, ‘clinical_incidence’, ‘severe_incidence’, and ‘simulatedEIR’.

Returns:
  • str

    The corresponding y-axis label for the channel, or None if the channel is not recognized.

Source code in plotter\plot_helper.py
def get_ylab(channel):
    """
    Retrieve the y-axis label for a given channel.

    This function returns a formatted string representing the y-axis label
    based on the specified channel name. The labels correspond to specific
    epidemiological measures.

    Args:
        channel (str): The name of the channel for which to retrieve the label.
            Possible values include:
            - 'prevalence_2to10'
            - 'prevalence'
            - 'clinical_incidence'
            - 'severe_incidence'
            - 'simulatedEIR'

    Returns:
        str: The corresponding y-axis label for the channel, or None if the
        channel is not recognized.
    """

    channel_labels = {'prevalence_2to10': r'$\it{Pf}$PR$_{2-10}$ (%)',
                      'prevalence': r'$\it{Pf}$PR (%)',
                      'clinical_incidence': 'Clinical incidence (pppy)',
                      'severe_incidence': 'Severe incidence (pppy)',
                      'simulatedEIR': 'simulated EIR'
                      }

    return channel_labels.get(channel)
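A minimal check of the lookup, with the mapping reproduced. The raw strings keep the backslash in \it intact for matplotlib's mathtext (a plain '\i' is an invalid escape that recent Python versions warn about), and dict.get returns None for any channel not in the table, such as 'seed' here.

```python
def get_ylab(channel):
    # Raw strings preserve the backslash in \it for mathtext
    channel_labels = {'prevalence_2to10': r'$\it{Pf}$PR$_{2-10}$ (%)',
                      'prevalence': r'$\it{Pf}$PR (%)',
                      'clinical_incidence': 'Clinical incidence (pppy)',
                      'severe_incidence': 'Severe incidence (pppy)',
                      'simulatedEIR': 'simulated EIR'}
    # Unrecognized channels yield None
    return channel_labels.get(channel)

print(get_ylab('clinical_incidence'))  # Clinical incidence (pppy)
print(get_ylab('seed'))                # None
```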

input_to_simulated_eir(fdir, df, sweepvar='cm_clinical', facet_var='seasonality')

Generate line plots comparing input EIR to simulated EIR.

This function creates line plots where the x-axis represents the input EIR values and the y-axis represents the simulated annual EIR. Different models are represented by different lines on the plot, and the plots are organized into panels based on specified facets.

Parameters:
  • fdir (str) –

    Directory where the generated plot will be saved.

  • df (DataFrame) –

    DataFrame that includes combined model results.

  • sweepvar (str, default: 'cm_clinical' ) –

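    Variable to group the data into separate lines on the plot. Each value must be a recognized model name (‘EMOD’, ‘OpenMalaria’, or ‘malariasimulation’), because the x values are read from that model's input-EIR column. Default is ‘cm_clinical’.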

  • facet_var (str, default: 'seasonality' ) –

    Variable to create multiple panels on the plot. Default is ‘seasonality’.

Returns:
  • None

    The function saves the generated plots to disk.

Source code in plotter\plot_relationship.py
def input_to_simulated_eir(fdir, df, sweepvar='cm_clinical', facet_var='seasonality'):
    """
    Generate line plots comparing input EIR to simulated EIR.

    This function creates line plots where the x-axis represents the input EIR values
    and the y-axis represents the simulated annual EIR. Different models are represented
    by different lines on the plot, and the plots are organized into panels based on
    specified facets.

    Args:
        fdir (str): Directory where the generated plot will be saved.
        df (pd.DataFrame): DataFrame that includes combined model results.
        sweepvar (str, optional): Variable to group the data and create multiple lines
            on the plot. Default is 'cm_clinical'.
        facet_var (str, optional): Variable to create multiple panels on the plot.
            Default is 'seasonality'.

    Returns:
        None: The function saves the generated plots to disk.
    """

    xyvars = ['simulatedEIR', 'output_target']
    figure_vars = xyvars + [sweepvar, facet_var]
    df, caption_txt = subset_dataframe_for_plot(df, figure_vars)
    unique_facets = df[facet_var].unique()
    unique_groups = df[sweepvar].unique()
    unique_groups = np.sort(unique_groups)
    num_colors = len(unique_groups)
    color_palette = sns.color_palette('colorblind', max(num_colors, 4))
    firstPlot = True
    nx = 1
    ny = len(unique_facets)
    f = 1
    fig = plt.figure(figsize=(10 * nx, 6 * ny))
    for fi in unique_facets:
        fdf = df[df[facet_var] == fi]

        ax = fig.add_subplot(ny, nx, f)
        ax.set_title(fi)
        f += 1
        # group by the bare column name so p is a scalar, not a 1-tuple (pandas >= 2.0)
        for i, (p, pdf) in enumerate(fdf.groupby(sweepvar)):
            # plot mean, min and max of seeds and age groups
            color_key = color_selector(i, p)
            if p == 'EMOD':
                model = 'em'
            elif p == 'OpenMalaria':
                model = 'om'
            elif p == 'malariasimulation':
                model = 'ms'
            else:
                # no model_input_* column exists for this value, so skip it
                print(f'Warning: Model not recognized: {p}')
                continue
            xmean, ymean = get_x_y(pdf, 'output_target', f'model_input_{model}', 'simulatedEIR')
            merge_df = pd.merge(left=xmean, right=ymean, on='output_target')
            merge_df.sort_values(by=f'model_input_{model}', inplace=True)
            ax.plot(merge_df[f'model_input_{model}'], merge_df['simulatedEIR'], '-', linewidth=0.8, label=f"{p}",
                    color=color_palette[color_key])
            ax.fill_between(merge_df[f'model_input_{model}'], merge_df['simulatedEIR_min'],
                            merge_df['simulatedEIR_max'], alpha=0.1, color=color_palette[color_key])
        if firstPlot:
            lg = ax.legend(loc='upper left', bbox_to_anchor=(0, 1))
            firstPlot = False
        ax.set_ylim(0.1, 10000)
        ax.set_xlim(0.1, 10000)
        ax.set_yscale('symlog')
        ax.set_xscale('symlog')

    plt.xlabel('Input EIR')
    plt.ylabel('Simulated annual EIR')

    fname = f'input_to_simulated_eir_by_{sweepvar}_{facet_var}'
    fname = fname.replace('modelname', 'model')
    fig.savefig(os.path.join(fdir, f'{fname}.png'), bbox_extra_artists=(lg,), bbox_inches='tight')
    plt.close()
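The x values of each line come from a per-model input column (model_input_em, model_input_om or model_input_ms), which is why the grouping values need to be model names. The sketch below isolates that mapping in a hypothetical helper, model_input_column; the value 'cm_0.45' is an illustrative non-model sweep value.

```python
MODEL_INPUT_SUFFIX = {'EMOD': 'em', 'OpenMalaria': 'om', 'malariasimulation': 'ms'}

def model_input_column(model_name):
    # Hypothetical helper: input-EIR column for one model, or None if unknown
    suffix = MODEL_INPUT_SUFFIX.get(model_name)
    return f'model_input_{suffix}' if suffix else None

print(model_input_column('OpenMalaria'))  # model_input_om
print(model_input_column('cm_0.45'))      # None
```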

parse_args()

Parses command-line arguments for simulation specifications.

This function uses the argparse library to handle command-line inputs required for running simulation experiments. It defines required and optional arguments, including the job directory and model names.

Returns:
  • argparse.Namespace: An object containing the parsed command-line arguments.

Command Line Arguments

  • -d/--directory (str): The job directory where the exp.obj file is located. This argument is required.
  • -m/--modelname (str): One or more model names to compare. This argument is optional and defaults to [‘EMOD’, ‘OpenMalaria’, ‘malariasimulation’].

Source code in plotter\plot_helper.py
def parse_args():
    """
    Parses command-line arguments for simulation specifications.

    This function uses the argparse library to handle command-line inputs
    required for running simulation experiments. It defines required and optional
    arguments, including the job directory and model names.

    Returns:
        argparse.Namespace: An object containing the parsed command-line arguments.

    Command Line Arguments:
        -d/--directory (str): The job directory where the exp.obj file is located. This argument is required.
        -m/--modelname (str): One or more model names to compare. This argument is optional
                              and defaults to ['EMOD', 'OpenMalaria', 'malariasimulation'].
    """

    description = "Simulation specifications"
    parser = argparse.ArgumentParser(description=description)

    parser.add_argument(
        "-d",
        "--directory",
        type=str,
        required=True,
        help="Job Directory where exp.obj is located",
    )
    parser.add_argument(
        "-m",
        "--modelname",
        nargs='+',
        type=str,
        required=False,
        help="Name of models to compare",
        default=['EMOD', 'OpenMalaria', 'malariasimulation']
    )

    return parser.parse_args()
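Because argparse.ArgumentParser.parse_args accepts an explicit list of strings, the same argument definitions can be exercised without a command line. build_parser is a hypothetical wrapper that repeats the definitions above, and 'my_job' is an illustrative directory. With nargs='+', even a single -m value arrives as a list, which is the form get_output_df iterates over.

```python
import argparse

def build_parser():
    # Same arguments as parse_args, returned unparsed for testing
    parser = argparse.ArgumentParser(description="Simulation specifications")
    parser.add_argument("-d", "--directory", type=str, required=True,
                        help="Job Directory where exp.obj is located")
    parser.add_argument("-m", "--modelname", nargs='+', type=str, required=False,
                        help="Name of models to compare",
                        default=['EMOD', 'OpenMalaria', 'malariasimulation'])
    return parser

parser = build_parser()
args = parser.parse_args(['-d', 'my_job'])
print(args.directory, args.modelname)  # my_job ['EMOD', 'OpenMalaria', 'malariasimulation']
args = parser.parse_args(['-d', 'my_job', '-m', 'EMOD'])
print(args.modelname)                  # ['EMOD']
```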

prevalence2to10_to_incidence(fdir, df, sweepvar='modelname', facet_var='seasonality', channel='clinical_incidence', agegrps=None)

Generate line plots for PfPR2to10 and either clinical or severe incidence, grouped by the specified sweep variable and faceted by another variable.

This function creates a series of line plots where the x-axis represents PfPR2to10 and the y-axis represents either clinical or severe incidence. Each line corresponds to one value of the sweep variable (a model, by default), and a separate panel is drawn for each combination of the facet variable (e.g., seasonality) and age group.

Parameters:
  • fdir (str) –

    Directory where the generated plot will be saved.

  • df (DataFrame) –

    DataFrame containing the model results for plotting.

  • sweepvar (str, default: 'modelname' ) –

    Variable used to group the data into separate lines (one color per value) within each panel. Default is ‘modelname’.

  • facet_var (str, default: 'seasonality' ) –

    Variable used to create subplots based on its unique values. Default is ‘seasonality’.

  • channel (str, default: 'clinical_incidence' ) –

    Variable representing the y-axis data to be plotted (e.g., ‘clinical_incidence’).

  • agegrps (list of str, default: None ) –

    Specific age groups to filter the DataFrame before plotting. Defaults to None, which means all age groups will be used.

Returns:
  • None

    The function saves the generated plots to disk.

Raises:
  • ValueError

    If an error occurs while filtering the DataFrame or during plotting.

Source code in plotter\plot_relationship.py
def prevalence2to10_to_incidence(fdir, df, sweepvar='modelname', facet_var='seasonality', channel='clinical_incidence',
                                 agegrps=None):
    """
    Generate line plots for PfPR2to10 and either clinical or severe incidence,
    grouped by the specified sweep variable and faceted by another variable.

    This function creates a series of line plots where the x-axis represents
    the prevalence of PfPR2to10, and the y-axis represents either clinical or
    severe incidence. Each line corresponds to a model, and the plots can be
    faceted by a specified variable (e.g., seasonality).

    Args:
        fdir (str): Directory where the generated plot will be saved.
        df (pd.DataFrame): DataFrame containing the model results for plotting.
        sweepvar (str, optional): Variable used to group the data into separate
            lines within each panel. Default is 'modelname'.
        facet_var (str, optional): Variable used to create subplots based on its unique
            values. Default is 'seasonality'.
        channel (str, optional): Variable representing the y-axis data to be plotted
            (e.g., 'clinical_incidence'). Default is 'clinical_incidence'.
        agegrps (list of str, optional): Specific age groups to filter the DataFrame
            before plotting. Defaults to None, which means all age groups will be used.

    Returns:
        None: The function saves the generated plots to disk.

    Raises:
        ValueError: If an error occurs while filtering the DataFrame or during plotting.
    """

    figure_vars = ['prevalence_2to10', channel] + [sweepvar, facet_var]
    df, caption_txt = subset_dataframe_for_plot(df, figure_vars, agegrps)

    unique_facets = df[facet_var].unique()
    nx = len(df['ageGroup'].unique())
    ny = len(unique_facets)  # one row of panels per facet, one column per age group
    f = 1
    firstPlot = True
    fig = plt.figure(figsize=(10 * nx, 10 * ny))

    unique_groups = df[sweepvar].unique()
    unique_groups = np.sort(unique_groups)
    num_colors = len(unique_groups)
    color_palette = sns.color_palette('colorblind', max(num_colors, 4))

    # for each age group, plot x = prevalence 2_to_10, y = incidences (or prevalence)
    for fi in unique_facets:
        fdf = df[df[facet_var] == fi]

        for agroup in fdf['ageGroup'].unique():
            ax = fig.add_subplot(ny, nx, f)
            ax.set_title(f'{fi}, {agroup}')
            f += 1
            adf = fdf[fdf.ageGroup == agroup]

            for mi, model in enumerate(unique_groups):
                pdf = adf[adf[sweepvar] == model]
                color_key = color_selector(mi, model)
                if not pd.isnull(pdf['prevalence_2to10']).all():
                    # plot mean, min and max of seeds
                    xmean, ymean = get_x_y(pdf, 'output_target', 'prevalence_2to10', channel)
                    merge_df = pd.merge(left=xmean, right=ymean, on='output_target')
                    if channel == 'simulatedEIR':
                        merge_df.sort_values(by='output_target', inplace=True)
                    else:
                        merge_df.sort_values(by='prevalence_2to10', inplace=True)
                    ax.plot(merge_df['prevalence_2to10'], merge_df[channel], label=model,
                            color=color_palette[color_key])
                    ax.fill_between(merge_df['prevalence_2to10'], merge_df[f'{channel}_min'],
                                    merge_df[f'{channel}_max'], alpha=0.1, color=color_palette[color_key])
            if channel == "prevalence":
                y_lim = 1
            else:
                y_lim = adf[channel].max() * 1.1  # Series.max skips NaN, unlike built-in max
            ax.set_ylim(0, y_lim)
            ax.set_xlim(0, 1)
            ax.set_xticks([0, 0.2, 0.4, 0.6, 0.8, 1])
            ax.set_ylabel(get_ylab(channel))
            ax.set_xlabel(r'$\it{Pf}$PR$_{2-10}$ (%)')
            if firstPlot:
                lg = ax.legend(loc='upper left', bbox_to_anchor=(0.5, 1.2))
                firstPlot = False

    fname = f'pfpr2to10_to_{channel}'
    fig.savefig(os.path.join(fdir, f'{fname}.png'), bbox_extra_artists=(lg,), bbox_inches='tight')
    plt.close()
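In this function the subplot counter f advances once per (facet, age group) pair, and matplotlib's Figure.add_subplot(nrows, ncols, index) raises ValueError when index exceeds nrows * ncols. The sketch below, using a hypothetical helper panel_titles and hypothetical seasonality values, counts the panels the loop requests; a grid with one row per facet and one column per age group has exactly enough cells, assuming every facet contains the same age groups.

```python
def panel_titles(facets, agegroups):
    # One panel per (facet, age group) pair, numbered row by row from 1
    panels = []
    index = 1
    for facet in facets:
        for agroup in agegroups:
            panels.append((index, f'{facet}, {agroup}'))
            index += 1
    return panels

facets = ['perennial', 'seasonal']   # hypothetical facet values
agegroups = ['0-5', '2-10']
panels = panel_titles(facets, agegroups)
nrows, ncols = len(facets), len(agegroups)
# add_subplot(nrows, ncols, index) needs 1 <= index <= nrows * ncols
print(nrows * ncols, panels[-1])  # 4 (4, 'seasonal, 2-10')
```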

prevalence2to10_to_incidence_by_age_and_model(fdir, df, sweepvar='modelname', channel='clinical_incidence')

Generate line plots for PfPR2to10 and either clinical or severe incidence, separated by model (columns) and age groups (rows).

This function creates a grid of line plots where each plot corresponds to a specific model and age group for the prevalence of PfPR2to10 on the x-axis and either clinical or severe incidence on the y-axis.

Parameters:
  • fdir (str) –

    Directory where the generated plot will be saved.

  • df (DataFrame) –

    DataFrame containing the model results for plotting.

  • sweepvar (str, default: 'modelname' ) –

    Column whose unique values define the panel columns, one per group. Default is ‘modelname’.

  • channel (str, default: 'clinical_incidence' ) –

    Variable representing the y-axis data to be plotted (e.g., ‘clinical_incidence’).

Returns:
  • None

    The figure is saved to disk as pfpr2to10_to_{channel}_by_age_by_model.png in fdir.
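The nested loops fill the figure panel by panel in season, then age group, then model order, incrementing a 1-based counter that is passed to fig.add_subplot. As a sketch, assuming one column per model and one row per (seasonality, age group) pair, the counter value for each panel can be computed directly; panel_index and the season labels below are hypothetical and not part of the module.

```python
# Hypothetical helper: maps a (season, age band, model) position to the 1-based
# index expected by matplotlib's fig.add_subplot(nrows, ncols, index),
# assuming one column per model and one row per (season, age band) pair.
def panel_index(season_idx: int, age_idx: int, model_idx: int,
                n_age_groups: int, n_models: int) -> int:
    row = season_idx * n_age_groups + age_idx  # 0-based row
    col = model_idx                            # 0-based column
    return row * n_models + col + 1            # add_subplot counts from 1


# Hypothetical labels; only the lengths matter for the layout
seasons = ['seasonal', 'perennial']
age_bands = ['young_child', 'older_child', 'adult']
models = ['EMOD', 'malariasimulation', 'OpenMalaria']

# Enumerating in the same nested order as the plotting loops
indices = [panel_index(si, ai, mi, len(age_bands), len(models))
           for si in range(len(seasons))
           for ai in range(len(age_bands))
           for mi in range(len(models))]
print(indices)
```

With two seasonality values, three age bands and three models the figure needs 18 panels, so the grid must provide at least that many slots.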

Source code in plotter\plot_relationship.py
def prevalence2to10_to_incidence_by_age_and_model(fdir, df, sweepvar='modelname', channel='clinical_incidence'):
    """
    Generate line plots of clinical or severe incidence against PfPR2to10,
    separated by model (columns) and age group (rows).

    This function creates a grid of line plots in which each panel corresponds
    to one model and one age group, with PfPR2to10 on the x-axis and the
    selected incidence channel on the y-axis. Each line is the mean across
    seeds; the shaded band spans the seed minimum and maximum.

    Args:
        fdir (str): Directory where the generated plot will be saved.
        df (pd.DataFrame): DataFrame containing the model results for plotting.
        sweepvar (str, optional): Column whose unique values define the panel
            columns, one per group. Default is 'modelname'.
        channel (str, optional): Column plotted on the y-axis. Default is
            'clinical_incidence'.

    Returns:
        None: The function saves the generated plots to disk.
    """

    seasonality = list(df['seasonality'].unique())

    unique_groups = np.sort(df[sweepvar].unique())
    num_colors = len(unique_groups)
    color_palette = sns.color_palette('colorblind', max(num_colors, 4))
    young_child = ['0-0.5', '0.5-1', '1-2', '2-5']
    older_child = ['5-10', '10-15']
    adult = ['15-20', '20-100']
    ages_list = [young_child, older_child, adult]
    mean_vars = [channel, 'prevalence_2to10']

    # Size the grid to the data: one column per model and one row per
    # (seasonality, age group) pair, so add_subplot never runs out of slots
    nx = len(unique_groups)
    ny = len(seasonality) * len(ages_list)
    f = 1  # 1-based subplot counter
    fig = plt.figure(figsize=(10 * nx, 10 * ny))

    # for each age group, plot x = prevalence 2_to_10, y = incidences (or prevalence)
    for si, season in enumerate(seasonality):
        sdf = df[df.seasonality == season]
        for ages in ages_list:
            adf = sdf[sdf['ageGroup'].isin(ages)]
            for mi, model in enumerate(unique_groups):
                color_key = color_selector(mi, model)
                ax = fig.add_subplot(ny, nx, f)
                f += 1
                # Select this panel's group through sweepvar rather than a hard-coded column
                mdf = adf[adf[sweepvar] == model]
                # Average the rows (e.g. age groups in this band) sharing each seed and output target
                mdf = mdf.groupby(['seed', 'output_target'])[mean_vars].mean().reset_index()
                if not pd.isnull(mdf['prevalence_2to10']).all():
                    # plot mean, min and max of seeds
                    xmean, ymean = get_x_y(mdf, 'output_target', 'prevalence_2to10', channel)
                    merge_df = pd.merge(left=xmean, right=ymean, on='output_target')
                    merge_df.sort_values(by='prevalence_2to10', inplace=True)
                    ax.plot(merge_df['prevalence_2to10'], merge_df[channel], label=model,
                            color=color_palette[color_key])
                    ax.fill_between(merge_df['prevalence_2to10'], merge_df[f'{channel}_min'],
                                    merge_df[f'{channel}_max'], alpha=0.1, color=color_palette[color_key])
                if channel == "prevalence":
                    y_lim = 1
                else:
                    y_lim = max(adf[channel]) * 1.01
                ax.set_ylim(0, y_lim)
                ax.set_xlim(0, 1)
                ax.set_xticks([0, 0.2, 0.4, 0.6, 0.8, 1])
                # Single title call; keep the season so panels remain distinguishable
                ax.set_title(f'{season} {model}: {ages}')
                ax.set_ylabel(get_ylab(channel))
                ax.set_xlabel(r'$\it{Pf}$PR$_{2-10}$ (%)')
    fname = f'pfpr2to10_to_{channel}_by_age_by_model'
    fig.savefig(os.path.join(fdir, f'{fname}.png'), bbox_inches='tight')
    plt.close()
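The comment ‘plot mean, min and max of seeds’ refers to a two-step aggregation: rows are first averaged per seed and output target, then summarized across seeds. The helper get_x_y is not shown on this page, so the sketch below assumes it returns, per output_target, the mean of the x variable and the mean, minimum and maximum of the y variable (matching the {channel}_min and {channel}_max columns used for the shaded band). seed_band and the toy values are hypothetical.

```python
import pandas as pd


def seed_band(df: pd.DataFrame, x: str, y: str) -> pd.DataFrame:
    """Hypothetical stand-in for get_x_y: summarize `x` and `y` across seeds.

    Assumes `df` already holds one row per (seed, output_target).
    """
    band = df.groupby('output_target').agg(**{
        x: (x, 'mean'),             # line x-position: mean across seeds
        y: (y, 'mean'),             # line y-position: mean across seeds
        f'{y}_min': (y, 'min'),     # lower edge of the shaded band
        f'{y}_max': (y, 'max'),     # upper edge of the shaded band
    }).reset_index()
    # Sort by the x-axis variable so the plotted line is drawn left to right
    return band.sort_values(by=x).reset_index(drop=True)


# Toy data: 2 seeds x 2 output targets x 2 age groups (hypothetical values)
raw = pd.DataFrame({
    'seed':               [1, 1, 1, 1, 2, 2, 2, 2],
    'output_target':      [10, 10, 50, 50, 10, 10, 50, 50],
    'ageGroup':           ['0-0.5', '2-5'] * 4,
    'prevalence_2to10':   [0.2, 0.2, 0.5, 0.5, 0.3, 0.3, 0.6, 0.6],
    'clinical_incidence': [0.8, 1.2, 1.5, 2.5, 1.0, 1.8, 2.2, 3.0],
})

# Step 1: average over age groups within each seed and output target
per_seed = (raw.groupby(['seed', 'output_target'])[['clinical_incidence', 'prevalence_2to10']]
            .mean().reset_index())
# Step 2: mean, min and max across seeds
band = seed_band(per_seed, 'prevalence_2to10', 'clinical_incidence')
print(band)
```

The resulting columns line up with what ax.plot and ax.fill_between consume in the function above: one x column, one mean y column, and the _min/_max pair for the band.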

subset_dataframe_for_plot(df, figure_vars, agegrp=None, filter_target=True)

Filter the input DataFrame for plotting based on specified criteria.

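This function prepares the data for visualization by fixing every dimension that is not plotted to a single value: the lowest cm_clinical, the ‘seasonal’ seasonality (or the first one present if ‘seasonal’ is absent), optionally the given age group(s), and, when filter_target is True and no transmission variable is plotted, the highest output_target. It also returns a string summarizing the filters that were applied.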

Parameters:
  • df (DataFrame) –

    The input DataFrame containing simulation results.

  • figure_vars (list of str) –

    List of variables used for plotting, which influences the filtering process.

  • agegrp (str or list of str, default: None ) –

    Specific age group(s) to filter by. If provided, only the data for these age groups will be retained. Ignored if ‘ageGroup’ is in figure_vars. Defaults to None, meaning no filtering by age group will occur.

  • filter_target (bool, default: True ) –

    If True and none of ‘simulatedEIR’, ‘prevalence_2to10’ or ‘output_target’ is in figure_vars, only rows with the maximum output_target are kept. Defaults to True.

Returns:
  • tuple

    A tuple containing:
      - pd.DataFrame: The filtered DataFrame.
      - str: A summary string describing the filtering that was applied.

Raises:
  • ValueError

    If ‘modelname’ is not included in figure_vars and there are multiple unique model names in the DataFrame.
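Because each default rule is applied to the frame already narrowed by the previous one, the order matters. The condensed sketch below mirrors the cm_clinical, seasonality and output_target rules described above (it omits the agegrp handling, the summary string and the modelname check); apply_default_filters and the toy values are hypothetical, not part of the module.

```python
import pandas as pd


def apply_default_filters(df: pd.DataFrame, figure_vars: list) -> pd.DataFrame:
    """Hypothetical condensed version of the default selection rules."""
    if 'cm_clinical' not in figure_vars:
        # Keep only the lowest case-management level
        df = df[df['cm_clinical'] == df['cm_clinical'].min()]
    if 'seasonality' not in figure_vars:
        # Prefer 'seasonal'; otherwise fall back to the first value present
        seasons = df['seasonality'].unique()
        season = 'seasonal' if 'seasonal' in seasons else seasons[0]
        df = df[df['seasonality'] == season]
    if not any(v in figure_vars for v in ['simulatedEIR', 'prevalence_2to10', 'output_target']):
        # No transmission variable is plotted: keep the highest output target
        df = df[df['output_target'] == df['output_target'].max()]
    return df


# Toy results table (hypothetical values)
results = pd.DataFrame({
    'modelname':          ['EMOD'] * 4,
    'cm_clinical':        [0.25, 0.25, 0.5, 0.25],
    'seasonality':        ['seasonal', 'seasonal', 'seasonal', 'perennial'],
    'output_target':      [10, 50, 50, 50],
    'clinical_incidence': [1.0, 2.0, 1.5, 1.8],
})

subset = apply_default_filters(results, ['modelname', 'clinical_incidence'])
print(subset)
```

Here the lowest cm_clinical (0.25) removes the third row, the ‘seasonal’ preference removes the fourth, and only then is the maximum output_target (50) taken, leaving the single row with clinical_incidence 2.0.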

Source code in plotter\plot_helper.py
def subset_dataframe_for_plot(df, figure_vars, agegrp=None, filter_target=True):
    """
    Filter the input DataFrame for plotting based on specified criteria.

    This function filters the DataFrame according to the provided figure variables,
    optional age groups, and other selection criteria to prepare the data for
    visualization. It also returns a string summarizing the filtering applied.

    Args:
        df (pd.DataFrame): The input DataFrame containing simulation results.
        figure_vars (list of str): List of variables used for plotting, which influences
            the filtering process.
        agegrp (str or list of str, optional): Specific age group(s) to filter by.
            If provided, only the data for these age groups will be retained.
            Ignored if 'ageGroup' is in `figure_vars`. Defaults to None,
            meaning no filtering by age group will occur.
        filter_target (bool, optional): If True and none of 'simulatedEIR',
            'prevalence_2to10' or 'output_target' is in `figure_vars`, only
            rows with the maximum output target are kept. Defaults to True.

    Returns:
        tuple: A tuple containing:
            - pd.DataFrame: The filtered DataFrame.
            - str: A summary string describing the filtering that was applied.

    Raises:
        ValueError: If 'modelname' is not included in `figure_vars` and there
        are multiple unique model names in the DataFrame.
    """

    txt = 'Filtered dataset by: '

    if agegrp is not None:
        if 'ageGroup' not in figure_vars:
            if isinstance(agegrp, list):
                df = df[df['ageGroup'].isin(agegrp)]
                txt += f'ageGroup in {agegrp}, '
            else:
                df = df[df['ageGroup'] == agegrp]
                txt += f'ageGroup {agegrp}, '

    if 'cm_clinical' not in figure_vars:
        selected_cm = df['cm_clinical'].min()
        df = df[df['cm_clinical'] == selected_cm]
        txt += f'cm_clinical {selected_cm}, '

    if 'seasonality' not in figure_vars:
        selected_season = 'seasonal' if 'seasonal' in df['seasonality'].unique() else df['seasonality'].unique()[0]
        df = df[df['seasonality'] == selected_season]
        txt += f'seasonality {selected_season}, '

    if filter_target:
        if not any(var in figure_vars for var in ['simulatedEIR', 'prevalence_2to10', 'output_target']):
            selected_output = df['output_target'].max()
            df = df[df['output_target'] == selected_output]
            txt += f'output_target {selected_output}, '

    if 'modelname' not in figure_vars and df['modelname'].nunique() > 1:
        raise ValueError('modelname needs to be specified in plot if results were combined for more than 1 model')

    # Remove trailing comma and space if any filtering has been done
    if txt.endswith(', '):
        txt = txt[:-2]

    return df, txt