Last Updated: 20/02/2023
A statistical framework for identifying sufficient data to infer a total parasite biomass
Objectives
To develop a theoretical framework to assess the statistical information content of data collected under different designs, and to identify the optimal designs for LDH-based estimation of hidden biomass.
Peter Doherty Institute for Infection and Immunity, Australia
Recent studies have shown that P. vivax parasites accumulate in tissues outside the circulating blood, such as the bone marrow and spleen. This suggests that peripheral blood parasitemia alone does not give a complete picture of the total parasite biomass within an infected individual, which is a key indicator of infection severity. There has been some success in using parasite-produced proteins, such as HRP2, measured alongside parasitemia to estimate the total parasite biomass. Motivated by earlier work identifying a biomarker for P. falciparum, Plasmodium lactate dehydrogenase (pLDH) has been proposed as a more attractive marker for estimating hidden biomass, given its expression in all malaria species, including P. vivax. Our preliminary work extended previous within-host models of blood-stage parasite dynamics, coupled with the production of PvLDH (i.e., LDH produced by P. vivax parasites), and showed that the total biomass could be estimated from parasitemia and PvLDH when the data were collected twice a day over the entire course of infection. However, model parameter estimation depends heavily on the information in the data, which is influenced by many experimental factors (collectively called the design), such as the number of patients, sampling frequency and length of follow-up, blood sampling matrix (e.g., plasma PvLDH or whole blood PvLDH), and assay platform characteristics.
Theoretical guidance on how best to infer the model parameters that determine the total biomass would support study design. We propose to develop a theoretical framework to assess the statistical information content of data collected under different designs, and to identify the optimal designs for LDH-based estimation of hidden biomass. This will help ensure that future patient studies yield results that are as informative as possible. To subsequently estimate the fraction of unobserved parasite biomass, we require sufficient and appropriately timed data on parasitemia and LDH to distinguish the model parameters and the fraction of unobserved parasites simultaneously.
We will use Bayesian optimal design methodology to explore how different designs affect the performance of inferring the unobserved parasite fraction. Design parameters to be explored will include the number of participants, the number of samples per participant, the timing of sample collection for LDH measurements over the course of parasitemia (relative to infection and/or treatment, where applicable and known), and the assay performance characteristics. Parameter estimates from the literature, along with existing data on parasitemia and LDH provided from four infected patients from Sabah, will inform prior distributions for the model parameters. Finding Bayesian optimal designs for complex models, such as the parasite-LDH model, has historically been infeasible due to the excessive computational requirements. We will explore state-of-the-art Bayesian optimal design methods that are suitable for complex models, such as those proposed by Overstall et al. (2020). Briefly, Overstall and McGree propose using simulations from the model to approximate the model likelihood, which is then used to calculate the utility of a design (i.e., the amount of information contained in that design for estimating model parameters). This is an attractive approach for the parasite-LDH model, where simulation is relatively cheap compared to evaluating the model likelihood, and many simulations can be generated relatively quickly via parallel computing. Subsequently, we will use a simulation-estimation framework to explore what additional information can be incorporated into the model from other data sources to improve the precision of our inferences.
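The idea of scoring a design by a simulation-based utility can be illustrated with a minimal Python sketch. It is not the parasite-LDH model and not the method of Overstall and McGree: it substitutes a toy simulator (a log10 marker level declining linearly at an unknown rate), a Gaussian synthetic likelihood fitted to simulated replicates, and a nested Monte Carlo estimate of expected information gain. All function names, the prior range, the baseline level and the noise level are assumptions chosen purely for illustration.

```python
import numpy as np

# Hypothetical prior range for the decline rate (per day); illustrative only.
PRIOR_LOW, PRIOR_HIGH = 0.1, 1.0


def simulate(theta, times, rng, n_rep=1, noise_sd=0.3):
    """Toy simulator: log10 marker level falling linearly at rate theta."""
    times = np.asarray(times, dtype=float)
    mean = 4.0 - theta * times  # assumed baseline of 4 log10 units
    return mean + noise_sd * rng.standard_normal((n_rep, times.size))


def synthetic_loglik(y, theta, times, rng, n_sim=200):
    """Approximate log p(y | theta, design) by a Gaussian fitted to simulations."""
    sims = simulate(theta, times, rng, n_rep=n_sim)
    mu = sims.mean(axis=0)
    var = sims.var(axis=0, ddof=1)
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (y - mu) ** 2 / var)


def expected_information_gain(times, rng, n_outer=100, n_inner=100):
    """Nested Monte Carlo estimate of the expected information gain (in nats)."""
    outer = rng.uniform(PRIOR_LOW, PRIOR_HIGH, size=n_outer)
    inner = rng.uniform(PRIOR_LOW, PRIOR_HIGH, size=n_inner)
    total = 0.0
    for theta in outer:
        y = simulate(theta, times, rng)[0]  # one hypothetical data set
        loglik = synthetic_loglik(y, theta, times, rng)
        inner_ll = np.array(
            [synthetic_loglik(y, t, times, rng) for t in inner]
        )
        # log of the prior-averaged likelihood (the marginal likelihood of y)
        log_evidence = np.logaddexp.reduce(inner_ll) - np.log(n_inner)
        total += loglik - log_evidence
    return total / n_outer


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two candidate designs: sampling times in days after the start of follow-up.
    for name, times in [("early", [0.5, 1.0]), ("late", [3.0, 6.0])]:
        eig = expected_information_gain(times, rng)
        print(f"{name} design {times}: estimated utility = {eig:.2f} nats")
```

In this toy setting the later sampling times separate candidate decline rates more clearly, so that design should score higher. For a realistic model, the simulator would be replaced by the within-host model, and the independent evaluations in the outer loop are the natural place to apply parallel computing.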
Jun 2022 — Oct 2023
$15,000