
Last Updated: 21/02/2023

Hybrid genome assembly and annotation of Plasmodium vivax from the Asia-Pacific region

Objectives

To combine Illumina short-read and Oxford Nanopore long-read sequencing technologies to reconstruct one of the most comprehensive representative P. vivax genomes for the Asia-Pacific region.

Principal Institution

Deakin University, Australia

Principal Investigators / Focal Persons

Paolo Bareng

Rationale and Abstract

The worldwide increase in P. vivax (Pv) cases, especially in the Asia-Pacific region, calls for more comprehensive strategies and interventions to address this emerging challenge. However, the mechanisms underlying Pv’s biology, epidemiology, and pathogenesis remain unclear, making this species even more difficult to control and eradicate. Whilst genomic studies play a substantial role in shedding light on the complexity of Pv infection, several limitations of the published reference genomes impede our understanding of the characteristics of Pv parasites. The construction of the Pv reference genomes Salvador-1 and PvP01 has aided the Pv research community and answered many questions surrounding Pv’s biology. Despite this, short-read sequencing, the technology used to assemble the available reference genomes, has limitations, particularly in repetitive regions, and thus fails to assemble a complete parasite genome.

Combining short reads with long-read sequencing technologies, such as Oxford Nanopore (ONT), has the potential to overcome this shortcoming. Adding further complexity is the highly diverse nature of Pv, a result of its complex life cycle and host-parasite interactions. Previous reports have demonstrated that Pv parasites vary genetically between geographical locations. In this context, it is essential to assemble region-specific reference genomes that capture the genomic differences between geographic areas.

Study Design

Aim 1. Using Illumina and ONT sequencing technologies, we aim to generate high-quality genome assemblies from clinical isolates sourced from various populations in the Asia-Pacific region. In this study, we will utilise archived DNA samples collected from Cambodia, Thailand, Indonesia, Papua New Guinea, and Solomon Islands. The team has access to the archived DNA samples from these populations. Selection criteria will include excluding samples from multi-clonal infections and selecting only samples with high Pv parasitaemia, to obtain the most appropriate samples for high-quality sequencing. We will employ a hybrid de novo assembly method, which first uses ONT reads to assemble draft genomes that are then polished with Illumina reads. Then, we will evaluate and compare the hybrid models in terms of their scaffold length, contig size, and read lengths to ensure high-quality draft genomes. Lastly, we will measure contiguity using the N50 metric to determine the readiness of the assembly for gene annotation. The team will conduct sample selection as well as short- and long-read sequencing runs at Deakin University.
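
For readers unfamiliar with the N50 metric mentioned in Aim 1: N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The following is a minimal illustrative sketch in Python, not the project's actual pipeline; the function name `n50` and the example contig lengths are assumptions made purely for illustration.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig (or scaffold) lengths.

    Hypothetical helper for illustration only; the project does not
    specify which tool will be used to compute assembly statistics.
    """
    lengths = sorted((int(n) for n in contig_lengths), reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")

    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down, accumulating length until
    # at least half of the assembly is covered.
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Made-up contig lengths (not project data): total = 32, half = 16.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(example_lengths))  # the two contigs of length 8 cover 16 bases
```

A higher N50 generally indicates a more contiguous assembly, but it says nothing about correctness, so it is usually read alongside other measures such as total assembly length and the number of contigs.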

Aim 2. We will utilise the assembled and corrected hybrid model from Aim 1 to carry out structural and functional annotation of the protein-coding sequences. First, we will predict the location and structure of the exons using the PvP01 strain as a reference. This reference-based approach will allow us to accurately predict the gene structure of our model assembly, particularly because PvP01 is the most complete Pv reference genome to date. Next, we aim to assign biologically relevant information to the predicted genes by combining sequence similarity searches against public repositories with a neural network that predicts patterns from a provided gene set. This approach will further confirm the presence of known genes from the reference genomes and identify potential proteins not yet described. Dr Sarah Auburn’s team have assembled and annotated the PvP01 reference genome [7], and thus bioinformatic tools and pipelines are readily available. In addition, the team includes members [10] with strong bioinformatics and statistics backgrounds, so Aim 2 is expected to proceed in a streamlined manner.
