Microbes & Immunity Phylogenetic analysis of HPV16 L1 in Asia
countries such as Pakistan and India due to logistical and financial constraints.[5] The disparity in vaccination rates underscores the need for enhanced public health efforts to improve accessibility and awareness, particularly in lower-income regions. These particles trigger the immune systems of host organisms by mimicking the actual virus capsid without the viral genome.[19] The quadrivalent Gardasil® (4vHPV), the first of these vaccines to be approved, received FDA approval in 2006. This vaccine, formulated with VLPs of HPV types 6, 11, 16, and 18, targets the most common cancer-causing and genital wart-causing types.[16,20,21] The second HPV vaccine approved by the FDA was the bivalent (2vHPV) Cervarix® vaccine, which protects against HPV-16 and 18.[17] The latest vaccine, Gardasil® 9, was approved by the FDA in 2014 and targets nine HPV types (6, 11, 16, 18, 31, 33, 45, 52, and 58), expanding on the coverage provided by the quadrivalent Gardasil®.[18]

Given the significant role of L1 proteins in the development of HPV vaccines, this study aims to analyze the L1 gene and protein sequences of HPV-16 in Asia, a region where data availability from the National Center for Biotechnology Information (NCBI) Virus database enables comprehensive analysis. The focus on the Asian region also considers the diverse genetic background of HPV-16, which may influence vaccine efficacy and regional disease prevalence. By investigating these sequences, the study seeks to identify phylogenetic relationships, sequence variations, and conserved elements critical for viral function. Such insights are essential to bridge gaps in understanding HPV-16 genetic variability and its implications for vaccine design and global prevention strategies.

2. Materials and methods

This study employed a computational bioinformatics approach, in which publicly available nucleotide and protein sequences of HPV-16 L1 were retrieved from the NCBI Virus database. These sequences were processed using multiple alignment and phylogenetic analysis tools in the R software environment to assess sequence variability and clustering patterns.

2.1. Selecting and obtaining sequences

On the NCBI Virus page, the search filters were set as follows. First, “Human papillomavirus type 16, taxid: 333760” was entered in the virus search bar. The nucleotide completeness criterion was set to “complete” to eliminate incomplete sequences. To reach all L1 protein-related entries, paying attention to case sensitivity, these strings were entered in the protein name section: “L1,” “L1 Protein,” “major capsid protein L1,” “major capsid L1 protein,” “late protein L1,” “L1 capsid protein,” “L1 major capsid protein,” “late major capsid protein L1,” and “Major capsid protein L1.” For selection of host organisms, “Homo sapiens (human), taxid: 9606” was applied to the Host bar. “From December 31, 2014, to November 28, 2023” was specified as the collection date range. Nucleotide and protein sequences of 25 results were downloaded with their year and accession names in FASTA format. The primary reason for selecting the Asian region was the availability of sufficient data in the NCBI Virus database. The studied sequences of HPV-16 (NCBI: txid333760) from human host organisms (NCBI: txid9606) were collected from the NCBI Virus public database for the Asian region, with selection criteria referencing the year of the latest vaccine approval.[22,23]

The sequences in the NCBI Virus database were analyzed based on continent and country. It was determined that the data most appropriate for the R software code used in the analysis of the sequences were those sourced from Asia. Therefore, “Asia” was selected as the geographic region.

2.2. Data preprocessing

First, an alignment was performed, and sequences that were significantly shorter and exhibited lower similarity than the others were identified. As a result, three sequences with accession codes BDO24687, BEU33870, and BEU33846 were removed from the dataset because they could not be effectively analyzed or compared with the remaining sequences.

Attributes of the chosen nucleotide sequences were compiled, as shown in Table 1. This table contains the accession numbers, collection dates of isolates, submitted locations, name of the host organism, name of the isolate, and length of the selected sequences. This information was obtained from NCBI Virus.

Paralleling the approach used for Table 1, relevant information regarding each selected protein sequence was compiled. This information, obtained from NCBI Virus, included accession number, collection date, location, host organism, isolate name, and sequence length, as presented in Table 2.

2.3. Processing data with R

Data processing was carried out in the R software environment, using the code previously used by Toparslan et al.[24] The FASTA sequences obtained from NCBI Virus were used in the analysis. The outputs of the R script were presented in the form of figures.

3. Results

The results section of this study presents the output images generated during the processing of selected sequences in the R software environment. Residues differing in amino
Volume 2 Issue 2 (2025) 55 doi: 10.36922/mi.8410
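The screening step described in Section 2.2 (flagging sequences that are much shorter than the rest, then dropping them by accession) can be illustrated with a minimal Python sketch. This is not the R code used in the study, which is not reproduced here. The function names, the 0.9 length fraction, and the toy records are all assumptions made for illustration, and the similarity criterion mentioned in the text is not modeled.

```python
from statistics import median


def parse_fasta(text):
    """Return {accession: sequence} from FASTA-formatted text."""
    records = {}
    accession = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: use the first whitespace-delimited token as the accession.
            accession = line[1:].split()[0]
            records[accession] = ""
        elif accession is not None:
            # Sequence lines may be wrapped, so append to the current record.
            records[accession] += line
    return records


def flag_short(records, min_fraction=0.9):
    """List accessions shorter than min_fraction of the median length.

    min_fraction is a hypothetical threshold; the paper does not state one.
    """
    cutoff = min_fraction * median(len(seq) for seq in records.values())
    return sorted(acc for acc, seq in records.items() if len(seq) < cutoff)


def drop(records, accessions):
    """Return a copy of records without the given accessions."""
    excluded = set(accessions)
    return {acc: seq for acc, seq in records.items() if acc not in excluded}


# Toy records with made-up identifiers (not real NCBI accessions).
TOY_FASTA = """>SEQ_A toy record
ACGTACGTAC
GTACGTACGT
>SEQ_B toy record
ACGTACGTACGTACGTACGT
>SEQ_C toy record (truncated)
ACGTAC
"""

records = parse_fasta(TOY_FASTA)
short = flag_short(records)
kept = drop(records, short)
print("flagged:", short)
print("kept:", sorted(kept))
```

In practice, the flagged accessions would be reviewed against the alignment before removal, as the text describes, rather than dropped on length alone.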

