
Microbes & Immunity                                                Phylogenetic analysis of HPV16 L1 in Asia



countries such as Pakistan and India due to logistical and financial constraints.[5] The disparity in vaccination rates underscores the need for enhanced public health efforts to improve accessibility and awareness, particularly in lower-income regions. These particles trigger the immune systems of host organisms by mimicking the actual virus capsid without the viral genome.[19] The quadrivalent Gardasil® (4vHPV), the first of these vaccines, received FDA approval in 2006. This vaccine, formulated with VLPs of HPV types 6, 11, 16, and 18, targets the most common cancer-causing and genital wart-causing types.[16,20,21] The second HPV vaccine approved by the FDA was the bivalent (2vHPV) Cervarix® vaccine, which protects against HPV-16 and HPV-18.[17] The latest vaccine, Gardasil®9, was approved by the FDA in 2014 and targets nine HPV types (6, 11, 16, 18, 31, 33, 45, 52, and 58), expanding on the coverage provided by the quadrivalent Gardasil®.[18]

Given the significant role of L1 proteins in the development of HPV vaccines, this study aims to analyze the L1 gene and protein sequences of HPV-16 in Asia, a region where data availability from the National Center for Biotechnology Information (NCBI) Virus database enables comprehensive analysis. The focus on the Asian region also considers the diverse genetic background of HPV-16, which may influence vaccine efficacy and regional disease prevalence. By investigating these sequences, the study seeks to identify phylogenetic relationships, sequence variations, and conserved elements critical for viral function. Such insights are essential to bridge gaps in understanding HPV-16 genetic variability and its implications for vaccine design and global prevention strategies.

2. Materials and methods

This study employed a computational bioinformatics approach in which publicly available nucleotide and protein sequences of HPV-16 L1 were retrieved from the NCBI Virus database. These sequences were processed using multiple alignment and phylogenetic analysis tools in the R software environment to assess sequence variability and clustering patterns.

2.1. Selecting and obtaining sequences

On the NCBI Virus page, the search filters were set as follows. First, “Human papillomavirus type 16, taxid: 333760” was entered in the virus search bar. The nucleotide completeness criterion was set to “complete” to exclude incomplete sequences. To reach all L1 protein-related entries, with attention to case sensitivity, the following strings were entered in the protein name section: “L1,” “L1 Protein,” “major capsid protein L1,” “major capsid L1 protein,” “late protein L1,” “L1 capsid protein,” “L1 major capsid protein,” “late major capsid protein L1,” and “Major capsid protein L1.” For the host organism, “Homo sapiens (human), taxid: 9606” was applied in the Host bar. The collection date range was set to “From December 31, 2014, to November 28, 2023.” The nucleotide and protein sequences of the 25 results were downloaded in FASTA format with their year and accession names.

The primary reason for selecting the Asian region was the availability of sufficient data in the NCBI Virus database. The studied sequences of HPV-16 (NCBI: txid333760) from human host organisms (NCBI: txid9606) were collected from the NCBI Virus public database for the Asian region, with selection criteria referencing the year of the latest vaccine approval.[22,23] The sequences in the NCBI Virus database were analyzed by continent and country. It was determined that the data sourced from Asia were the most appropriate for the R software code used in the analysis of the sequences. Therefore, “Asia” was selected as the geographic region.

2.2. Data preprocessing

First, an alignment was performed, and sequences that were significantly shorter than the others and exhibited lower similarity to them were identified. As a result, three sequences with accession codes BDO24687, BEU33870, and BEU33846 were removed from the dataset because they could not be effectively analyzed or compared with the remaining sequences.

Attributes of the chosen nucleotide sequences were compiled, as shown in Table 1. This table contains the accession number, isolate collection date, submission location, host organism name, isolate name, and length of each selected sequence. This information was obtained from NCBI Virus.

Following the approach used for Table 1, the relevant information for each selected protein sequence was compiled. This information, obtained from NCBI Virus, included accession number, collection date, location, host organism, isolate name, and sequence length, as presented in Table 2.

2.3. Processing data with R

Data processing was carried out in the R software environment with the code previously used by Toparslan et al.[24] The FASTA sequences obtained from NCBI Virus were used in the analysis. The outputs of the R script were presented in the form of figures.

3. Results

The results section of this study presents the output images generated during the processing of selected sequences in the R software environment. Residues differing in amino


            Volume 2 Issue 2 (2025)                         55                               doi: 10.36922/mi.8410