
Microbes & Immunity                                                        Bioinformatics analysis platform

[Figure 2: panels A and B]

Figure 2. Multiple sequence alignment of spike (S) protein sequences from SARS-CoV-2 strains found in Taiwan. (A) Manual annotation of AA 437 to AA 512 of the multiple sequence alignment shown in Figure 1I (after rearranging the strains in chronological order), indicating that the amino acid sequences of the six spike proteins differ at 10 positions. The numbers in red and the dashed boxes mark the amino acid positions described by Zhou et al.²⁰ for the different SARS-CoV-2 variants. The amino acid position numbers in the present alignment differ from those reported by Zhou et al.²⁰ because the two omicron variant sublineage BA.1 strains carry an insertion of three amino acids (EPE) at AA 215 to AA 217. (B) Multiple sequence alignment of AA 213 to AA 292, showing the insertion of the three amino acids EPE that is unique to the two omicron variant sublineage BA.1 strains (red box).
Abbreviations: AA: Amino acid; SARS-CoV-2: Severe acute respiratory syndrome coronavirus 2.
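The numbering discrepancy described in the caption arises because alignment columns, the residue numbers of each individual sequence, and reference positions (such as those used by Zhou et al.) are different coordinate systems, and an insertion such as EPE in the BA.1 strains shifts them apart. The Python sketch below is illustrative only: it uses short hypothetical toy sequences, not the actual spike proteins. It maps alignment columns to per-sequence residue numbers, lists the variable columns, and names substitutions in reference coordinates, following the reference-residue, position, new-residue style of A570D. The function names are hypothetical and are not part of Clustal Omega or Mx BIOME.

```python
# Illustrative sketch: coordinate bookkeeping for an already aligned set of
# sequences. All sequences and function names here are hypothetical.
from typing import Dict, List, Optional

GAP = "-"


def residue_numbers(aligned: str) -> List[Optional[int]]:
    """For each alignment column, return this sequence's 1-based residue
    number at that column, or None where the sequence has a gap."""
    numbers: List[Optional[int]] = []
    count = 0
    for ch in aligned:
        if ch == GAP:
            numbers.append(None)
        else:
            count += 1
            numbers.append(count)
    return numbers


def variable_columns(alignment: Dict[str, str]) -> List[int]:
    """Return 1-based column indices where the aligned sequences do not all
    carry the same character (a gap counts as a character)."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    return [i + 1 for i in range(length) if len({s[i] for s in seqs}) > 1]


def substitutions(alignment: Dict[str, str], ref: str) -> Dict[str, List[str]]:
    """List each sequence's substitutions relative to `ref`, written as
    <ref residue><ref position><new residue>. Columns where either sequence
    has a gap (insertions or deletions) are skipped."""
    ref_seq = alignment[ref]
    ref_pos = residue_numbers(ref_seq)
    result: Dict[str, List[str]] = {}
    for name, seq in alignment.items():
        if name == ref:
            continue
        result[name] = [
            f"{r}{ref_pos[i]}{s}"
            for i, (r, s) in enumerate(zip(ref_seq, seq))
            if r != GAP and s != GAP and r != s
        ]
    return result


if __name__ == "__main__":
    toy = {
        "ref": "ACD---EFGH",
        "ins": "ACDEPEEFGH",  # hypothetical 3-residue EPE insertion
        "sub": "ACD---EYGH",  # hypothetical F->Y substitution
    }
    print(variable_columns(toy))      # [4, 5, 6, 8]
    print(substitutions(toy, "ref"))  # {'ins': [], 'sub': ['F5Y']}
    # Column 8 is residue 5 of "ref" but residue 8 of "ins":
    print(residue_numbers(toy["ref"])[7], residue_numbers(toy["ins"])[7])  # 5 8
```

Reporting substitutions in reference coordinates rather than in alignment columns keeps mutation names stable when another sequence in the set introduces an insertion. This is why positions in a multi-strain alignment can differ from the reference-based positions used in the literature.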
their individual attributes, and the perpetual flux of life finds manifestation through bioinformation. The tools in this platform were broadly grouped into five categories: sequence analysis, structural biology, metabolomics, evolutionary genetics, and biomedical data science, with some of them developed by scientists from our university.⁹⁻¹⁹ Such a collection of tools should help end users identify suitable tools for analyzing their specific datasets. Further studies and reviews of the various in silico tools are needed to compare their advantages and limitations.

To illustrate the use of Mx BIOME for studying microorganisms, we performed a multiple sequence alignment of six spike protein sequences from six severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) strains isolated during different phases of the COVID-19 pandemic in Taiwan. Multiple sequence alignment is one of the first steps in analyzing microbial DNA/RNA or protein sequences. It involves aligning three or more nucleotide or amino acid sequences to identify regions of identity and similarity. Such regions are important because they can reflect functional and evolutionary relationships between the sequences. In multiple sequence alignment tools, each sequence is represented as a row in a matrix, and residues considered equivalent are placed in the same column. Gaps are inserted between residues to produce the best alignment, with maximum identity and similarity, as determined by the scoring scheme and algorithm used in the tool. Commonly used multiple sequence alignment tools include T-Coffee, MUSCLE, Clustal Omega, and MAFFT. These tools differ in the algorithms they use, the maximum number of sequences they can handle, the user-friendliness of their interfaces, and so on.

In the current exercise, Clustal Omega was chosen as the multiple sequence alignment tool. To begin, we downloaded the spike protein sequences of six SARS-CoV-2 genomes from the NCBI Virus database (https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/), representing SARS-CoV-2 strains isolated during different phases of the COVID-19 pandemic in Taiwan.

From the front page of Mx BIOME, “Sequence Analysis” was clicked (Figure 1A), then “Sequence Alignment” (Figure 1B), and then “Clustal Omega” (Figure 1C). This led to the main page of Clustal Omega on the EBI website (Figure 1D). The six spike protein sequences were pasted into the input box, and FASTA was chosen as the output format (Figure 1E). The alignment process started when “Submit” was clicked (Figure 1F). A few seconds later, when the screen indicated that the alignment was finished, “View Results” was clicked (Figure 1G), and the input sequences were displayed again (Figure 1H). Clicking “Alignments” displayed the multiple sequence alignment of the six spike protein sequences (Figure 1I), and clicking “Results Files” opened the page from which the results could be downloaded (Figure 1J).

Further manual analysis confirmed that the three spike proteins from SARS-CoV-2 strains isolated on December 26, 2020, April 4, 2021, and July 24, 2021, which carry the specific mutation A570D, belonged to the alpha variant sublineage


            Volume 2 Issue 4 (2025)                         64                               doi: 10.36922/mi.5077