Microbes & Immunity Bioinformatics analysis platform
Figure 2. Multiple sequence alignment of S protein sequences from SARS-CoV-2 found in Taiwan. (A) Manual annotation of AA 437 to AA 512 of the
multiple sequence alignment results shown in Figure 1I (after rearranging the strains in chronological order), indicating the differences in the amino acid
sequences of the six spike proteins at 10 amino acid positions. The numbers in red and the dashed boxes represent the amino acid positions described
by Zhou et al.20 for the different SARS-CoV-2 variants. The discrepancy between the amino acid position numbers in the present alignment and those
of Zhou et al.20 is due to an insertion of three amino acids, EPE, at AA 215 to AA 217 in the two omicron variant sublineage BA.1 strains. (B) Multiple
sequence alignment of AA 213 to AA 292, showing the unique insertion of the three amino acids EPE in the two omicron variant sublineage BA.1 strains
(red box).
Abbreviations: AA: Amino acid; SARS-CoV-2: Severe acute respiratory syndrome coronavirus 2.
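The numbering shift described in the Figure 2 caption can be illustrated with a minimal Python sketch. It uses short made-up sequences, not real spike protein sequences, to show how a three-residue insertion in one aligned sequence moves every later alignment column three positions ahead of the numbering of a sequence that lacks the insertion. The function name and the toy sequences are hypothetical and are not part of Mx BIOME or Clustal Omega.

```python
from typing import List, Optional


def column_to_reference_position(aligned_ref: str, gap: str = "-") -> List[Optional[int]]:
    """Map each alignment column to the 1-based residue number of the
    ungapped reference sequence (None where the reference row has a gap)."""
    positions: List[Optional[int]] = []
    residue_count = 0
    for symbol in aligned_ref:
        if symbol == gap:
            # The reference has no residue here: another sequence carries an insertion.
            positions.append(None)
        else:
            residue_count += 1
            positions.append(residue_count)
    return positions


# Toy aligned rows (hypothetical, for illustration only).
# The second row carries a three-residue insertion "EPE".
aligned_reference = "MKTR---LVNA"
aligned_variant = "MKTREPELVNA"

mapping = column_to_reference_position(aligned_reference)

for column, (ref_pos, ref_aa, var_aa) in enumerate(
    zip(mapping, aligned_reference, aligned_variant), start=1
):
    label = "insertion" if ref_pos is None else f"reference position {ref_pos}"
    print(f"column {column:2d}: {ref_aa} / {var_aa}  ({label})")
```

In this toy case, alignment column 8 corresponds to reference position 5, so every column after the insertion is offset by three. This is the same kind of offset as the one between the present alignment and the numbering of Zhou et al.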
their individual attributes, and the perpetual flux of life finds manifestation through bioinformation. The tools in this platform were broadly grouped into five categories: sequence analysis, structural biology, metabolomics, evolutionary genetics, and biomedical data science, with some of them developed by scientists from our university.9-19 Such a collection of tools would help end-users identify suitable tools for analyzing their specific datasets. Further studies and reviews of the various in silico tools are necessary to compare their advantages and limitations.

To illustrate the use of Mx. BIOME for microorganism study, we performed multiple sequence alignment of six spike protein sequences from six severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) strains isolated during different phases of the COVID-19 pandemic in Taiwan. Multiple sequence alignment, one of the first steps in analyzing microbial DNA/RNA or protein sequences, aligns nucleotide or amino acid sequences to identify regions of identity and similarity. Such regions are important because they reflect functional and evolutionary relationships between the sequences. In multiple sequence alignment tools, the nucleotide or amino acid sequences are represented as rows in a matrix, with gaps inserted between the residues so as to generate the optimal alignment, with maximum identity and similarity as determined by the algorithm used in the tool. The commonly used multiple sequence alignment tools include T-Coffee, MUSCLE, Clustal Omega, and MAFFT. These tools differ in the algorithms used, the maximum number of sequences they can handle, the user-friendliness of the interface, and other features.

In the current exercise, Clustal Omega was chosen as the multiple sequence alignment tool. To begin the exercise, we downloaded six spike protein sequences from six SARS-CoV-2 genomes from the NCBI Virus database (https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/), representing SARS-CoV-2 isolated during different phases of the COVID-19 pandemic in Taiwan. From the front page of Mx BIOME, “Sequence Analysis” was clicked (Figure 1A), then “Sequence Alignment” (Figure 1B), then “Clustal Omega” (Figure 1C), which brought us to the main page of Clustal Omega on the EBI website (Figure 1D). The six spike protein sequences were pasted into the input box, and FASTA was chosen as the output format (Figure 1E). The alignment process started when “Submit” was clicked (Figure 1F). A few seconds later, when the screen indicated that the alignment was finished, “View Results” was clicked (Figure 1G), and the input sequences were shown again (Figure 1H). When “Alignments” was clicked, the multiple sequence alignment of the six spike protein sequences was displayed (Figure 1I). When “Results Files” was clicked, the page from which the results could be downloaded appeared (Figure 1J).

Further manual analysis confirmed that the three spike proteins from SARS-CoV-2 strains isolated on December 26, 2020, April 4, 2021, and July 24, 2021, with the specific mutation A570D, belonged to the alpha variant sublineage
Volume 2 Issue 4 (2025) 64 doi: 10.36922/mi.5077

