Microbes & Immunity Bioinformatics analysis platform

the subsequent years, efficiency was further improved following the capacity expansion of the sequencing machine from handling one sample to multiple samples at a time. Such remarkable improvements in Sanger sequencing technologies led to the tremendous growth in DNA sequencing data across various fields.

In the past two decades, DNA sequencing technology has undergone a further breakthrough, transitioning from the traditional Sanger sequencing to several high-throughput, short-read, second-generation sequencing technologies. This significant development started with the launch of the 454 pyrosequencing platform 20 years ago.² However, the Illumina platform emerged as the market leader, and the HiSeq™ Sequencing System has become the most popular one. An approach different from Sanger sequencing, the sequencing by synthesis technology from the Illumina HiSeq™ platform requires DNA template amplification before sequencing. Fluorescent-labeled reversible terminator nucleotides are incorporated into the elongating DNA strands and then imaged through fluorophore excitation at the genomic composition bias.

Another significant development happened in 2011 when Pacific Biosciences launched the first PacBio® RS sequencing platform to the market. This sequencing platform did not require genomic DNA amplification, hence addressing one of the major challenges of second-generation sequencing technologies. This platform employed a real-time single molecule detection technology, enabling real-time sequencing of individual polymerase molecules with lesser bias and longer reads.³,⁴ However, this technology was noted to be associated with a tendency to sequence errors. With enhancements in its chemistry and software, iterations of the subsequent sequencer have demonstrated substantial improvements in accuracy, throughput, and read length compared to earlier models.

Recently, the MinION sequencer (Oxford Nanopore Technologies) has made next-generation sequencing even more user-friendly.⁵ Featured with short turnaround time, portable size, and low equipment cost, this device enables small laboratories to perform their own in-house next-generation sequencing experiments. The advent of these robust next-generation sequencing technology platforms has resulted in the generation of DNA sequencing data on a massive, industrial scale, an obvious example of which is the exponential growth of the GenBank database.

Alongside the rapid generation of DNA and RNA sequences, there has been a huge expansion of other biological data, such as protein sequences and protein structures. Across biomedical science fields, analysis of vast amounts of these biological data for deciphering their meaning, testing hypotheses, and generating novel ideas essentially requires the most advanced bioinformatics tools that have effectively harnessed the exponential expansion in computation power, as observed by Moore’s Law. Thirty years ago, having some knowledge of computer programming and the expertise in inputting commands were often prerequisites for using the state-of-the-art bioinformatics tools. However, bioinformatics tools have gradually become more user-friendly in the last 20 years, a metamorphosis similar to the evolution of the IBM word processor in the 1980s to Microsoft Word. In the present day, bioinformatics analysis can be easily executed even by scientists with minimal knowledge of computer programming. On the other hand, the proliferation of bioinformatics tools could have bewildered students, post-doctoral fellows, and even some experienced scientists.

For many centuries, scientific discoveries have typically been made following the conventional approach of The Scientific Method, which involves the step-by-step process from formulating a hypothesis to gathering relevant, high-quality data for hypothesis testing, analyzing the data collected, and finally drawing conclusions. In recent times, the immense advancements in computational power, coupled with the accumulation of massive amounts of data generated over the years, have given rise to the development of big data analysis as a fast-growing discipline with wide applications across various industries and sectors.⁶,⁷ The huge amounts of structured and unstructured data used in big data analysis often consist of retrospective data that are pooled from multiple sources, involving hundreds or even thousands of individuals or parties with varying levels of expertise and training in data collection. Such datasets are typically stored in databases which are made accessible to the public, or in cases where the datasets are owned by an organization, they can be readily retrieved by its members. Using advanced statistical tools to examine large, complex datasets, big data analysis is capable of unveiling novel patterns, associations, and trends. As a result, it offers fresh conclusions and new insights, which conventional analysis of smaller datasets generated by individual research groups would not have the capacity to achieve. However, it is necessary to be mindful of “garbage in, garbage out” data quality: To derive reliable conclusions, it is crucial that the data used for analysis are of high quality.⁸

In view of the unparalleled explosion of data on all fronts of biology, we have recently developed a platform named Mx. BIOME, which provides a collection of some of the most popular and cutting-edge tools suitable for use in multiple disciplines of bioinformatics analysis and big data analysis (http://mxbiome.nchu.edu.tw). The designation “Mx. BIOME” symbolizes the entirety of living cells, each comprising informational components regardless of

Volume 2 Issue 4 (2025) 62 doi: 10.36922/mi.5077
