
Microbes & Immunity                                                        Bioinformatics analysis platform



the subsequent years, efficiency was further improved as sequencing machines expanded from handling one sample to handling multiple samples at a time. Such remarkable improvements in Sanger sequencing technologies led to tremendous growth in DNA sequencing data across various fields.

In the past two decades, DNA sequencing technology has undergone a further breakthrough, transitioning from traditional Sanger sequencing to several high-throughput, short-read, second-generation sequencing technologies. This development began with the launch of the 454 pyrosequencing platform 20 years ago.² The Illumina platform subsequently emerged as the market leader, and its HiSeq™ Sequencing System became the most widely used instrument. The sequencing-by-synthesis technology of the Illumina HiSeq™ platform, an approach distinct from Sanger sequencing, requires amplification of the DNA template before sequencing. Fluorescently labeled reversible terminator nucleotides are incorporated into the elongating DNA strands and then imaged through fluorophore excitation; however, the amplification step can introduce bias related to genomic composition.

Another significant development came in 2011, when Pacific Biosciences brought the first PacBio® RS sequencing platform to market. This platform did not require amplification of genomic DNA, thereby addressing one of the major challenges of second-generation sequencing technologies. It employed single-molecule, real-time detection, which monitors DNA synthesis by individual polymerase molecules as it happens, yielding longer reads with less bias.³,⁴ However, this technology was noted to be prone to sequencing errors. With enhancements in chemistry and software, subsequent iterations of the sequencer have shown substantial improvements in accuracy, throughput, and read length over earlier models. More recently, the MinION sequencer (Oxford Nanopore Technologies) has made next-generation sequencing even more user-friendly.⁵ With its short turnaround time, portable size, and low equipment cost, this device enables small laboratories to perform next-generation sequencing experiments in-house. The advent of these robust next-generation sequencing platforms has led to DNA sequencing data being generated on a massive, industrial scale, as exemplified by the exponential growth of the GenBank database.

Alongside the rapid generation of DNA and RNA sequences, there has been a huge expansion of other biological data, such as protein sequences and protein structures. Across the biomedical sciences, analyzing these vast amounts of biological data to decipher their meaning, test hypotheses, and generate novel ideas requires advanced bioinformatics tools that harness the exponential growth in computing power described by Moore's Law. Thirty years ago, some knowledge of computer programming and expertise in entering commands were often prerequisites for using state-of-the-art bioinformatics tools. Over the last 20 years, however, bioinformatics tools have gradually become more user-friendly, a transformation similar to the evolution from the IBM word processors of the 1980s to Microsoft Word. Today, bioinformatics analyses can be performed easily even by scientists with minimal programming knowledge. On the other hand, the proliferation of bioinformatics tools can bewilder students, postdoctoral fellows, and even some experienced scientists.

For centuries, scientific discoveries have typically followed the conventional approach of the scientific method, a stepwise process that proceeds from formulating a hypothesis to gathering relevant, high-quality data to test it, analyzing the collected data, and finally drawing conclusions. In recent years, immense advances in computational power, coupled with the accumulation of massive amounts of data over the years, have given rise to big data analysis, a fast-growing discipline with wide applications across various industries and sectors.⁶,⁷ The huge amounts of structured and unstructured data used in big data analysis often consist of retrospective data pooled from multiple sources and collected by hundreds or even thousands of individuals or parties with varying levels of expertise and training in data collection. Such datasets are typically stored in databases that are either accessible to the public or, when the datasets are owned by an organization, readily retrievable by its members. By applying advanced statistical tools to large, complex datasets, big data analysis can reveal novel patterns, associations, and trends. As a result, it offers fresh conclusions and new insights that conventional analyses of the smaller datasets generated by individual research groups could not achieve. However, the principle of "garbage in, garbage out" applies: reliable conclusions can only be derived from high-quality data.⁸

In view of the unparalleled explosion of data on all fronts of biology, we have recently developed a platform named Mx. BIOME, which provides a collection of some of the most popular and cutting-edge tools for bioinformatics analysis and big data analysis across multiple disciplines (http://mxbiome.nchu.edu.tw). The designation "Mx. BIOME" symbolizes the entirety of living cells, each comprising informational components regardless of


            Volume 2 Issue 4 (2025)                         62                               doi: 10.36922/mi.5077