Gene & Protein in Disease DNA methylation and gene expression on rats with protein malnutrition

ordinate is the gene, and different colors indicate different gene expression levels.

2.9. RNA-seq reads mapping
We aligned the reads of sample A and sample B to the UCSC (http://genome.ucsc.edu/) Homo sapiens reference genome using the HISAT package, which first removes a portion of the reads based on the quality information accompanying each read and then maps the remaining reads to the reference genome. HISAT allows multiple alignments per read (up to 20 by default) and a maximum of two mismatches when mapping the reads to the reference. HISAT builds a database of potential splice junctions and confirms these by comparing the previously unmapped reads against the database of putative junctions.

2.10. Transcript abundance estimation and differential expression testing
The mapped reads of each sample were assembled using StringTie. Then, the transcriptomes from all samples were merged to reconstruct a comprehensive transcriptome using Perl scripts. After the final transcriptome was generated, StringTie and edgeR were used to estimate the expression levels of all transcripts. StringTie was used to determine the expression levels of mRNAs by calculating FPKM. The differentially expressed mRNAs and genes were selected with log2 (fold change) >1 or log2 (fold change) <−1 and with statistical significance (P < 0.01) using an R package.

2.11. Genome-wide DNA methylation assay
Total DNA was extracted using the QIAamp Fast DNA Tissue Kit (Qiagen, Dusseldorf, Germany). The bisulfite sequencing libraries were constructed using the Acegen Bisulfite-Seq Library Prep Kit (AceGen, Cat. No. AG0311), according to the manufacturer’s protocol. Briefly, the genomic DNA spiked with methylated Lambda DNA was fragmented by sonication (for whole-genome bisulfite sequencing) or using MspI (NEB, USA, for reduced representation bisulfite sequencing) to a mean size of approximately 200–500 bp, then end-repaired, 5’-phosphorylated, 3’-dA-tailed, and ligated to 5-methylcytosine-modified adapters. After bisulfite treatment, the DNA was amplified with 10 cycles of polymerase chain reaction (PCR). The constructed libraries were then analyzed with an Agilent 2100 Bioanalyzer and finally sequenced on Illumina platforms using a 2×150 bp paired-end sequencing protocol.

2.12. Single-nucleotide polymorphism and indel analysis
We analyzed single-nucleotide polymorphism (SNP) sites in coding regions at the transcriptomic level. Samtools software was used for mpileup processing of the HISAT alignment results of each sample against the reference genome, and the possible SNP and indel information of each sample was then annotated with ANNOVAR.

2.13. Statistical analysis
The methylKit software was used to analyze the differentially methylated regions (DMRs) between groups. A 1000 bp window and a 500 bp overlap were selected by default. P < 0.01 was the screening threshold for differences. The GoMiner database was used to analyze the enrichment of Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways for the DMR-related genes obtained from each group comparison. The number of DMR-related genes included in each GO term (or KEGG pathway entry) was counted, and the P-value of enrichment significance of DMR-related genes in each GO term (or KEGG pathway entry) was calculated by the hypergeometric distribution test. A t-test was used to screen the differentially methylated sites between groups after data processing.

3. Results

3.1. The quality of raw sequencing data and differential expression analysis
All the raw sequence data were eligible for further analysis, and the results of quality control are shown in Table 1. Mapping to the genome with HISAT2 gave a high concordant rate (Table 2). The regional distribution of the reference genome alignment is shown in Figure 1. Valid data that can be aligned to the reference genome can be assigned to exon, intron, and intergenic regions based on the region information of the reference genome. Under normal circumstances, the percentage of reads located in exon regions should be the highest, while some reads are also aligned to intron and intergenic regions, which may be caused by splicing events of pre-mRNA, incomplete genome annotation, DNA contamination, background noise, etc.

3.2. Analysis of total gene expression level
The distribution statistics of expression values in Table 3 can be further visualized with the per-sample FPKM box plot (Figure 2), which gives an overall view of gene expression levels. For biological replicates, the reproducibility of the samples can also be preliminarily judged from the box plot. The x-axis is the sample name, the y-axis is log10 (FPKM), and each box corresponds to five statistics (maximum, upper quartile, median, lower quartile, and minimum, from top to bottom).
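
As an illustration of the five statistics behind each box, the following minimal Python sketch computes them for one sample's FPKM values. The function name `five_number_summary` and the pseudocount of 1 added before taking log10 (so that FPKM = 0 stays defined) are assumptions made for this example; the source does not state how zero values were handled or which quartile convention was used.

```python
import math


def five_number_summary(fpkm_values, pseudocount=1.0):
    """Return (min, lower quartile, median, upper quartile, max)
    of log10(FPKM + pseudocount) for one sample.

    The pseudocount is an assumption of this sketch, not a value
    taken from the paper.
    """
    logged = sorted(math.log10(v + pseudocount) for v in fpkm_values)
    if not logged:
        raise ValueError("need at least one FPKM value")

    def quantile(q):
        # Linear interpolation between the two closest ranks.
        pos = q * (len(logged) - 1)
        lo = math.floor(pos)
        hi = math.ceil(pos)
        frac = pos - lo
        return logged[lo] * (1 - frac) + logged[hi] * frac

    return tuple(quantile(q) for q in (0.0, 0.25, 0.5, 0.75, 1.0))


if __name__ == "__main__":
    # Made-up FPKM values, for illustration only.
    print(five_number_summary([0, 9, 99, 999, 9999]))
```

Comparing these five values across biological replicates gives the same rough reproducibility check that the box plot provides visually.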

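The hypergeometric enrichment test described in Section 2.13 can be sketched in Python as follows. This is not GoMiner's implementation; it only shows the upper-tail calculation under the usual formulation, and the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, selected, overlap):
    """Upper-tail hypergeometric P-value, P(X >= overlap).

    population: number of background genes
    annotated:  background genes assigned to the GO term or KEGG pathway
    selected:   number of DMR-related genes
    overlap:    DMR-related genes assigned to the term or pathway
    """
    if not (0 <= annotated <= population and 0 <= selected <= population):
        raise ValueError("annotated and selected must lie within the population")
    upper = min(annotated, selected)
    total = comb(population, selected)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(max(overlap, 0), upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers, for illustration only.
    print(enrichment_p_value(population=10, annotated=4, selected=3, overlap=2))
```

A term or pathway would be reported as enriched when this value falls below the chosen significance threshold.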
Volume 1 Issue 2 (2022) 4 https://doi.org/10.36922/gpd.v1i2.169
