INNOSC Theranostics and Pharmacological Sciences
PI3K-α inhibitors for cancer immunotherapy
considering the inherent complexity and imperfections in the data preparation operations.37,38 It serves as a basis for valid data analysis. Preprocessing includes various techniques37 such as cleaning, integration, transformation, imputation of missing values, and reduction.33,38

In this study, lists of PI3K-α inhibitory molecules were obtained from the binding databank, resulting in a dataset of 3994 inhibitory molecules in 3D geometry. The dataset also included columns containing the IC50 values of the molecules, molecule IDs, ligand names, ROMol object information of the ligands, and other fields. The IC50 column contained affinity information, indicating the potency of each molecule against the PI3K-α target. The dataset was in SDF (structure-data file) format, and data preprocessing was performed in the Python programming language.

The IC50 values of the compounds, expressed in nanomolar (nM) units and ranging from 0.07 to 7200 nM, helped capture a broader chemical space, enhancing the identification of novel ligands. The IC50 column also served as a reference column for deduplication: rows sharing the same IC50 value were dropped. The deduplication code retained only the first entry for each value, on the assumption that two or more ligands with the same IC50 value exhibit similar potency or affinity, pharmacological effects, and functional activities toward the target protein or receptor. Dropping duplicate IC50 entries yielded a normal distribution of values, which made the dataset more amenable to statistical analysis. Moreover, docking ligands with similar half-maximal inhibitory concentrations may not provide significant additional insight. The IC50 values were then converted to pIC50 values to standardize the dataset and ensure consistency.

IC50 values expressed in multiple units can complicate the comparison of results across different concentrations. Hence, it was necessary to convert the IC50 values in the dataset to pIC50 values. Presenting data as pIC50 values, that is, as the negative base-10 logarithm of the molar IC50 concentration, is considered a better approach. This representation enhances data clarity, minimizes potential errors in data representation, and improves reproducibility, with standardization, linearity, normal distribution, and precision as additional attributes. Relevant columns were selected and preserved for further analysis. The data preprocessing stage functions as a preliminary filtering technique that reduces the compound selection size before virtual screening campaigns are executed.

2.3. Protein complex refinement

The PI3K-α protein structure was obtained from the RCSB Protein Data Bank (rcsb.org). The architecture of the human PI3K-α protein, deposited under PDB ID 6PYS, is that of a protein complex containing a ligand and several water molecules. The structure of 6PYS, obtained through X-ray diffraction, has a resolution of 2.19 Å, with R-free, R-work, and R-observed values of 0.259, 0.2243, and 0.225, respectively. 6PYS has a total structure weight of 110.61 kDa, an atom count of 7558, 890 modeled residues, 945 deposited residues, and one unique protein chain (chain A). Furthermore, the engineered 6PYS polymer sequence contains no mutations relative to the reference sequence.

The protein preparation involved isolating the ligand from the 6PYS protein-ligand complex, followed by modification of the protein using the protein preparation and refinement wizard in Schrödinger Maestro (Schrödinger Release 2020-3: Maestro, Schrödinger, LLC, United States, 2023). Maestro is an intuitive molecular modeling environment used across various scientific fields, including materials science, as well as an integrated predictive computational modeling and machine-learning platform for small-molecule drug development. During refinement, the settings were configured for a pH of 7.0, which allowed the detection of small molecules (HETs), including ligands, metals, and ions. The refinement process also incorporated various measures: the assignment of bond orders; the use of the chemical component dictionary (CCD) database to help identify and characterize the ligand present in the protein structure, together with its binding modes and potential functional or therapeutic roles; the addition of missing hydrogens to the protein; the addition of terminal oxygens to the protein; the conversion of selenomethionines to methionines; the filling of missing loops; the capping of termini; the deletion of water molecules beyond 0 Å of HETs; and the generation of HET states at pH 7.0 ± 2.0. The Kabat antibody annotation scheme was employed to facilitate the design and analysis of antibody-based therapeutics by comparing the protein sequences and structures of antibodies. Furthermore, to mimic the natural environment of the protein and prevent unwanted interactions or structural distortions that may arise from exposed termini, the termini of the protein were capped with small peptide fragments.

Hydrogen bonds were assigned during the refinement stage to give them the correct geometry. The hydrogen bond assignment scheme was optimized using PROPKA, an empirical pKa prediction program available in Maestro, which provided a quantitative estimate of the pKa values of the protein's ionizable groups. More specifically, PROPKA was utilized
Volume 7 Issue 2 (2024) 5 doi: 10.36922/itps.2340
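The data preprocessing section states that the inhibitor set arrived as an SDF file and was processed in Python, but it does not show the code. The following is a minimal, standard-library sketch of reading SD data fields and parsing IC50 values. It is not the authors' script. It assumes that IC50 is stored in a data item named `IC50 (nM)`, which is a hypothetical field name, and that censored entries such as `>10000` should be set aside rather than used as exact values. In practice, RDKit's `Chem.SDMolSupplier`, which yields the ROMol objects mentioned in the text, would normally do the parsing.

```python
import re
from typing import Dict, Iterator, List, Optional

# SD data-item header lines look like "> <IC50 (nM)>" or ">  12  <ID>"
_TAG = re.compile(r"^>\s*.*?<([^>]+)>")


def iter_sdf_records(text: str) -> Iterator[Dict[str, str]]:
    """Yield one {field name: value} dict per SDF record (records end with '$$$$')."""
    for block in text.split("$$$$"):
        if not block.strip():
            continue  # trailing text after the last delimiter
        fields: Dict[str, str] = {}
        lines = block.splitlines()
        i = 0
        while i < len(lines):
            match = _TAG.match(lines[i])
            if match:
                name = match.group(1)
                values: List[str] = []
                i += 1
                # A data item's value runs until the next blank line
                while i < len(lines) and lines[i].strip():
                    values.append(lines[i].strip())
                    i += 1
                fields[name] = "\n".join(values)
            i += 1
        yield fields


def parse_ic50_nm(raw: str) -> Optional[float]:
    """Return IC50 in nM, or None for blank, censored ('>10000') or non-numeric values."""
    raw = raw.strip()
    if not raw or raw[0] in "<>~":
        return None  # assumption: qualified values are excluded, not clipped
    try:
        return float(raw)
    except ValueError:
        return None
```

Skipping censored values is a design choice of this sketch. Other pipelines keep them and strip the qualifier, which shifts the potency distribution.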
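The text describes using the IC50 column as a reference column, dropping rows that share an IC50 value while keeping the first entry, and then selecting the relevant columns. In pandas, this corresponds to `DataFrame.drop_duplicates(subset=..., keep="first")` followed by column selection. The plain-Python sketch below shows the same keep-first logic. The row keys `id` and `ic50_nm` are hypothetical, and dropping rows with no usable IC50 is an extra assumption that the source does not state.

```python
from typing import Dict, List


def drop_duplicate_ic50(rows: List[Dict], key: str = "ic50_nm") -> List[Dict]:
    """Keep only the first row for each distinct IC50 value, preserving input order."""
    seen = set()
    kept: List[Dict] = []
    for row in rows:
        value = row.get(key)
        if value is None:
            continue  # assumption: rows without a parsed IC50 are discarded
        if value in seen:
            continue  # a later ligand with an already-seen IC50 is dropped
        seen.add(value)
        kept.append(row)
    return kept


def select_columns(rows: List[Dict], columns: List[str]) -> List[Dict]:
    """Retain only the named columns for downstream analysis."""
    return [{col: row[col] for col in columns} for row in rows]
```

Because only the first occurrence survives, the outcome depends on the row order of the input file. Sorting beforehand, for example by ligand ID, makes the result reproducible.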
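The pIC50 conversion described in the preprocessing section is pIC50 = -log10(IC50 in mol/L). For values given in nanomolar, this simplifies to pIC50 = 9 - log10(IC50 in nM). The sketch below first converts each value to molar, so entries reported in different units end up on one scale. The unit table is an illustrative assumption and is not taken from the dataset.

```python
import math

# Factors converting each concentration unit to molar (mol/L)
_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def pic50(value: float, unit: str = "nM") -> float:
    """Return pIC50 = -log10(IC50 in mol/L) for an IC50 given in `unit`."""
    if value <= 0:
        raise ValueError("IC50 must be positive to take a logarithm")
    try:
        factor = _TO_MOLAR[unit]
    except KeyError:
        raise ValueError(f"unsupported unit: {unit!r}") from None
    return -math.log10(value * factor)
```

Applied to the reported 0.07 to 7200 nM range, this gives pIC50 values of about 10.15 and 5.14. In other words, a span of five orders of magnitude in concentration becomes a span of roughly five pIC50 units, and higher pIC50 means a more potent inhibitor.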
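In the study, the Maestro wizard isolated the ligand from the 6PYS complex. The record-level idea behind that step can be shown on a PDB-format file, for example one downloaded from `https://files.rcsb.org/download/6PYS.pdb`. This assumes a legacy PDB-format file is available for the entry; otherwise the mmCIF file would have to be used. The sketch separates protein, water, and ligand/ion (HET) records using the fixed PDB column layout. Treating selenomethionine (`MSE`) HETATM records as part of the protein is an assumption made here because the text mentions converting selenomethionines to methionines. The function names are hypothetical.

```python
from typing import Dict, List

# Residue names treated as solvent; extend if a file uses other water labels
WATER_NAMES = {"HOH", "WAT", "DOD"}
# Modified amino acids stored as HETATM but belonging to the polymer chain
POLYMER_HET_NAMES = {"MSE"}  # selenomethionine


def split_pdb_records(pdb_text: str) -> Dict[str, List[str]]:
    """Split PDB coordinate records into protein, water and ligand/ion (HET) groups."""
    parts: Dict[str, List[str]] = {"protein": [], "water": [], "het": []}
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue  # skip header, TER, CONECT and other records
        res_name = line[17:20].strip()  # PDB columns 18-20 hold the residue name
        if record == "ATOM" or res_name in POLYMER_HET_NAMES:
            parts["protein"].append(line)
        elif res_name in WATER_NAMES:
            parts["water"].append(line)
        else:
            parts["het"].append(line)
    return parts


def het_residue_names(het_lines: List[str]) -> List[str]:
    """Distinct HET residue names, e.g. to locate the bound ligand before removing it."""
    return sorted({line[17:20].strip() for line in het_lines})
```

Writing `parts["protein"]` back out gives an apo protein file, and `parts["het"]` holds the co-crystallized ligand, which could serve as a reference pose.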
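The refinement step lists the deletion of water molecules beyond 0 Å of HETs. A distance-based water filter makes the effect of that cutoff explicit. The sketch below keeps a water atom only if it lies strictly closer than the cutoff to some HET atom. With a cutoff of 0.0 Å, no water qualifies, so all waters are removed. The strict inequality and the helper names are assumptions made for this illustration and do not describe Maestro's internal rule.

```python
import math
from typing import List, Tuple


def atom_xyz(line: str) -> Tuple[float, float, float]:
    """Orthogonal coordinates from PDB columns 31-38, 39-46 and 47-54."""
    return float(line[30:38]), float(line[38:46]), float(line[46:54])


def waters_near_hets(water_lines: List[str], het_lines: List[str], cutoff: float) -> List[str]:
    """Keep only water atoms lying strictly closer than `cutoff` angstroms to some HET atom.

    With cutoff = 0.0 no water can qualify, so every water is deleted.
    """
    het_coords = [atom_xyz(line) for line in het_lines]
    return [
        line for line in water_lines
        if any(math.dist(atom_xyz(line), h) < cutoff for h in het_coords)
    ]
```

Larger cutoffs, such as a few angstroms, would retain bridging waters in the binding site. A cutoff of 0 Å leaves the pocket empty for docking.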
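PROPKA estimates the pKa of each ionizable group from the 3D structure. The refinement then needs a protonation state for each group at the chosen pH (7.0, with HET states generated over 7.0 ± 2.0). The sketch below does not reproduce PROPKA. It only shows how a predicted pKa maps to a dominant protonation state through the Henderson-Hasselbalch relation. The pKa values in the usage example are illustrative and are not PROPKA output for 6PYS.

```python
from typing import Dict


def fraction_protonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of an ionizable group in its protonated (acid) form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def dominant_state(pka: float, ph: float = 7.0) -> str:
    """Label the majority form of the group at the given pH."""
    return "protonated" if fraction_protonated(pka, ph) > 0.5 else "deprotonated"


def states_across_window(pka: float, center: float = 7.0, half_width: float = 2.0) -> Dict[float, str]:
    """Dominant state at the lower edge, centre and upper edge of a pH window such as 7.0 +/- 2.0."""
    return {ph: dominant_state(pka, ph) for ph in (center - half_width, center, center + half_width)}
```

For example, a group with a predicted pKa of 6.0 is about 91% protonated at pH 5.0 but only about 9% protonated at pH 7.0. This is why groups whose pKa lies inside the 7.0 ± 2.0 window can change state across it and need careful assignment.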

