Introduction
Amino acids are the building blocks of proteins, and many proteins are metalloproteins, meaning that they contain metal ions coordinated by amino acid residues. Metalloproteins are essential for many biological processes, including oxygen transport (haemoglobin), electron transfer (cytochromes), and catalysis (enzymes). In these proteins, amino acid residues act as ligands that stabilize the metal ions in coordination complexes. The way the residues coordinate a metal ion influences the geometry, electronic structure, and reactivity of the metal centre, and thereby the function of the whole complex. The coordination bonds between amino acid residues and metal ions also contribute to the structural integrity of metalloproteins, helping to ensure proper folding and stability. By understanding the coordination chemistry of amino acids in metalloproteins, chemists can design synthetic ligands with similar properties for applications in catalysis, sensing, and materials science.
The twenty standard amino acids of human proteins are commonly divided into nine essential, five nonessential, and six conditionally essential amino acids. The essential amino acids are histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, and valine; the nonessential amino acids are alanine, asparagine, aspartic acid, glutamic acid, and serine; and the conditionally essential amino acids are arginine, cysteine, glutamine, glycine, proline, and tyrosine. In addition, two non-standard amino acids, hydroxyproline and hydroxylysine, are considered in this work. Hydroxyproline is derived from proline by post-translational modification and is essential for collagen formation and hence for the strength and stability of connective tissues in the human body, while hydroxylysine, formed from lysine in a similar way, is important for the cross-linking of collagen.
Each amino acid has the same basic structure: a central carbon atom (the alpha carbon) bonded to an amino group, a carboxyl group, a hydrogen atom, and a side chain. For more details on amino acids, see1,2,3,4.
Topological indices are numerical descriptors that capture structural information about molecules. Well-known examples include the Wiener index, the Randić index, and the Balaban index. For amino acids, distance-based and degree-and-distance-based topological indices are often used to quantify structural features. Distance-based topological indices are calculated from the pairwise distances between vertices (atoms) in the molecular graph of the amino acid, and therefore describe how the atoms are arranged within the graph. Degree-and-distance-based topological indices take into account both the degrees (number of bonds) of the atoms and the distances between them, and so combine connectivity and distance information in a single number. Quantitative structure-property relationship (QSPR) models establish mathematical relationships between the structural properties of compounds (such as amino acids) and their physical, chemical, or biological properties or activities, with the aim of predicting or explaining these properties from the molecular structure. For more details on QSAR/QSPR analysis, see5,6,7,8,9,10,11,12,13,14.
A graph G(V,E) with vertex set V(G) and edge set E(G) is connected if there is a path between every pair of vertices. The distance d(u,v) between two vertices u and v is the length (number of edges) of a shortest path between them, and the degree deg(v) of a vertex v is the number of vertices adjacent to it.
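Both quantities can be obtained mechanically for a small molecular graph. The following is a minimal Python sketch, assuming the molecular graph is hydrogen-suppressed and stored as an adjacency list; because every edge has length 1, breadth-first search yields d(u,v) directly. The vertex labels used for Glycine (N, CA, C, O1, O2) are hypothetical names chosen for this illustration.

```python
from collections import deque


def degrees(adj):
    """Return deg(v), the number of adjacent vertices, for every vertex."""
    return {v: len(neighbours) for v, neighbours in adj.items()}


def bfs_distances(adj, source):
    """Return d(source, v) for every vertex v reachable from source.

    Every edge has length 1, so breadth-first search reaches vertices
    in order of increasing shortest-path distance.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def is_connected(adj):
    """A graph is connected if a search from any vertex reaches every vertex."""
    start = next(iter(adj))
    return len(bfs_distances(adj, start)) == len(adj)


# Hypothetical labels for the hydrogen-suppressed Glycine graph N-CA-C(O1)O2
glycine = {
    "N": ["CA"],
    "CA": ["N", "C"],
    "C": ["CA", "O1", "O2"],
    "O1": ["C"],
    "O2": ["C"],
}

print(degrees(glycine))
print(bfs_distances(glycine, "N"))
```

For this graph the degrees are 1, 2, 3, 1, 1 and the distances from the nitrogen atom are 0, 1, 2, 3, 3, consistent with the five-vertex, four-edge Glycine structure used in the worked example.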
The following topological indices are used in this work:
Wiener index
The Wiener index15 is used in chemical graph theory, network analysis, and predictive modeling, for example to characterize the branching of molecules and the efficiency and robustness of networks. It carries information about molecular size and branching. It is defined as the sum of the shortest-path distances over all unordered pairs of vertices (atoms) in a molecular graph or, equivalently, as half the sum over all ordered pairs, i.e.
$$\begin{aligned} W(G)= & \sum _{\{u,v\}\subseteq V(G)} d(u,v)=\frac{1}{2}\sum _{u\in V(G)}\sum _{v\in V(G)} d(u,v). \end{aligned}$$
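Summing over unordered pairs counts each pair once, whereas summing the full distance matrix counts each pair twice, so that second sum must be halved. A minimal Python sketch of both forms, assuming the distance matrix of the hydrogen-suppressed Glycine graph with a hypothetical vertex order (N, Cα, carboxyl C, O, O):

```python
from itertools import combinations

# Distance matrix of the hydrogen-suppressed Glycine graph
# (assumed vertex order: N, C-alpha, carboxyl C, O, O).
GLY_DIST = [
    [0, 1, 2, 3, 3],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 1],
    [3, 2, 1, 0, 2],
    [3, 2, 1, 2, 0],
]


def wiener(dist):
    """W(G): sum of d(u, v) over unordered vertex pairs, each counted once."""
    return sum(dist[u][v] for u, v in combinations(range(len(dist)), 2))


def wiener_ordered(dist):
    """Same value: half the sum over ordered pairs, i.e. over the whole matrix."""
    return sum(sum(row) for row in dist) / 2


print(wiener(GLY_DIST), wiener_ordered(GLY_DIST))
```

Both functions give 18 for Glycine (the second as the float 18.0), which matches the hand calculation of W(Glycine).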
Hyper-Wiener index
The hyper-Wiener index16 is an extension of the Wiener index in which larger distances receive additional weight. It is used to enhance predictive models in cheminformatics, to evaluate network efficiency and resilience, and in drug design to identify compounds with desirable characteristics. In chemical applications, it captures information about the branching and connectivity of atoms within a molecule. It is defined as half the sum of \(d(u,v)+d^{2}(u,v)\) over all unordered pairs of vertices in a graph, i.e.
$$\begin{aligned} HW(G)= & \frac{1}{2}\sum _{\{u,v\}\subseteq V(G)} (d(u,v)+d^{2}(u,v)). \end{aligned}$$
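The hyper-Wiener index needs only the same loop over unordered pairs. A minimal Python sketch, again assuming the distance matrix of hydrogen-suppressed Glycine with the hypothetical vertex order (N, Cα, carboxyl C, O, O):

```python
from itertools import combinations

# Assumed vertex order: N, C-alpha, carboxyl C, O, O (hydrogen-suppressed Glycine)
GLY_DIST = [
    [0, 1, 2, 3, 3],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 1],
    [3, 2, 1, 0, 2],
    [3, 2, 1, 2, 0],
]


def hyper_wiener(dist):
    """HW(G) = 1/2 * sum over unordered pairs of (d(u, v) + d(u, v)**2)."""
    pairs = combinations(range(len(dist)), 2)
    return sum(dist[u][v] + dist[u][v] ** 2 for u, v in pairs) / 2


print(hyper_wiener(GLY_DIST))
```

For Glycine this returns 28.0, the value of HW(Glycine) obtained by hand.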
Degree distance index
The degree distance index17 is a degree analogue of the Wiener index. It is based on both the degrees (number of bonds) of atoms and their pairwise distances in a molecular graph, and therefore combines information about the connectivity of atoms and their topological arrangement. It is used in chemistry, drug design, structural analysis, network optimization, comparative analysis, and predictive modeling, where it provides insight into network efficiency, node placement, and connectivity and helps predict chemical properties and biological activities. It is defined as
$$\begin{aligned} DD(G)= & \sum _{\{u,v\}\subseteq V(G)} (deg(u)+deg(v))d(u,v). \end{aligned}$$
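The degree distance index also needs the vertex degrees. A minimal Python sketch, assuming the same hypothetical Glycine distance matrix (vertex order N, Cα, carboxyl C, O, O) together with its degree list:

```python
from itertools import combinations

# Hydrogen-suppressed Glycine (assumed vertex order: N, C-alpha, carboxyl C, O, O)
GLY_DIST = [
    [0, 1, 2, 3, 3],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 1],
    [3, 2, 1, 0, 2],
    [3, 2, 1, 2, 0],
]
GLY_DEG = [1, 2, 3, 1, 1]


def degree_distance(dist, deg):
    """DD(G) = sum over unordered pairs of (deg(u) + deg(v)) * d(u, v)."""
    pairs = combinations(range(len(dist)), 2)
    return sum((deg[u] + deg[v]) * dist[u][v] for u, v in pairs)


print(degree_distance(GLY_DIST, GLY_DEG))
```

For Glycine this returns 52.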
Gutman index
The Gutman index18 is a topological index that describes the structural characteristics of molecules and is used in chemistry, graph and network analysis, predictive modeling, and drug design. Because it weights each distance by the product of the degrees of the two end vertices, it captures both connectivity and distance information: in graph analysis it can be used to evaluate network efficiency, and in chemistry and drug design it helps relate the structural complexity and branching of a molecule to its chemical properties and biological activities. It is defined as
$$\begin{aligned} Gut(G)= & \sum _{\{u,v\}\subseteq V(G)} (deg(u)\times deg(v))d(u,v). \end{aligned}$$
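The Gutman index differs from the degree distance index only in multiplying the two degrees instead of adding them. A minimal Python sketch under the same assumptions (hypothetical Glycine distance matrix and degree list, vertex order N, Cα, carboxyl C, O, O):

```python
from itertools import combinations

# Hydrogen-suppressed Glycine (assumed vertex order: N, C-alpha, carboxyl C, O, O)
GLY_DIST = [
    [0, 1, 2, 3, 3],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 1],
    [3, 2, 1, 0, 2],
    [3, 2, 1, 2, 0],
]
GLY_DEG = [1, 2, 3, 1, 1]


def gutman(dist, deg):
    """Gut(G) = sum over unordered pairs of deg(u) * deg(v) * d(u, v)."""
    pairs = combinations(range(len(dist)), 2)
    return sum(deg[u] * deg[v] * dist[u][v] for u, v in pairs)


print(gutman(GLY_DIST, GLY_DEG))
```

For Glycine this returns 36.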
Harary index
The Harary index19 is a molecular descriptor used in chemical graph theory to characterize the structure of chemical compounds and to predict their properties. It is calculated by summing the reciprocals of the distances between all pairs of vertices in a graph, so that close pairs contribute more than distant ones; it therefore reflects the “closeness” or connectivity of atoms within a molecule. In network theory, it is used to evaluate the connectivity, efficiency, and robustness of networks and to compare networks with one another, which can improve predictive modeling in fields such as cheminformatics and network design. It has also been studied in graph theory in relation to other graph invariants.
$$\begin{aligned} H(G)= & \sum _{\{u,v\}\subseteq V(G)} \frac{1}{d(u,v)}. \end{aligned}$$
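A minimal Python sketch of the Harary index, assuming the same hypothetical Glycine distance matrix (vertex order N, Cα, carboxyl C, O, O). Here `fractions.Fraction` is used so that the reciprocals are summed exactly; this is a choice made for the illustration, not something the definition requires.

```python
from fractions import Fraction
from itertools import combinations

# Hydrogen-suppressed Glycine (assumed vertex order: N, C-alpha, carboxyl C, O, O)
GLY_DIST = [
    [0, 1, 2, 3, 3],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 1],
    [3, 2, 1, 0, 2],
    [3, 2, 1, 2, 0],
]


def harary(dist):
    """H(G) = sum over unordered pairs of 1 / d(u, v), as an exact fraction."""
    pairs = combinations(range(len(dist)), 2)
    return sum(Fraction(1, dist[u][v]) for u, v in pairs)


value = harary(GLY_DIST)
print(value, round(float(value), 4))
```

For Glycine this gives the exact value 20/3, i.e. 6.6667 to four decimal places.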
Additively weighted Harary index
The additively weighted Harary index20 is an extension of the Harary index in which each pair of vertices is weighted by the sum of their degrees: it is obtained by summing, over all pairs of vertices, the sum of the two vertex degrees divided by the distance between them. The degree weights reflect differences in the local connectivity of atoms, which can improve predictions of chemical and biological properties in cheminformatics and give more detailed insights into network efficiency and robustness. This flexibility makes the index applicable across various fields.
$$\begin{aligned} H_{A}(G)= & \sum _{\{u,v\}\subseteq V(G)} \frac{deg(u)+deg(v)}{d(u,v)}. \end{aligned}$$
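A minimal Python sketch of the additively weighted Harary index, assuming the same hypothetical Glycine distance matrix and degree list (vertex order N, Cα, carboxyl C, O, O) and, as for the Harary index, exact `Fraction` arithmetic:

```python
from fractions import Fraction
from itertools import combinations

# Hydrogen-suppressed Glycine (assumed vertex order: N, C-alpha, carboxyl C, O, O)
GLY_DIST = [
    [0, 1, 2, 3, 3],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 1],
    [3, 2, 1, 0, 2],
    [3, 2, 1, 2, 0],
]
GLY_DEG = [1, 2, 3, 1, 1]


def harary_additive(dist, deg):
    """H_A(G) = sum over unordered pairs of (deg(u) + deg(v)) / d(u, v)."""
    pairs = combinations(range(len(dist)), 2)
    return sum(Fraction(deg[u] + deg[v], dist[u][v]) for u, v in pairs)


value = harary_additive(GLY_DIST, GLY_DEG)
print(value, round(float(value), 4))
```

For Glycine this gives 70/3, about 23.3333.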
Multiplicatively weighted Harary index
Like the additively weighted Harary index, the multiplicatively weighted Harary index21 weights each pair of vertices by their degrees, but the two degrees are multiplied instead of added: it is obtained by summing, over all pairs of vertices, the product of the two vertex degrees divided by the distance between them. This gives a different perspective on the influence of vertex properties on molecular structure. The index is used in cheminformatics, network analysis, predictive modeling, and the analysis of complex systems, and its weights, which account for differences in interaction strength, support prediction and optimization in drug design, materials science, and network engineering.
$$\begin{aligned} H_{M}(G)= & \sum _{\{u,v\}\subseteq V(G)} \frac{deg(u)\times deg(v)}{d(u,v)}. \end{aligned}$$
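A minimal Python sketch of the multiplicatively weighted Harary index, under the same assumptions as the previous sketches (hypothetical Glycine distance matrix and degree list, vertex order N, Cα, carboxyl C, O, O, exact `Fraction` arithmetic):

```python
from fractions import Fraction
from itertools import combinations

# Hydrogen-suppressed Glycine (assumed vertex order: N, C-alpha, carboxyl C, O, O)
GLY_DIST = [
    [0, 1, 2, 3, 3],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 1],
    [3, 2, 1, 0, 2],
    [3, 2, 1, 2, 0],
]
GLY_DEG = [1, 2, 3, 1, 1]


def harary_multiplicative(dist, deg):
    """H_M(G) = sum over unordered pairs of deg(u) * deg(v) / d(u, v)."""
    pairs = combinations(range(len(dist)), 2)
    return sum(Fraction(deg[u] * deg[v], dist[u][v]) for u, v in pairs)


value = harary_multiplicative(GLY_DIST, GLY_DEG)
print(value, round(float(value), 4))
```

For Glycine this gives 56/3, about 18.6667.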
For more details on the importance of topological descriptors and their applications, see22,23,24,25,26,27,28,29,30.
The paper is structured as follows. In “Methodology”, we explain the methodology and the tools used for the calculations. In “Molecular structures and computations of topological indices for different amino acids”, the degree- and distance-based topological indices of the 22 amino acids are computed. In “Regression models”, three regression models are used to estimate five physical/chemical properties of the 22 amino acids, with the values of the topological indices as independent variables and the physical properties as dependent variables. The results of these regression models are discussed in detail in “Discussion”. Finally, the conclusion section summarizes the results.
Methodology
To examine molecular structures, mathematical chemists use distance-based topological indices, which are computed from the distances between atoms in a molecule and describe the molecule’s characteristics. These topological indices are computed with graph-theoretical techniques, in particular vertex- and edge-partition techniques. Quantitative structure-property relationship (QSPR) analysis is then used to examine the connection between the chemical structures of the amino acids and their physicochemical properties, whose values are taken from ChemSpider. SPSS statistical software was used to determine linear, quadratic, and logarithmic regression equations relating seven distance-based topological indices to five physical properties of twenty-two amino acids. These computational tools are fast and precise, which also makes them suitable for analyzing larger data sets. Distance-based topological indices thus offer a useful tool for examining molecular composition and properties, with potential applications in the pharmaceutical and medical industries, and this approach makes it easier to understand the structure-property correlations among amino acids. Figure 1 shows a flowchart of the work.
Molecular structures and computations of topological indices for different amino acids
In this section, we calculate seven topological indices, based on the degrees of the vertices and the distances between pairs of vertices, for twenty-two amino acid molecules: Glycine, Leucine, Tyrosine, Serine, Glutamic Acid, Glutamine, Aspartic Acid, Asparagine, Phenylalanine, Alanine, Lysine, Arginine, Histidine, Cysteine, Valine, Proline, Hydroxyproline, Tryptophan, Isoleucine, Methionine, Threonine and Hydroxylysine. The molecular structures of these twenty-two amino acids are depicted in Figs. 2 and 3.
Figure 2. Chemical structures of amino acids (I).
Figure 3. Chemical structures of amino acids (II).
To compute the seven topological indices, we need the distances between all pairs of vertices and the edge partition based on the degrees of the end vertices of each edge in these molecular structures. For clarity, we give the details of the calculation for the molecular structure of Glycine. Let G denote the hydrogen-suppressed molecular graph of Glycine, and let d(n,G) denote the number of unordered pairs of vertices \(\{u,v\}\subseteq V(G)\) such that \(d(u,v)=n\). Glycine has five vertices and four edges, with d(1,G)=4, d(2,G)=4 and d(3,G)=2. The edge partition of Glycine based on the degrees of the end vertices of each edge, together with the frequency of each edge type, is given in Table 1. Using this partition and the definitions, the values of the topological indices are calculated as follows; in each sum, the vertex pairs are grouped by their distance and by the degrees of their end vertices, and each group is weighted by its number of pairs:
$$\begin{aligned} \bullet {\textbf {W(Glycine)}}= & \sum _{\{u,v\}\subseteq V(G)} d(u,v) \\= & 4(1)+4(2)+2(3)=18. \end{aligned}$$
$$\begin{aligned} \bullet {\textbf { HW(Glycine)}}= & \frac{1}{2}\sum _{\{u,v\}\subseteq V(G)} (d(u,v)+d^{2}(u,v))\\= & \frac{1}{2}(18)+\frac{1}{2}[4(1)^{2}+4(2)^{2}+2(3)^{2}]=28. \end{aligned}$$
$$\begin{aligned} \bullet {\textbf { DD(Glycine)}}= & \sum _{\{u,v\}\subseteq V(G)} (deg(u)+deg(v))d(u,v)\\= & 1(1)(1+2)+1(2)(1+3)+1(1)(2+3)+2(1)(1+1)+2(2)(1+2)\\ & \displaystyle +2(1)(1+3)+3(2)(1+1)=52. \end{aligned}$$
$$\begin{aligned} \bullet {\textbf { Gut(Glycine)}}= & \sum _{\{u,v\}\subseteq V(G)} (deg(u)\times deg(v))d(u,v)\\= & 1(1)(1\times 2)+1(2)(1\times 3)+1(1)(2\times 3)+2(1)(1\times 1)+2(2)(1\times 2)\\ & \displaystyle +2(1)(1\times 3)+3(2)(1\times 1)=36. \end{aligned}$$
$$\begin{aligned} \bullet {\textbf { H(Glycine)}}= & \sum _{\{u,v\}\subseteq V(G)} \frac{1}{d(u,v)}\\= & \frac{4}{1}+\frac{4}{2}+\frac{2}{3}=6.6667. \end{aligned}$$
$$\begin{aligned} \bullet {\textbf { H}}_{A} {\textbf {(Glycine)}}= & \sum _{\{u,v\}\subseteq V(G)} \frac{deg(u)+deg(v)}{d(u,v)}\\= & \frac{(1+2)(1)}{1}+\frac{(1+3)(2)}{1}+\frac{(2+3)(1)}{1}+\frac{(1+1)(1)}{2}\\ & \displaystyle +\frac{(1+2)(2)}{2} +\frac{(1+3)(1)}{2}+\frac{(1+1)(2)}{3}=23.3333. \end{aligned}$$
$$\begin{aligned} \bullet {\textbf { H}}_{M} {\textbf {(Glycine)}}= & \sum _{\{u,v\}\subseteq V(G)} \frac{deg(u)\times deg(v)}{d(u,v)}\\= & \frac{(1\times 2)(1)}{1}+\frac{(1\times 3)(2)}{1}+\frac{(2\times 3)(1)}{1}+\frac{(1\times 1)(1)}{2}\\ & \displaystyle +\frac{(1\times 2)(2)}{2} +\frac{(1\times 3)(1)}{2}+\frac{(1\times 1)(2)}{3}=18.6667. \end{aligned}$$
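The hand calculation above groups the vertex pairs of Glycine by the degrees of their end vertices and their distance, and multiplies each group by its size. A minimal Python sketch of that bookkeeping, assuming the hydrogen-suppressed Glycine graph with a hypothetical vertex order (N, Cα, carboxyl C, O, O), rebuilds the partition with `collections.Counter` and evaluates all seven indices from it:

```python
from collections import Counter
from itertools import combinations

# Hydrogen-suppressed Glycine (assumed vertex order: N, C-alpha, carboxyl C, O, O)
GLY_DIST = [
    [0, 1, 2, 3, 3],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 1],
    [3, 2, 1, 0, 2],
    [3, 2, 1, 2, 0],
]
GLY_DEG = [1, 2, 3, 1, 1]


def pair_partition(dist, deg):
    """Count unordered vertex pairs by (smaller degree, larger degree, distance)."""
    return Counter(
        (min(deg[u], deg[v]), max(deg[u], deg[v]), dist[u][v])
        for u, v in combinations(range(len(deg)), 2)
    )


def indices_from_partition(partition):
    """Evaluate the seven indices; a class of m pairs contributes m times one term."""
    idx = dict.fromkeys(["W", "HW", "DD", "Gut", "H", "H_A", "H_M"], 0.0)
    for (du, dv, d), m in partition.items():
        idx["W"] += m * d
        idx["HW"] += m * (d + d * d) / 2
        idx["DD"] += m * (du + dv) * d
        idx["Gut"] += m * du * dv * d
        idx["H"] += m / d
        idx["H_A"] += m * (du + dv) / d
        idx["H_M"] += m * du * dv / d
    return idx


partition = pair_partition(GLY_DIST, GLY_DEG)
for name, value in indices_from_partition(partition).items():
    print(f"{name}(Glycine) = {value:.4f}")
```

The seven printed values agree with the hand calculation to four decimal places (18, 28, 52, 36, 6.6667, 23.3333, 18.6667). For the other amino acids, only the distance matrix and degree list would change.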
The values of the topological indices for the other twenty-one amino acids are calculated in the same way and are listed in Table 2.
Regression models
In this section, we develop linear, quadratic, and logarithmic regression models to estimate five physical properties of the twenty-two amino acids. Let \(\mathbb {P}\) denote any of the five physical properties of an amino acid and \(\mathbb{T}\mathbb{I}\) the value of a topological index. Linear regression predicts the dependent variable \((\mathbb {P})\) from the independent variable \((\mathbb{T}\mathbb{I})\). The quadratic model adds a squared term in \(\mathbb{T}\mathbb{I}\), which introduces one more coefficient and therefore requires more data points, and the logarithmic model regresses \(\mathbb {P}\) on \(\ln(\mathbb{T}\mathbb{I})\), which linearizes a relationship that levels off as the index grows. These three models are used to test the significance of the relationship between each topological index \(\mathbb{T}\mathbb{I}\) and each physicochemical property of the amino acids. The equations of the models are:
$$\begin{aligned} \mathbb {P}= & a + b (\mathbb{T}\mathbb{I}), \end{aligned}$$
$$\begin{aligned} \mathbb {P}= & a + b (\mathbb{T}\mathbb{I}) + c (\mathbb{T}\mathbb{I})^{2}, \end{aligned}$$
$$\begin{aligned} \mathbb {P}= & a + b \ln(\mathbb{T}\mathbb{I}). \end{aligned}$$
where \(\mathbb {P}\) is the property of the molecular structure, a is a constant, b and c are the regression coefficients, and \(\mathbb{T}\mathbb{I}\) is the topological index. Five physicochemical properties of the 22 amino acids, namely boiling point (BP) in \(^{\circ }\)C at 760 mmHg, molecular weight (MW) in g/mol, enthalpy of vaporization (EV) in kJ/mol, molar volume (MV) in cm\(^{3}\)/mol, and surface tension (ST) in dyne/cm, are examined using the seven TIs defined above. The values of these five properties are taken from ChemSpider and shown in Table 3. The linear, quadratic, and logarithmic regression models obtained for each distance-based topological index are given below.
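The paper fits these models with SPSS. Purely as an illustration, the three model forms can be fitted by ordinary least squares in plain Python; the sketch below solves the normal equations directly and is adequate only for the one or two regressors used here. The toy data are synthetic (generated from known coefficients) and are not the values in Tables 2 and 3.

```python
import math


def least_squares(columns, y):
    """Ordinary least squares with an intercept; returns [a, b, ...].

    Solves the normal equations (X^T X) beta = X^T y by Gauss-Jordan
    elimination. Fine for 1-2 regressors, not for ill-conditioned data.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    for p in range(k):
        # Partial pivoting: move the largest remaining entry of column p up
        best = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[best] = A[best], A[p]
        for r in range(k):
            if r != p:
                factor = A[r][p] / A[p][p]
                A[r] = [a_r - factor * a_p for a_r, a_p in zip(A[r], A[p])]
    return [A[i][k] / A[i][i] for i in range(k)]


def fit_models(ti, prop):
    """Fit P = a + b*TI, P = a + b*TI + c*TI^2 and P = a + b*ln(TI)."""
    return {
        "linear": least_squares([ti], prop),
        "quadratic": least_squares([ti, [t * t for t in ti]], prop),
        "logarithmic": least_squares([[math.log(t) for t in ti]], prop),
    }


# Synthetic toy data generated from P = 50 + 2*ln(TI); NOT the paper's data
ti_values = [2.0, 3.0, 5.0, 8.0, 13.0, 21.0]
prop_values = [50 + 2 * math.log(t) for t in ti_values]
for name, coefficients in fit_models(ti_values, prop_values).items():
    print(name, [round(c, 3) for c in coefficients])
```

On these toy data the logarithmic fit recovers a = 50 and b = 2, while the linear and quadratic fits only approximate the curve; with real data, the choice among the three forms would be guided by the statistics discussed in the next section.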
Wiener index W(G)
$$\begin{aligned}&{\textbf {Linear Models:}} & {\textbf {Quadratic Models:}}\\ BP&=266.210+0.457[W(G)]&BP&=255.970+0.796[W(G)]-0.001[W(G)]^{2}\\ MW&=97.696+0.323[W(G)]&MW&=82.928+0.570[W(G)]-0.001[W(G)]^{2}\\ EV&=56.554+0.053[W(G)]&EV&=52.679+0.118[W(G)]-0.0002[W(G)]^{2}\\ MV&=82.019+0.205[W(G)]&MV&=71.251+0.385[W(G)]-0.001[W(G)]^{2}\\ ST&=52.040+0.057[W(G)]&ST&=49.135+0.106[W(G)]-0.0001[W(G)]^{2}\\&{\textbf {Logarithmic Models:}}\\ BP&=59.186+57.528ln[W(G)]\\ MW&=-44.001+39.653ln[W(G)]\\ EV&=31.806+6.838ln[W(G)]\\ MV&=-11.760+26.002ln[W(G)]\\ ST&=28.405+6.710ln[W(G)]\\ \end{aligned}$$
Hyper-Wiener index HW(G)
$$\begin{aligned}&{\textbf {Linear Models:}} & {\textbf {Quadratic Models:}}\\ BP&=279.243+0.144[HW(G)]&BP&=259.993+0.284[HW(G)]-0.0001[HW(G)]^{2}\\ MW&=105.286+0.107[HW(G)]&MW&=93.740+0.191[HW(G)]-0.0000[HW(G)]^{2}\\ EV&=57.843+0.017[HW(G)]&EV&=54.733+0.040[HW(G)]-0.0000[HW(G)]^{2}\\ MV&=87.058+0.067[HW(G)]&MV&=78.462+0.130[HW(G)]-0.0000[HW(G)]^{2}\\ ST&=53.410+0.019[HW(G)]&ST&=50.948+0.037[HW(G)]-0.0000[HW(G)]^{2}\\&{\textbf {Logarithmic Models:}}\\ BP&=78.605+45.567ln[HW(G)]\\ MW&=-33.303+31.909ln[HW(G)]\\ EV&=33.533+5.524ln[HW(G)]\\ MV&=-4.267+20.835ln[HW(G)]\\ ST&=30.068+5.427ln[HW(G)]\\ \end{aligned}$$
Degree distance index DD(G)
$$\begin{aligned}&{\textbf {Linear Models:}} & {\textbf {Quadratic Models:}}\\ BP&=272.204+0.116[DD(G)]&BP&=256.414+0.189[DD(G)]-0.0000[DD(G)]^{2}\\ MW&=103.239+0.079[DD(G)]&MW&=88.005+0.149[DD(G)]-0.0000[DD(G)]^{2}\\ EV&=57.549+0.013[DD(G)]&EV&=53.998+0.029[DD(G)]-0.0000[DD(G)]^{2}\\ MV&=85.412+0.050[DD(G)]&MV&=75.296+0.097[DD(G)]-0.0000[DD(G)]^{2}\\ ST&=53.089+0.014[DD(G)]&ST&=50.248+0.027[DD(G)]-0.0000[DD(G)]^{2}\\&{\textbf {Logarithmic Models:}}\\ BP&=16.117+52.961ln[DD(G)]\\ MW&=-70.862+36.016ln[DD(G)]\\ EV&=27.357+6.179ln[DD(G)]\\ MV&=-29.608+23.658ln[DD(G)]\\ ST&=24.522+5.980ln[DD(G)]\\ \end{aligned}$$
Gutman index Gut(G)
$$\begin{aligned}&{\textbf {Linear Models:}} & {\textbf {Quadratic Models:}}\\ BP&=278.147+0.117[Gut(G)]&BP&=263.474+0.192[Gut(G)]-0.0000[Gut(G)]^{2}\\ MW&=108.518+0.076[Gut(G)]&MW&=93.595+0.153[Gut(G)]-0.0000[Gut(G)]^{2}\\ EV&=58.442+0.012[Gut(G)]&EV&=55.262+0.029[Gut(G)]-0.0000[Gut(G)]^{2}\\ MV&=88.708+0.049[Gut(G)]&MV&=79.215+0.097[Gut(G)]-0.0000[Gut(G)]^{2}\\ ST&=54.009+0.013[Gut(G)]&ST&=51.239+0.028[Gut(G)]-0.0000[Gut(G)]^{2}\\&{\textbf {Logarithmic Models:}}\\ BP&=52.323+48.348ln[Gut(G)]\\ MW&=-42.762+32.258ln[Gut(G)]\\ EV&=32.162+5.537ln[Gut(G)]\\ MV&=-11.132+21.186ln[Gut(G)]\\ ST&=29.331+5.331ln[Gut(G)]\\ \end{aligned}$$
Harary index H(G)
$$\begin{aligned}&{\textbf {Linear Models:}} & {\textbf {Quadratic Models:}}\\ BP&=216.737+5.359[H(G)]&BP&=188.932+8.073[H(G)]-0.057[H(G)]^{2}\\ MW&=68.826+3.480[H(G)]&MW&=39.902+6.303[H(G)]-0.059[H(G)]^{2}\\ EV&=51.717+0.577[H(G)]&EV&=44.329+1.298[H(G)]-0.015[H(G)]^{2}\\ MV&=62.789+2.254[H(G)]&MV&=40.642+4.415[H(G)]-0.045[H(G)]^{2}\\ ST&=47.086+0.610[H(G)]&ST&=43.862+0.924[H(G)]-0.007[H(G)]^{2}\\&{\textbf {Logarithmic Models:}}\\ BP&=16.017+105.593ln[H(G)]\\ MW&=-64.414+69.569ln[H(G)]\\ EV&=28.715+11.849ln[H(G)]\\ MV&=-26.227+45.991ln[H(G)]\\ ST&=25.630+11.539ln[H(G)]\\ \end{aligned}$$
Additively weighted Harary index H\(_{A}\)(G)
$$\begin{aligned}&{\textbf {Linear Models:}} & {\textbf {Quadratic Models:}}\\ BP&=226.653+1.215[H_{A}(G)]&BP&=201.041+1.826[H_{A}(G)]-0.003[H_{A}(G)]^{2}\\ MW&=78.050+0.754[H_{A}(G)]&MW&=48.604+1.456[H_{A}(G)]-0.003[H_{A}(G)]^{2}\\ EV&=53.312+0.124[H_{A}(G)]&EV&=46.622+0.284[H_{A}(G)]-0.001[H_{A}(G)]^{2}\\ MV&=68.715+0.489[H_{A}(G)]&MV&=47.201+1.002[H_{A}(G)]-0.003[H_{A}(G)]^{2}\\ ST&=48.645+0.133[H_{A}(G)]&ST&=45.616+0.205[H_{A}(G)]-0.0000[H_{A}(G)]^{2}\\&{\textbf {Logarithmic Models:}}\\ BP&=-90.765+96.761ln[H_{A}(G)]\\ MW&=-127.506+62.051ln[H_{A}(G)]\\ EV&=18.306+10.490ln[H_{A}(G)]\\ MV&=-68.505+41.155ln[H_{A}(G)]\\ ST&=15.845+10.133ln[H_{A}(G)]\\ \end{aligned}$$
Multiplicatively weighted Harary index H\(_{M}\)(G)
$$\begin{aligned}&{\textbf {Linear Models:}} & {\textbf {Quadratic Models:}}\\ BP&=242.587+1.050[H_{M}(G)]&BP&=212.918+1.754[H_{M}(G)]-0.003[H_{M}(G)]^{2}\\ MW&=89.521+0.631[H_{M}(G)]&MW&=61.260+1.302[H_{M}(G)]-0.003[H_{M}(G)]^{2}\\ EV&=55.308+0.103[H_{M}(G)]&EV&=49.420+0.242[H_{M}(G)]-0.001[H_{M}(G)]^{2}\\ MV&=49.926+0.412[H_{M}(G)]&MV&=55.990+0.885[H_{M}(G)]-0.002[H_{M}(G)]^{2}\\ ST&=50.867+0.108[H_{M}(G)]&ST&=47.543+0.187[H_{M}(G)]-0.0000[H_{M}(G)]^{2}\\&{\textbf {Logarithmic Models:}}\\ BP&=-33.881+84.736ln[H_{M}(G)]\\ MW&=-86.178+53.188ln[H_{M}(G)]\\ EV&=25.547+8.931ln[H_{M}(G)]\\ MV&=-42.222+35.544ln[H_{M}(G)]\\ ST&=23.807+8.398ln[H_{M}(G)]\\ \end{aligned}$$
Discussion
This section reports the statistical parameters of the linear, quadratic, and logarithmic models used to predict five physical/chemical properties of twenty-two amino acids, with the topological indices as independent variables and the physical/chemical properties as dependent variables. The sample size is denoted by N, a is the constant term, and b and c are the coefficients of the independent variables. The correlation coefficient (R) indicates the strength and direction of the relationship between two variables and ranges from \(-1\) (perfect negative correlation) to \(+1\) (perfect positive correlation). A positive value indicates a direct relationship between the variables, and a negative value indicates an inverse relationship. A correlation coefficient of exactly 0 indicates no linear relationship between the variables, and a value close to 0 indicates a weak relationship.
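These statistics can be sketched in plain Python. For the linear model, R is the Pearson correlation between the index values and the property values (for the logarithmic model, between ln(TI) and the property); for the quadratic model, R² has to be taken from the model's own fitted values instead. The overall F-statistic then follows from R², the sample size N and the number k of regressors (k = 1 for the linear and logarithmic models, k = 2 for the quadratic model). The toy numbers below are illustrative only; a p-value would additionally require the upper tail of the F distribution, which the Python standard library does not provide (SciPy's `scipy.stats.f.sf` is one option).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient R between two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def f_statistic(r_squared, n, k):
    """Overall F-statistic of a regression with k regressors and n observations."""
    return (r_squared / k) / ((1 - r_squared) / (n - k - 1))


# Toy numbers only (not taken from Tables 2-10)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(x, y)
print(round(r, 4), round(r * r, 4), round(f_statistic(r * r, len(x), 1), 1))
```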
For each model, the correlation coefficient (R), the coefficient of determination (\(R^2\)), the F-statistic, and the p-value are computed. The regression models predict two properties best, molecular weight and molar volume, for which R is larger than 0.73 in every case. Of these two, molecular weight is predicted more accurately by all seven indices in the linear, quadratic, and logarithmic models, with R greater than or equal to 0.88 in every case, as shown in Tables 4, 5, 6, 7, 8, 9, 10.
To validate the regression models, we consider five amino acids: glycine, tyrosine, asparagine, histidine, and methionine. The following regression models are used to calculate the values of the physical properties of amino acids:
- Boiling point (using the linear regression model for the H index)
- Molecular weight (using the logarithmic regression model for the HW index)
- Enthalpy of vaporization (using the quadratic regression model for the H\(_{A}\) index)
- Molar volume (using the linear regression model for the H\(_{M}\) index)
- Surface tension (using the linear regression model for the Gutman index)
The values computed with these regression models for the five amino acids are compared with the experimental values of the corresponding physical properties in Table 11. The regression models accurately predict three of the physical properties: boiling point, molecular weight, and molar volume.
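To make the validation step concrete, the sketch below evaluates the five selected models for Glycine, using its index values from the worked example (HW = 28, Gut = 36, H = 20/3, H\(_{A}\) = 70/3, H\(_{M}\) = 56/3) and the coefficients exactly as printed in the regression equations; the fifth model is the linear ST equation for Gut(G). Because the printed coefficients are rounded, the results should be read as approximate; the comparison values themselves are those reported in Table 11.

```python
import math

# Glycine index values from the worked example
HW, GUT = 28, 36
H, H_A, H_M = 20 / 3, 70 / 3, 56 / 3

# Coefficients copied from the printed regression equations (rounded as published)
predictions = {
    "BP (linear, H)": 216.737 + 5.359 * H,
    "MW (logarithmic, HW)": -33.303 + 31.909 * math.log(HW),
    "EV (quadratic, H_A)": 46.622 + 0.284 * H_A - 0.001 * H_A ** 2,
    "MV (linear, H_M)": 49.926 + 0.412 * H_M,
    "ST (linear, Gut)": 54.009 + 0.013 * GUT,
}

for name, value in predictions.items():
    print(f"{name}: {value:.2f}")
```

The same dictionary, filled with the Table 2 index values of tyrosine, asparagine, histidine, or methionine, would reproduce the remaining rows of the comparison.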
Conclusion
In this study, we investigated the structural properties of amino acids by using topological indices and regression models to predict their physical and chemical properties. For twenty-two amino acids, we developed linear, quadratic, and logarithmic regression models to estimate five key properties: boiling point, molecular weight, enthalpy of vaporization, molar volume, and surface tension. For a validation subset of amino acids, namely glycine, tyrosine, asparagine, histidine, and methionine, these models accurately predict the boiling point, molecular weight, and molar volume. These findings underscore the potential of topological indices as tools for capturing the structural features of amino acids encoded in their molecular graphs. This research advances our understanding of the structural determinants of amino acid properties, with practical implications for fields such as bioinformatics, drug design, and structural biology. By improving our ability to predict amino acid properties, this work contributes to the broader goal of understanding and exploiting the molecular details of biological systems.
Data Availability
All data generated or analysed during this study are included in this published article.
References
Li, P., Yin, Y. L., Li, D., Kim, S. W. & Wu, G. Amino acids and immune function. Br. J. Nutr. 98(2), 237–252 (2007).
Wilson, R. P. Amino acids and proteins. In Fish Nutrition 143–179. (Academic Press, 2003).
Barrett, G. (Ed.). Chemistry and Biochemistry of the Amino Acids. (Springer Science and Business Media, 2012).
Lieu, E. L., Nguyen, T., Rhyne, S. & Kim, J. Amino acids in cancer. Exp. Mol. Med. 52(1), 15–30 (2020).
Katritzky, A. R. & Gordeeva, E. V. Traditional topological indexes vs electronic, geometrical, and combined molecular descriptors in QSAR/QSPR research. J. Chem. Inf. Comput. Sci. 33(6), 835–857 (1993).
Dearden, J. C. The use of topological indices in QSAR and QSPR modeling. Advances in QSAR Modeling: Applications in Pharmaceutical, Chemical, Food, Agricultural and Environmental Sciences, 57–88 (2017).
Hayat, S., Alanazi, S. J. F. & Liu, J.-B. Two novel temperature-based topological indices with strong potential to predict physicochemical properties of polycyclic aromatic hydrocarbons with applications to silicon carbide nanotubes. Phys. Scr. 99, 055027 (2024).
Hayat, S., Hilalina, M., Alanazi, S. J. F. & Wang, S. Predictive potential of eigenvalues-based graphical indices for determining thermodynamic properties of polycyclic aromatic hydrocarbons with applications to polyacenes. Comput. Mater. Sci. 238, 112944 (2024).
Hayat, S. & Liu, J.-B. Comparative analysis of temperature-based graphical indices for correlating the total π-electron energy of benzenoid hydrocarbons. Int. J. Mod. Phys. B. https://doi.org/10.1142/S021797922550047X.
Kirmani, S. A. K., Ali, P. & Azam, F. Topological indices and QSPR/QSAR analysis of some antiviral drugs being investigated for the treatment of COVID-19 patients. Int. J. Quantum Chem. 121(9), e26594 (2021).
Hayat, S., Khan, A., Ali, K. & Liu, J.-B. Structure-property modeling for thermodynamic properties of benzenoid hydrocarbons by temperature-based topological indices. Ain Shams Eng. J. 15(3), 102586 (2024).
Lučić, B. & Trinajstić, N. New developments in QSPR/QSAR modeling based on topological indices. SAR QSAR Environ. Res. 7(1–4), 45–62 (1997).
Junkes, B. D. S., Arruda, A. C. S., Yunes, R. A., Porto, L. C. & Heinzen, V. E. F. Semi-empirical topological index: a tool for QSPR/QSAR studies. J. Mol. Model. 11, 128–134 (2005).
Khadikar, P. V., Karmarkar, S. & Agrawal, V. K. A novel PI index and its applications to QSPR/QSAR studies. J. Chem. Inf. Comput. Sci. 41(4), 934–949 (2001).
Wiener, H. Structural determination of paraffin boiling points. J. Am. Chem. Soc. 69(1), 17–20 (1947).
Klein, D. J., Lukovits, I. & Gutman, I. On the definition of the hyper-Wiener index for cycle-containing structures. J. Chem. Inf. Comput. Sci. 35(1), 50–52 (1995).
Dobrynin, A. A. & Kochetova, A. A. Degree distance of a graph: A degree analog of the Wiener index. J. Chem. Inf. Comput. Sci. 34(5), 1082–1086 (1994).
Feng, L. & Liu, W. The maximal Gutman index of bicyclic graphs. MATCH Commun. Math. Comput. Chem. 66(2), 699–708 (2011).
Plavšić, D., Nikolić, S., Trinajstić, N. & Mihalić, Z. On the Harary index for the characterization of chemical graphs. J. Math. Chem. 12, 235–250 (1993).
Alizadeh, Y., Iranmanesh, A. & Došlić, T. Additively weighted Harary index of some composite graphs. Discrete Math. 313(1), 26–34 (2013).
An, M. & Xiong, L. Multiplicatively weighted Harary index of some composite graphs. Filomat 29(4), 795–805 (2015).
Liu, J. B. et al. Zagreb indices and multiplicative Zagreb indices of Eulerian graphs. Bull. Malays. Math. Sci. Soc. 42, 67–78 (2019).
Liu, J.-B., Zhang, X., Cao, J. & Chen, L. Mean first-passage time and robustness of complex cellular mobile communication network. IEEE Trans. Netw. Sci. Eng. 11(3), 3066–3076 (2024). https://doi.org/10.1109/TNSE.2024.3358369.
Zhang, G., Mushtaq, A., Aslam, A., Parveen, S. & Kanwal, S. Studying some networks using topological descriptors and multi-criterion decision making. Mol. Phys. 121, 16. https://doi.org/10.1080/00268976.2023.2222345 (2023).
Hui, Z., Yousaf, S., Aslam, A., Binyamin, M. A. & Kanwal, S. On expected values of some degree based topological descriptors of random Phenylene chains. Mol. Phys. 121, 16. https://doi.org/10.1080/00268976.2023.2225648 (2023).
Hui, Z., Rauf, A., Naeem, M., Aslam, A., Saleem, A. V. Quality testing analysis of Ve-degree based entropies by using benzene derivatives. 123(17), e27146 (2023).
Yang, Y., Liu, H., Wang, H. & Fu, H. Subtrees of spiro and polyphenyl hexagonal chains. Appl. Math. Comput. 268, 547–560 (2023).
Yang, Yu., Sun, X., Wang, J. C. H. & Zhang, X. The expected subtree number index in random polyphenylene and spiro chains. Discret. Appl. Math. 285, 483–492 (2020).
Yang, Y., Liu, H., Wang, H. & Makeig, S. Enumeration of BC-subtrees of trees. Theoret. Comput. Sci. 580, 59–74 (2015).
Yang, Yu. et al. Enumeration of subtrees and BC-subtrees with maximum degree no more than k in trees. Theoret. Comput. Sci. 892, 258–278 (2021).
Acknowledgements
This work was supported by the Key Scientific and Technological Project of Henan Province, China (grant No. 242102521023) and the Researchers Supporting Project number (RSP2025R401), King Saud University, Riyadh, Saudi Arabia.
Author information
Authors and Affiliations
School of Software, Pingdingshan University, Pingdingshan, 467000, Henan, China
Huili Li
International Joint Laboratory for Multidimensional Topology and Carcinogenic Characteristics Analysis of Atmospheric Particulate Matter PM2.5, Pingdingshan, 467000, Henan, China
Huili Li
Department of Mathematics, Faculty of Science, University of Gujrat, Gujrat, Pakistan
Anisa Naeem & Shamaila Yousaf
Department of Natural Sciences and Humanities, University of Engineering and Technology, Lahore (RCET), Lahore, Pakistan
Adnan Aslam
Mathematics Department, College of Science, King Saud University, P.O. Box 22452, 11495, Riyadh, Saudi Arabia
Fairouz Tchier
Department of Mathematics, College of Natural and Computational Sciences, Wollega University, Nekemte, Ethiopia
Keneni Abera Tola
Contributions
All authors contributed equally to the paper.
Corresponding author
Correspondence to Keneni Abera Tola.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Li, H., Naeem, A., Yousaf, S. et al. Topological analysis and predictive modeling of amino acid structures with implications for bioinformatics and structural biology. Sci Rep 15, 638 (2025). https://doi.org/10.1038/s41598-024-83697-6
DOI: https://doi.org/10.1038/s41598-024-83697-6
Keywords
- Chemical graph theory
- Topological indices
- Amino acid
- QSPR models