
Prediction Acidity Constant Using QSPR Models

Bull. Korean Chem. Soc. 2005, Vol. 26, No. 12


Prediction Acidity Constant of Various Benzoic Acids and Phenols in Water Using Linear and Nonlinear QSPR Models

Aziz Habibi-Yangjeh,* Mohammad Danandeh-Jenagharad, and Mahdi Nooshyar

Department of Chemistry, Faculty of Science, University of Mohaghegh Ardebili, P.O. Box 179, Ardebil, Iran
* E-mail: [email protected]; [email protected]

Received April 18, 2005

An artificial neural network (ANN) is presented for predicting the acidity constant (pKa) of various benzoic acids and phenols with diverse chemical structures using a nonlinear quantitative structure-property relationship. A three-layered feed-forward ANN with back-propagation of error was generated using the six molecular descriptors appearing in the multi-parameter linear regression (MLR) model. The polarizability term (πI), the most positive charge of the acidic hydrogen atom (q+), molecular weight (MW), the most negative charge of the acidic oxygen atom (q−), the hydrogen-bond accepting ability (εB) and the partial charge weighted topological electronic (PCWTE) descriptor are the inputs, and the output is pKa. A properly selected neural network trained with 205 compounds was found to represent the dependence of the acidity constant on the molecular descriptors fairly well. To evaluate the predictive power of the generated ANN, the optimized network was applied to predict the pKa values of 37 compounds in the prediction set, which were not used in the optimization procedure. The squared correlation coefficient (R²) and root mean square error (RMSE) for the prediction set were 0.9147 and 0.9388 with the MLR model, compared with 0.9939 and 0.2575 with the ANN model. These improvements are due to the fact that the acidity constants of benzoic acids and phenols in water show nonlinear correlations with the molecular descriptors.

Key Words: Quantitative structure-property relationship, Artificial neural networks, Acidity constant, Phenols, Benzoic acids

Introduction

The macroscopic (bulk) activities and properties of chemical compounds clearly depend on their microscopic (structural) characteristics. The development of quantitative structure-property/activity relationships (QSPR/QSAR) based on theoretical descriptors is a powerful tool not only for predicting the chemical, physical and biological properties/activities of compounds, but also for gaining a deeper understanding of the detailed mechanisms of interaction in complex systems that predetermine these properties/activities. QSPR/QSAR models are essentially calibration models in which the independent variables are molecular descriptors that describe the structure of the molecules and the dependent variable is the property or activity of interest. Since these theoretical descriptors are determined solely by computational methods, a priori predictions of the properties/activities of compounds are possible and no laboratory measurements are needed, which saves time, space, materials and equipment and alleviates safety (toxicity) and disposal concerns. An enormous number of descriptors have been used by researchers to improve the ability to correlate biological, chemical and physical properties. To obtain a significant correlation, it is crucial that appropriate descriptors be employed.

Various methods have been used for constructing QSPR/QSAR models, including multi-parameter linear regression (MLR), principal component analysis (PCA) and partial least-squares regression (PLS). In some cases it is more convenient to assume a linear relationship between the property/activity and the descriptors. If there is no well-defined linear relationship, however, these methods cannot give a satisfactory QSPR/QSAR model. Artificial neural networks (ANNs) are capable of recognizing highly nonlinear relationships. ANNs are biologically inspired computer programs designed to simulate the way in which the human brain processes information. ANNs gather their knowledge by detecting the patterns and relationships in data and learn (are trained) through experience, not from programming. Many types of neural networks have been designed, and new ones continue to appear regularly. The behavior of a neural network is determined by the transfer functions of its neurons, by the learning rule, and by the architecture itself. An ANN is formed from artificial neurons, or processing elements (PEs), connected by coefficients (weights), which constitute the neural structure and are organized in layers. The first layer is termed the input layer, and the last layer is the output layer. The layers of neurons between the input and output layers are called hidden layers. Neural networks do not need an explicit formulation of the mathematical or physical relationships of the problem being handled. This gives ANNs an advantage over traditional fitting methods for some chemical applications. For this reason, in recent years ANNs have been applied to a wide variety of chemical problems, such as simulation of mass spectra, ion interaction chromatography, aqueous solubility and partition coefficient, simulation of nuclear magnetic resonance spectra, prediction of bioconcentration factor, solvent effects on reaction rate and prediction of the normalized polarity parameter in mixed solvent systems.

It has been shown that acid-base properties affect the toxicity, chromatographic retention behavior and pharmaceutical properties of organic acids and bases. In addition, the interpretation and prediction of pKa values for chemical compounds are of general importance and usefulness for chemists. Although several theoretical studies correlating pKa values with molecular parameters have been performed in recent years, these studies used linear equations. The main aim of the present work is to develop linear and nonlinear QSPR models based on molecular descriptors for predicting the pKa values of various benzoic acids and phenols with diverse chemical structures (242 compounds in total).

Theory

The theory behind neural networks has been described in detail by different researchers. There are many types of neural network architectures, but the type that has been most useful for QSAR/QSPR studies is the multilayer feed-forward network with the back-propagation (BP) learning rule. The numbers of neurons in the input and output layers are defined by the system's properties. The number of neurons in the hidden layer is an adjustable parameter that should be optimized. The input layer receives the experimental or theoretical information. The output layer produces the calculated values of the dependent variable.

The use of ANNs consists of two steps: "training" and "prediction". In the training phase, the optimum structure, weight coefficients and biases are searched for. These parameters are found from the training and validation data sets. After the training phase, the trained network can be used to predict (or calculate) the outputs from a set of inputs. ANNs allow one to estimate relationships between input variables and one or several output (dependent) variables. The ANN reads the input and target values in the training data set and changes the values of the weighted links to reduce the difference between the calculated output and target values. The error between output and target values is minimized over many training cycles until the network reaches a specified level of accuracy. If a network is left to train for too long, however, it will overtrain and lose the ability to generalize.
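The training procedure outlined in the Theory section can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions: it uses plain full-batch gradient descent rather than the resilient back-propagation variant the authors report using, symmetric random initial weights, and an arbitrary patience of 50 epochs for stopping once the validation error no longer improves. All function and parameter names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, x):
    """Return hidden activations and network output for inputs x."""
    w1, b1, w2, b2 = params
    hidden = sigmoid(x @ w1 + b1)
    return hidden, sigmoid(hidden @ w2 + b2)

def rmse(params, x, y):
    return float(np.sqrt(np.mean((forward(params, x)[1] - y) ** 2)))

def train(x_tr, y_tr, x_va, y_va, n_hidden=24, lr=0.3,
          max_epochs=2000, patience=50, seed=0):
    """Full-batch gradient descent with validation-based early stopping."""
    rng = np.random.default_rng(seed)
    n_in = x_tr.shape[1]
    # Assumption: symmetric initial weights, zero biases.
    params = [rng.uniform(-0.5, 0.5, (n_in, n_hidden)), np.zeros(n_hidden),
              rng.uniform(-0.5, 0.5, (n_hidden, 1)), np.zeros(1)]
    best_params, best_va, stale = [p.copy() for p in params], np.inf, 0
    hist_tr, hist_va = [], []
    for _ in range(max_epochs):
        hist_tr.append(rmse(params, x_tr, y_tr))
        hist_va.append(rmse(params, x_va, y_va))
        if hist_va[-1] < best_va:
            # Validation error improved: remember these weights.
            best_va, stale = hist_va[-1], 0
            best_params = [p.copy() for p in params]
        else:
            stale += 1
            if stale >= patience:
                break  # validation error stopped improving (overtraining)
        # Back-propagate the squared error through both sigmoid layers.
        hidden, out = forward(params, x_tr)
        d_out = (out - y_tr) * out * (1.0 - out)
        d_hid = (d_out @ params[2].T) * hidden * (1.0 - hidden)
        n = len(x_tr)
        params[2] -= lr * hidden.T @ d_out / n
        params[3] -= lr * d_out.mean(axis=0)
        params[0] -= lr * x_tr.T @ d_hid / n
        params[1] -= lr * d_hid.mean(axis=0)
    return best_params, hist_tr, hist_va
```

The function returns the weights that gave the lowest validation RMSE together with both error histories, so the point at which the validation curve turns upward can be inspected afterwards.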

Experimental Section

Descriptor generation. The derivation of theoretical molecular descriptors proceeds from the chemical structure of the compounds. To calculate the theoretical descriptors, the z-matrices (molecular models) were constructed with the aid of HyperChem 7.0 and the molecular structures were optimized using the AM1 algorithm. To calculate some of the theoretical descriptors, the molecular geometries were further optimized with the same algorithm in the MOPAC program, version 6.0. The other molecular electronic descriptors were calculated with the Dragon package, version 2.1. For this purpose, the output of the HyperChem software for each compound was fed into the Dragon program and the descriptors were calculated. As a result, a total of 18 theoretical descriptors were calculated for each compound in the data sets (242 compounds).

Linear correlations. The acidity constants of the benzoic acids and phenols are literature values at 25 °C. An MLR model was developed for predicting pKa values from the molecular descriptors. Stepwise multi-parameter linear regression was used to select the most important descriptors and to calculate the coefficients relating pKa to the descriptors. The MLR models were generated using the SPSS/PC software package, release 9.0.

Neural network generation. The specification of a typical neural network model requires choosing the type of inputs, the number of hidden layers, the number of neurons in each hidden layer and the connection structure between the input and output layers. The number of input nodes in the ANNs was equal to the number of molecular descriptors in the MLR model. A three-layer network with a sigmoidal transfer function was designed. The initial weights were randomly selected between 0 and 1. Before training, the input and output values were normalized between 0.1 and 0.9. The weights and biases were optimized according to the resilient back-propagation algorithm. The data set was randomly divided into three groups: a training set, a validation set and a prediction set consisting of 168, 37 and 37 molecules, respectively. The training and validation sets were used for model generation and the prediction set was used to evaluate the generated model, because a prediction set is a better estimator of the ANN generalization ability than a validation (monitoring) set.

The training, validation and prediction performances of the ANNs were evaluated by the mean percentage deviation (MPD) and root-mean-square error (RMSE), which are defined as follows:

$$\mathrm{MPD} = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{P_i^{\mathrm{exp}} - P_i^{\mathrm{cal}}}{P_i^{\mathrm{exp}}}\right) \qquad (1)$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N}\left(P_i^{\mathrm{exp}} - P_i^{\mathrm{cal}}\right)^{2}}{N}} \qquad (2)$$

where $P_i^{\mathrm{exp}}$ and $P_i^{\mathrm{cal}}$ are the experimental and calculated values of pKa with the models and N denotes the number of data points. The individual percent deviation (IPD) is defined as follows:

$$\mathrm{IPD} = 100 \times \left(\frac{P_i^{\mathrm{cal}} - P_i^{\mathrm{exp}}}{P_i^{\mathrm{exp}}}\right) \qquad (3)$$

The data were processed using Matlab 6.5. The neural networks were implemented using Neural Network Toolbox Ver. 4.0 for Matlab.
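As a rough illustration, the error measures of equations (1)-(3) can be evaluated with a few lines of Python. The function names are hypothetical. Note one explicit assumption: `mpd` below averages the absolute values of the individual percent deviations, so that the result is a positive percentage of the same kind as the MPD values quoted with equations (4) and (5); equation (1) as printed shows neither the absolute value nor the factor of 100.

```python
import math

def rmse(exp, cal):
    """Root-mean-square error, equation (2)."""
    n = len(exp)
    return math.sqrt(sum((e - c) ** 2 for e, c in zip(exp, cal)) / n)

def ipd(exp_i, cal_i):
    """Individual percent deviation, equation (3)."""
    return 100.0 * (cal_i - exp_i) / exp_i

def mpd(exp, cal):
    """Mean percentage deviation.

    Assumption: mean of |IPD|, i.e. absolute value and factor 100 added.
    """
    return sum(abs(ipd(e, c)) for e, c in zip(exp, cal)) / len(exp)

# First three training-set compounds of Table 3 (Exp., MLR and ANN columns).
exp_pka = [9.19, 8.05, 10.28]
mlr_pka = [10.867, 9.05, 9.637]
ann_pka = [9.272, 8.794, 9.972]

for name, cal in (("MLR", mlr_pka), ("ANN", ann_pka)):
    print(name, round(rmse(exp_pka, cal), 3), round(mpd(exp_pka, cal), 2))
```

Applying `ipd` to a single row reproduces the IPD column of Table 3 to the two decimals given there.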

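The preprocessing described under "Neural network generation" (scaling to the range 0.1-0.9 and a random 168/37/37 split) might look as follows. This is a sketch under assumptions: the paper does not say how the normalization was done, so simple linear min-max scaling is used here, and the seed and function names are hypothetical.

```python
import random

def minmax_scale(values, lo=0.1, hi=0.9):
    """Linearly map values onto [lo, hi] (assumed form of the normalization)."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    return [lo + (hi - lo) * (v - v_min) / span for v in values]

def random_split(n_total=242, sizes=(168, 37, 37), seed=0):
    """Shuffle compound indices and cut them into training,
    validation and prediction sets."""
    if sum(sizes) != n_total:
        raise ValueError("set sizes must add up to the number of compounds")
    idx = list(range(n_total))
    random.Random(seed).shuffle(idx)
    n_tr, n_va, _ = sizes
    return idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]

train_idx, valid_idx, pred_idx = random_split()
print(len(train_idx), len(valid_idx), len(pred_idx))
```

Keeping the sigmoid targets inside 0.1-0.9 rather than 0-1 avoids asking the output neuron to reach its saturated extremes, which is a common reason for this choice of range.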

Results and Discussion

A major challenge in the development of MLR equations is the possible multicollinearity of the molecular descriptors. To decrease the redundancy in the descriptor data matrix, the correlations of the descriptors with each other and with the pKa of the compounds were examined and collinear descriptors (r > 0.85) were detected. Among the collinear descriptors, the one with the lowest correlation with the property was removed from the data matrix. Table 1 shows that all pairwise correlation coefficients between the selected descriptors are below this threshold (the largest is 0.642), which supports the statistical reliability of the model.

Table 1. Correlation coefficients between the theoretical descriptors used in the multi-parameter linear regression (MLR) and artificial neural network (ANN) models
Descriptor | πI | q+ | q− | εB | MW | PCWTE
πI | 1 | 0.530 | 0.042 | 0.150 | 0.329 | 0.285
q+ | 0.530 | 1 | 0.642 | 0.546 | 0.368 | 0.070
q− | 0.042 | 0.642 | 1 | 0.236 | 0.155 | 0.018
εB | 0.150 | 0.546 | 0.236 | 1 | 0.248 | 0.038
MW | 0.329 | 0.368 | 0.155 | 0.248 | 1 | 0.237
PCWTE | 0.285 | 0.070 | 0.018 | 0.038 | 0.237 | 1

Multi-parameter linear correlation of the pKa values of the 168 benzoic acids and phenols in the training set with the molecular descriptors gives the results in Table 2. Six descriptors appear in the MLR model: the polarizability index (πI), the most positive charge of the acidic hydrogen atom (q+), molecular weight (MW), the most negative charge of the acidic oxygen atom (q−), the hydrogen-bond accepting ability (εB) and the partial charge weighted topological electronic (PCWTE) descriptor.

Table 2. Descriptors, symbols and results of the multi-parameter linear regression (MLR) model(a)
No. | Descriptor | Symbol | Coefficient | β
1 | polarizability term | πI | −8.3610 | 0.080
2 | most positive charge of acidic hydrogen atom | q+ | −110.4710 | 0.521
3 | molecular weight | MW | −0.0051 | 0.074
4 | most negative charge of the phenolic oxygen atom | q− | −26.3940 | 0.321
5 | the hydrogen-bond accepting ability | εB | 34.4450 | 0.080
6 | partial charge weighted topological electronic | PCWTE | 0.0902 | 0.101
7 | constant | | 42.2780 |
(a) β is the standardized coefficient of the descriptors. The polarizability term (πI) is obtained by dividing the polarizability volume by the molecular volume. εB is equal to 0.3 − 0.01(Elw − Eh), in which Elw and Eh refer to the LUMO energy of water and the HOMO energy of the compound, respectively.

The negative coefficients of the πI, q+, q− and MW descriptors indicate that the acidity constant (Ka) increases as these descriptors increase. As q+ and q− of the compounds increase, the interactions of water molecules with the acidic hydrogen and oxygen atoms of the compounds increase, so that the acidic hydrogen can be removed from the compounds more easily. Polarizability, and hence the dipole-induced dipole interactions, increase with increasing πI and MW; as a result, the acidity of the compounds increases with these descriptors. The acidity constant of the compounds decreases with increasing εB and PCWTE, because the basicity of the phenolic oxygen atom increases with these descriptors. The effects of πI, q+ and MW on pKa are higher than those of the other descriptors, because the standardized coefficients of πI, q+ and MW are higher than those of the other descriptors. The pKa values calculated with the MLR model for the compounds in the training, validation and prediction sets are plotted versus the experimental values in Figure 1.

Figure 1. Plot of the calculated values of pKa from the MLR model versus the experimental values for the training, validation and prediction sets.

The next step in this work was the generation of the ANN model. There are no rigorous theoretical principles for choosing the proper network topology, so different structures were tested in order to find the optimal number of hidden neurons and training cycles. Before training the network, the number of nodes in the hidden layer was optimized by conducting several training sessions with different numbers of hidden nodes. The root mean squared errors of the training (RMSET) and validation (RMSEV) sets were obtained at various iterations for different numbers of neurons in the hidden layer, and the minimum value of RMSEV was recorded as the optimum value. RMSET and RMSEV are plotted versus the number of nodes in the hidden layer in Figure 2. Twenty-four nodes in the hidden layer is clearly the optimum value. The six descriptors appearing in the MLR model (πI, q+, MW, q−, εB and PCWTE) were used as inputs for developing the ANN, and an ANN with a 6-24-1 architecture was generated.

Figure 2. Plot of RMSE for training and validation sets versus the number of nodes in the hidden layer.

It is noteworthy that training of the network was stopped when RMSEV started to increase, i.e. when overtraining begins. Overtraining causes the ANN to lose its predictive power.34,36 Therefore, during training of the networks, it is desirable that iterations are stopped when overtraining begins. To control overtraining of the network during the training procedure, the values of RMSET and RMSEV were calculated and recorded to monitor the extent of learning at various iterations. The results showed that after 77000 iterations the value of RMSEV started to increase slightly, marking the onset of overfitting (Figure 3).

Figure 3. Plot of RMSE for training and validation sets versus the number of iterations.

The generated ANN was then trained using the training and validation sets to optimize the weights and biases. To evaluate the predictive power of the generated ANN, the optimized network was applied to predict the pKa values of the compounds in the prediction set, which were not used in the modeling procedure (Table 3). The pKa values calculated with the ANN model for the compounds in the training, validation and prediction sets are plotted versus the experimental values in Figure 4. As expected, the calculated values of pKa are in good agreement with the experimental values. The correlation equation between all of the pKa values calculated with the ANN model and the experimental values is as follows:

$$\mathrm{p}K_a(\mathrm{cal}) = 0.99299\,\mathrm{p}K_a(\mathrm{exp}) + 0.04454 \qquad (4)$$
(R² = 0.9931; MPD = 4.5044; RMSE = 0.2648; F = 34295.94)

Similarly, the correlation of pKa(cal) versus pKa(exp) values in the prediction set gives equation (5):

$$\mathrm{p}K_a(\mathrm{cal}) = 1.01212\,\mathrm{p}K_a(\mathrm{exp}) - 0.08200 \qquad (5)$$
(R² = 0.9939; MPD = 5.0361; RMSE = 0.2575; F = 5718.11)

The IPD for the pKa values in the prediction set is plotted versus the experimental values in Figure 5. As can be seen, the model did not show proportional or systematic error, because the slope (a = 1.01212) and intercept (b = −0.08200) of the correlation equation are not significantly different from unity and zero, respectively, and the errors are distributed randomly on both sides of zero (Figure 5).

Table 4 compares the results obtained using the MLR and ANN models. The squared correlation coefficient (R²) and RMSE of the models for the total, training, validation and prediction sets show the potential of the ANN model for predicting the pKa values of various benzoic acids and phenols in water with a single model. It was thus found that a properly selected and trained neural network could represent the dependence of the acidity constant of benzoic acids and phenols in water on the molecular descriptors fairly well, and that the optimized neural network could simulate the complicated nonlinear relationship between pKa values and the molecular descriptors. It can be seen from Table 4 that, although the parameters appearing in the MLR model are used as inputs for the generated ANN, the statistics show a large improvement. These improvements are due to the fact that the pKa values of the compounds show nonlinear correlations with the molecular descriptors.
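The collinearity screening described at the start of Results and Discussion (flag descriptor pairs with r > 0.85 and drop the member less correlated with pKa) can be sketched as below. This is an illustration only: the paper does not say in which order pairs were examined, so the loop simply removes one descriptor at a time and re-checks, and the function and variable names are hypothetical.

```python
import numpy as np

def drop_collinear(x, y, names, threshold=0.85):
    """Remove one descriptor from every pair with |r| > threshold,
    keeping the member that correlates better with the property y."""
    keep = list(range(x.shape[1]))
    # Absolute correlation of each descriptor column with the property.
    r_y = [abs(np.corrcoef(x[:, j], y)[0, 1]) for j in range(x.shape[1])]
    changed = True
    while changed and len(keep) > 1:
        changed = False
        r = np.abs(np.corrcoef(x[:, keep], rowvar=False))
        for a in range(len(keep)):
            for b in range(a + 1, len(keep)):
                if r[a, b] > threshold:
                    # Drop the descriptor with the weaker link to y.
                    loser = keep[a] if r_y[keep[a]] < r_y[keep[b]] else keep[b]
                    keep.remove(loser)
                    changed = True
                    break
            if changed:
                break
    return [names[j] for j in keep]
```

Called with an `(n_compounds, n_descriptors)` array, the pKa vector and the descriptor names, it returns the names of the descriptors that survive the screen.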

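Written out with the coefficients listed in Table 2, the MLR model takes the explicit form below. This is only a restatement of the tabulated coefficients, assuming the descriptors enter in the units used by the authors, which are not stated in this section:

$$\mathrm{p}K_a = 42.2780 - 8.3610\,\pi_I - 110.4710\,q^{+} - 0.0051\,\mathrm{MW} - 26.3940\,q^{-} + 34.4450\,\varepsilon_B + 0.0902\,\mathrm{PCWTE}$$

with $\varepsilon_B = 0.3 - 0.01\,(E_{lw} - E_h)$ as defined in the footnote to Table 2. The signs make the interpretation in the text easy to check: the four negative terms lower pKa (raise acidity) as the descriptor grows, while the two positive terms raise pKa.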


Table 3. Experimental and calculated values of pKa for various benzoic acids and phenols in water at 25 °C for training, validation and prediction sets by multi-parameter linear regression (MLR) and artificial neural network (ANN) models along with individual percent deviation (IPD)(a)
No. | Compound | Exp. | MLR | IPD(MLR) | ANN | IPD(ANN)
Training set
1 | 2-acetylphenol | 9.19 | 10.867 | 18.25 | 9.272 | 0.89
2 | 4-acetylphenol | 8.05 | 9.05 | 12.42 | 8.794 | 9.24
3 | 2-allylphenol | 10.28 | 9.637 | −6.25 | 9.972 | −3.00
4 | 4-bromophenol | 9.34 | 8.42 | −9.85 | 9.126 | −2.29
5 | 2,6-di-tert-butyl-4-bromophenol | 10.83 | 10.011 | −7.56 | 10.975 | 1.34
6 | 2,6-di-tert-butyl-4-methylphenol | 12.23 | 11.201 | −8.41 | 11.983 | −2.02
7 | 2,6-di-tert-butyl-4-methoxyphenol | 12.15 | 11.484 | −5.48 | 11.936 | −1.76
8 | 2-tert-butylphenol | 11.24 | 10.96 | −2.49 | 10.752 | −4.34
9 | 3-tert-butylphenol | 10.1 | 10.773 | 6.66 | 10.504 | 4.00
10 | 4-tert-butylphenol | 10.31 | 10.768 | 4.44 | 10.788 | 4.64
11 | 1-chloro-2,6-dimethyl-4-hydroxybenzene | 9.549 | 9.476 | −0.76 | 9.944 | 4.14
12 | 4-chloro-2-nitrophenol | 6.48 | 7.724 | 19.20 | 6.475 | −0.08
13 | 2-chlorophenol | 8.55 | 8.974 | 4.96 | 8.117 | −5.06
14 | 3-chlorophenol | 9.10 | 8.72 | −4.18 | 8.971 | −1.42
15 | 4-chlorophenol | 9.43 | 8.725 | −7.48 | 9.03 | −4.24
16 | o-cresol | 10.26 | 9.875 | −3.75 | 10.174 | −0.84
17 | 4-cyano-2,6-dimethylphenol | 8.27 | 7.934 | −4.06 | 7.872 | −4.81
18 | 4-cyano-3,5-dimethylphenol | 8.21 | 8.869 | 8.03 | 8.033 | −2.16
19 | 3-cyanophenol | 8.61 | 8.168 | −5.13 | 8.2 | −4.76
20 | 3,5-dibromophenol | 8.056 | 7.186 | −10.80 | 8.024 | −0.40
21 | 2,4-dichlorophenol | 7.85 | 8.141 | 3.71 | 7.96 | 1.40
22 | 2,6-dichlorophenol | 6.78 | 7.264 | 7.14 | 6.827 | 0.69
23 | 3,5-diethoxyphenol | 9.370 | 9.813 | 4.73 | 9.529 | 1.70
24 | 3-(diethoxyphosphinyl)phenol | 8.68 | 9.267 | 6.76 | 8.628 | −0.60
25 | 4-(diethoxyphosphinyl)phenol | 8.28 | 8.517 | 2.86 | 8.276 | −0.05
26 | 3,4-dihydroxybenzaldehyde | 7.55 | 6.84 | −9.40 | 7.623 | 0.97
27 | 1,2-dihydroxybenzene | 9.356 | 10.487 | 12.09 | 9.456 | 1.07
28 | 1,4-dihydroxy-2,6-dinitrobenzene | 4.42 | 2.184 | −50.59 | 4.425 | 0.11
29 | 1,3-dihydroxy-2-methylbenzene | 10.05 | 9.685 | −3.63 | 9.603 | −4.45
30 | 1,2-dihydroxy-3-nitrobenzene | 6.68 | 3.728 | −44.19 | 6.68 | 0.00
31 | 1,2-dihydroxy-4-nitrobenzene | 6.701 | 8.082 | 20.61 | 6.824 | 1.84
32 | 3,5-diiodophenol | 8.103 | 6.653 | −17.89 | 8.126 | 0.28
33 | 3,5-dimethoxyphenol | 9.345 | 9.32 | −0.27 | 9.497 | 1.63
34 | 2,6-dimethyl-4-nitrophenol | 7.190 | 7.736 | 7.59 | 7.439 | 3.46
35 | 3,5-dimethyl-4-nitrophenol | 8.245 | 8.131 | −1.38 | 8.299 | 0.65
36 | 2,3-dimethylphenol | 10.50 | 10.117 | −3.65 | 10.246 | −2.42
37 | 2,6-dimethylphenol | 10.59 | 9.824 | −7.23 | 10.264 | −3.08
38 | 3,4-dimethylphenol | 10.32 | 10.055 | −2.57 | 10.298 | −0.21
39 | 3,5-dimethylphenol | 10.15 | 10.113 | −0.36 | 10.068 | −0.81
40 | 2,4-dinitrophenol | 4.08 | 6.75 | 65.44 | 4.081 | 0.02
41 | 2,5-dinitrophenol | 5.216 | 6.568 | 25.92 | 5.222 | 0.12
42 | 3,5-dinitrophenol | 6.732 | 5.961 | −11.45 | 6.658 | −1.10
43 | 2-ethoxyphenol | 10.109 | 10.886 | 7.69 | 10.117 | 0.08
44 | 3-ethoxyphenol | 9.655 | 9.801 | 1.51 | 9.617 | −0.39
45 | 2-ethylphenol | 10.2 | 10.221 | 0.21 | 10.28 | 0.78
46 | 2-fluorophenol | 8.73 | 9.247 | 5.92 | 9.112 | 4.38
47 | 3-fluorophenol | 9.29 | 9.072 | −2.35 | 9.16 | −1.40
48 | 4-fluorophenol | 9.89 | 9.078 | −8.21 | 9.992 | 1.03
49 | 2'-hydroxyacetophenone | 9.90 | 8.906 | −10.04 | 9.232 | −6.75
50 | 3'-hydroxyacetophenone | 9.19 | 9.114 | −0.83 | 9.46 | 2.94
51 | 3-hydroxybenzaldehyde | 9.00 | 8.799 | −2.23 | 9.25 | 2.78
52 | 4-hydroxybenzaldehyde | 7.620 | 8.423 | 10.54 | 7.96 | 4.46


Table 3. Continued
No. | Compound | Exp. | MLR | IPD(MLR) | ANN | IPD(ANN)
53 | 2-hydroxybenzyl alcohol | 9.92 | 8.355 | −15.78 | 10.146 | 2.28
54 | 3-hydroxybenzyl alcohol | 9.83 | 9.663 | −1.70 | 9.866 | 0.37
55 | 1-hydroxy-2,4-dihydroxymethylbenzene | 9.79 | 8.488 | −13.30 | 9.615 | −1.79
56 | 2-hydroxy-3-methoxybenzaldehyde | 7.912 | 7.76 | −1.92 | 7.974 | 0.78
57 | (2-hydroxy-5-methylbenzene)-methanol | 10.15 | 8.525 | −16.01 | 9.876 | −2.70
58 | 1-hydroxy-2-propylbenzene | 10.50 | 10.439 | −0.58 | 10.586 | 0.82
59 | 4-hydroxy-α,α,α-trifluorotoluene | 8.675 | 8.982 | 3.54 | 8.559 | −1.34
60 | 1-hydroxy-2,4,6-trihydroxymethylbenzene | 9.56 | 8.357 | −12.58 | 9.623 | 0.66
61 | 4-indanol | 10.32 | 10.033 | −2.78 | 10.289 | −0.30
62 | 4-iodophenol | 9.200 | 7.939 | −13.71 | 9.088 | −1.22
63 | 2,6-di-iodo-4-nitrophenol | 3.32 | 4.131 | 24.43 | 3.304 | −0.48
64 | 2-methoxyphenol | 9.99 | 10.489 | 4.99 | 9.427 | −5.64
65 | 2-methoxy-4-(2-propenyl)phenol | 10.0 | 10.686 | 6.86 | 10.093 | 0.93
66 | 6-methyl-2-butylphenol | 11.72 | 10.478 | −10.60 | 11.065 | −5.59
67 | 2-methyl-4-tert-butylphenol | 10.59 | 10.963 | 3.52 | 11.007 | 3.94
68 | 2,2'-methylenebis(4-chlorophenol) | 7.6 | 8.15 | 7.24 | 7.566 | −0.45
69 | 2,2'-methylenebis(4,6-dichlorophenol) | 5.6 | 6.129 | 9.45 | 5.637 | 0.66
70 | 4-methylsulfonyl-3,5-dimethylphenol | 8.13 | 8.435 | 3.75 | 8.094 | −0.44
71 | 3-(s-methylthio)phenol | 9.53 | 8.572 | −10.05 | 9.636 | 1.11
72 | 4-(s-methylthio)phenol | 9.53 | 8.717 | −8.53 | 9.743 | 2.24
73 | 2-nitrohydroquinone | 7.63 | 8.45 | 10.75 | 7.633 | 0.04
74 | 2-nitrophenol | 7.222 | 8.396 | 16.26 | 7.166 | −0.78
75 | 4-nitrosophenol | 6.48 | 7.768 | 19.88 | 6.693 | 3.29
76 | phenol | 9.99 | 9.578 | −4.12 | 10.346 | 3.56
77 | 2-phenylphenol | 9.55 | 8.847 | −7.36 | 9.367 | −1.92
78 | 5,6,7,8-tetrahydro-1-naphthol | 10.28 | 9.883 | −3.86 | 10.473 | 1.88
79 | 5,6,7,8-tetrahydro-2-naphthol | 10.48 | 10.088 | −3.74 | 10.639 | 1.52
80 | 2,4,6-tri-tert-butylphenol | 12.19 | 11.724 | −3.82 | 12.342 | 1.25
81 | 2,4,5-trichlorophenol | 7.37 | 7.345 | −0.34 | 7.396 | 0.35
82 | 3,4,5-trichlorophenol | 7.839 | 7.275 | −7.19 | 7.771 | −0.87
83 | 3-trifluoromethylphenol | 8.950 | 9.098 | 1.65 | 9.242 | 3.26
84 | 2,3,4-trimethylphenol | 10.59 | 10.322 | −2.53 | 10.648 | 0.55
85 | 2,4,5-trimethylphenol | 10.57 | 10.325 | −2.32 | 10.684 | 1.08
86 | 3,4,5-trimethylphenol | 10.25 | 10.358 | 1.05 | 10.498 | 2.42
87 | 2,4,6-trimethylphenol | 10.88 | 10.002 | −8.07 | 10.58 | −2.76
88 | 2,4,6-tripropylphenol | 11.47 | 10.741 | −6.36 | 11.165 | −2.66
89 | 2-acetamidobenzoic acid | 3.63 | 4.692 | 29.26 | 3.641 | 0.30
90 | 3-acetamidobenzoic acid | 4.07 | 4.289 | 5.38 | 4.212 | 3.49
91 | 4-acetamidobenzoic acid | 4.28 | 4.035 | −5.72 | 3.87 | −9.58
92 | 4-acetoxybenzoic acid | 4.38 | 3.351 | −23.49 | 3.979 | −9.16
93 | 2-acetylbenzoic acid | 4.13 | 4.025 | −2.54 | 4.023 | −2.59
94 | 3-acetylbenzoic acid | 3.83 | 3.79 | −1.04 | 3.597 | −6.08
95 | 4-acetylbenzoic acid | 3.70 | 3.654 | −1.24 | 3.87 | 4.59
96 | 3-amino-1-naphthoic acid | 2.61 | 3.817 | 46.25 | 2.834 | 8.58
97 | anthracene-9-carboxylic acid | 3.65 | 2.991 | −18.05 | 3.562 | −2.41
98 | 1,3-benzenedicarboxylic acid | 3.62 | 3.374 | −6.80 | 3.571 | −1.35
99 | 1,4-benzenedicarboxylic acid | 3.54 | 3.266 | −7.74 | 3.975 | 12.29
100 | 1,2,4,5-benzenetetracarboxylic acid | 1.92 | 2.563 | 33.49 | 2.002 | 4.27
101 | 1,2,3-benzenetricarboxylic acid | 2.88 | 3.582 | 24.38 | 2.771 | −3.78
102 | 1,2,4-benzenetricarboxylic acid | 2.52 | 2.79 | 10.71 | 2.375 | −5.75
103 | 1,3,5-benzenetricarboxylic acid | 2.12 | 2.971 | 40.14 | 2.34 | 10.38
104 | benzilic acid | 3.09 | 3.872 | 25.31 | 3.27 | 5.83
105 | benzylamine-4-carboxylic acid | 3.59 | 4.358 | 21.39 | 4.14 | 15.32
106 | 2-biphenylcarboxylic acid | 3.46 | 3.194 | −7.69 | 3.665 | 5.92
107 | 2-bromobenzoic acid | 2.85 | 3.896 | 36.70 | 2.918 | 2.39


Table 3. Continued
No. | Compound | Exp. | MLR | IPD(MLR) | ANN | IPD(ANN)
108 | 3-bromobenzoic acid | 3.810 | 3.065 | −19.55 | 3.915 | 2.76
109 | 3-tert-butylbenzoic acid | 4.199 | 5.079 | 20.96 | 4.418 | 5.22
110 | 4-tert-butylbenzoic acid | 4.389 | 4.985 | 13.58 | 4.104 | −6.49
111 | 2-chlorobenzoic acid | 2.877 | 3.287 | 14.25 | 3.086 | 7.26
112 | 3-chlorobenzoic acid | 3.83 | 3.319 | −13.34 | 3.698 | −3.45
113 | 2-chloro-4-nitrobenzoic acid | 1.96 | 1.619 | −17.40 | 1.868 | −4.69
114 | 2-chloro-5-nitrobenzoic acid | 2.17 | 1.654 | −23.78 | 1.929 | −11.11
115 | 2-chloro-6-nitrobenzoic acid | 1.342 | 2.153 | 60.43 | 1.417 | 5.59
116 | 2-cyanobenzoic acid | 3.14 | 2.809 | −10.54 | 2.962 | −5.67
117 | 3,5-diaminobenzoic acid | 5.30 | 4.731 | −10.74 | 5.225 | −1.42
118 | 3,6-dichlorophthalic acid | 1.46 | 3.287 | 125.14 | 1.271 | −12.95
119 | 2,4-dihydroxybenzoic acid | 3.29 | 3.956 | 20.24 | 3.639 | 10.61
120 | 2,5-dihydroxybenzoic acid | 2.97 | 3.802 | 28.01 | 3.408 | 14.75
121 | 3,5-dihydroxybenzoic acid | 4.04 | 3.911 | −3.19 | 3.805 | −5.82
122 | 2,6-dimethoxybenzoic acid | 3.44 | 5.312 | 54.42 | 3.447 | 0.20
123 | 2,3-dimethylbenzoic acid | 3.771 | 4.55 | 20.66 | 3.668 | −2.73
124 | 2,4-dimethylbenzoic acid | 4.217 | 4.499 | 6.69 | 4.019 | −4.70
125 | 2,5-dimethylbenzoic acid | 3.990 | 4.522 | 13.33 | 3.799 | −4.79
126 | 3,5-dimethylbenzoic acid | 4.302 | 4.467 | 3.84 | 4.189 | −2.63
127 | 2,3-dimethylnaphthalene-1-carboxylic acid | 3.33 | 4.439 | 33.30 | 3.532 | 6.07
128 | 2,3-dinitrobenzoic acid | 1.85 | 2.331 | 26.00 | 2.258 | 22.05
129 | 2,4-dinitrobenzoic acid | 1.43 | 0.688 | −51.89 | 1.587 | 10.98
130 | 2,5-dinitrobenzoic acid | 1.62 | 1.12 | −30.86 | 1.905 | 17.59
131 | 3,5-dinitrobenzoic acid | 2.85 | 1.127 | −60.46 | 2.681 | −5.93
132 | 2-ethylbenzoic acid | 3.79 | 4.575 | 20.71 | 3.757 | −0.87
133 | 4-ethylbenzoic acid | 4.35 | 4.451 | 2.32 | 4.2 | −3.45
134 | 2-fluorobenzoic acid | 3.27 | 3.629 | 10.98 | 3.67 | 12.23
135 | 3-fluorobenzoic acid | 3.865 | 3.586 | −7.22 | 3.584 | −7.27
136 | 3-hydroxybenzoic acid | 4.076 | 3.993 | −2.04 | 4.496 | 10.30
137 | 4-hydroxybenzoic acid | 4.582 | 3.912 | −14.62 | 4.627 | 0.98
138 | 2-hydroxy-5-bromobenzoic acid | 2.61 | 3.096 | 18.62 | 2.722 | 4.29
139 | 2-hydroxy-5-chlorobenzoic acid | 2.63 | 3.342 | 27.07 | 3.394 | 29.05
140 | 4-hydroxy-3-methoxybenzoic acid | 4.355 | 3.803 | −12.68 | 4.022 | −7.65
141 | 2-hydroxy-5-methylbenzoic acid | 4.08 | 4.275 | 4.78 | 3.365 | −17.52
142 | 2-hydroxy-6-methylbenzoic acid | 3.32 | 5.268 | 58.67 | 3.231 | −2.68
143 | 2-hydroxy-3-nitrobenzoic acid | 1.87 | 2.427 | 29.79 | 1.93 | 3.21
144 | 2-hydroxy-5-nitrobenzoic acid | 2.12 | 2.435 | 14.86 | 2.059 | −2.88
145 | 2-hydroxy-6-nitrobenzoic acid | 2.24 | 2.774 | 23.84 | 2.723 | 21.56
146 | 4-iodobenzoic acid | 4.00 | 2.59 | −35.25 | 4.162 | 4.05
147 | mesitylenic acid | 4.32 | 4.467 | 3.40 | 4.189 | −3.03
148 | 2-methoxybenzoic acid | 4.09 | 4.146 | 1.37 | 3.963 | −3.11
149 | 3-methoxybenzoic acid | 4.08 | 4.006 | −1.81 | 4.305 | 5.51
150 | 3-methylbenzoic acid | 4.269 | 4.282 | 0.30 | 4.303 | 0.80
151 | 4-methylbenzoic acid | 4.362 | 4.214 | −3.39 | 4.541 | 4.10
152 | 2-methyl-3,5-dinitrobenzoic acid | 2.97 | 1.552 | −47.74 | 2.982 | 0.40
153 | 2-methyl-1-naphthoic acid | 3.11 | 3.509 | 12.83 | 3.135 | 0.80
154 | 3-methylsulfonylbenzoic acid | 3.52 | 3.113 | −11.56 | 3.581 | 1.73
155 | 4-methylsulfonylbenzoic acid | 3.64 | 3.062 | −15.88 | 3.323 | −8.71
156 | 1-naphthalenecarboxylic acid | 3.695 | 3.68 | −0.41 | 3.938 | 6.58
157 | 2-naphthalenecarboxylic acid | 4.161 | 3.214 | −22.76 | 3.693 | −11.25
158 | 4-nitrobenzene-1,2-dicarboxylic acid | 2.11 | 1.933 | −8.39 | 1.917 | −9.15
159 | 2-nitrobenzoic acid | 2.18 | 2.444 | 12.11 | 2.429 | 11.42
160 | 3-nitrobenzoic acid | 3.46 | 2.38 | −31.21 | 3.196 | −7.63
161 | 4-nitrobenzoic acid | 4.441 | 2.295 | −48.32 | 3.286 | −26.01
162 | o-phthalic acid | 2.950 | 3.361 | 13.93 | 2.555 | −13.39


Table 3. Continued
No. | Compound | Exp. | MLR | IPD(MLR) | ANN | IPD(ANN)
163 | 3-sulfamylbenzoic acid | 3.54 | 3.209 | −9.35 | 3.73 | 5.37
164 | 4-sulfamylbenzoic acid | 3.47 | 3.159 | −8.96 | 3.324 | −4.21
165 | 2,3,5,6-tetramethylbenzoic acid | 3.415 | 5.746 | 68.26 | 3.401 | −0.41
166 | 2,4,6-tribromobenzoic acid | 1.41 | 2.063 | 46.31 | 1.408 | −0.14
167 | 3,4,5-trihydroxybenzoic acid | 4.19 | 3.528 | −15.80 | 3.824 | −8.74
168 | 2,4,6-trimethylbenzoic acid | 3.448 | 5.211 | 51.13 | 3.641 | 5.60
Validation set
169 | 2-bromophenol | 8.452 | 8.691 | 2.83 | 8.673 | 2.61
170 | 2,4-di-tert-butylphenol | 11.64 | 11.741 | 0.87 | 11.887 | 2.12
171 | 4-chloro-2,6-dinitrophenol | 2.97 | 1.531 | −48.45 | 2.959 | −0.37
172 | m-cresol | 10.00 | 9.835 | −1.65 | 10.017 | 0.17
173 | 1,3-dichloro-2,5-dihydroxybenzene | 7.30 | 7.295 | −0.07 | 7.066 | −3.21
174 | 3,4-dichlorophenol | 8.630 | 8.014 | −7.14 | 8.343 | −3.33
175 | 1,3-dihydroxybenzene | 9.44 | 9.497 | 0.60 | 9.415 | −0.26
176 | 2,4-dimethylphenol | 10.58 | 10.069 | −4.83 | 10.367 | −2.01
177 | 2,6-dinitrophenol | 3.713 | 2.105 | −43.31 | 3.718 | 0.13
178 | 3-ethylphenol | 10.07 | 10.127 | 0.57 | 10.217 | 1.46
179 | 4'-hydroxyacetophenone | 8.05 | 8.846 | 9.89 | 8.155 | 1.30
180 | 4-hydroxybenzyl alcohol | 9.82 | 9.53 | −2.95 | 9.829 | 0.09
181 | 3-hydroxy-4-methoxybenzaldehyde | 8.889 | 6.87 | −22.71 | 8.744 | −1.63
182 | 2-iodophenol | 8.464 | 8.042 | −4.99 | 8.505 | 0.48
183 | 3-methoxyphenol | 9.652 | 9.475 | −1.83 | 9.698 | 0.48
184 | 3-methylsulfonylphenol | 9.33 | 8.579 | −8.05 | 9.219 | −1.19
185 | 3-nitrophenol | 8.360 | 7.627 | −8.77 | 8.185 | −2.09
186 | 4-phenylphenol | 9.55 | 8.554 | −10.43 | 9.647 | 1.02
187 | 1,2,3-trihydroxybenzene | 9.03 | 8.412 | −6.84 | 8.938 | −1.02
188 | 2-acetoxybenzoic acid | 3.48 | 3.934 | 13.05 | 3.743 | 7.56
189 | 4-amino-2-naphthoic acid | 2.89 | 3.547 | 22.73 | 3.028 | 4.78
190 | 1,2,3,4-benzenetetracarboxylic acid | 2.05 | 3.004 | 46.54 | 2.064 | 0.68
191 | benzoic acid | 4.204 | 4.015 | −4.50 | 4.297 | 2.21
192 | 4-bromobenzoic acid | 3.99 | 2.879 | −27.84 | 4.057 | 1.68
193 | 4-chlorobenzoic acid | 3.986 | 3.234 | −18.87 | 3.88 | −2.66
194 | 3-cyanobenzoic acid | 3.60 | 2.725 | −24.31 | 3.381 | −6.08
195 | 3,4-dihydroxybenzoic acid | 4.48 | 3.758 | −16.12 | 4.153 | −7.30
196 | 2,6-dimethylbenzoic acid | 3.362 | 4.656 | 38.49 | 3.486 | 3.69
197 | 2,6-dinitrobenzoic acid | 1.14 | 2.051 | 79.91 | 1.103 | −3.25
198 | 4-fluorobenzoic acid | 4.14 | 3.573 | −13.70 | 3.854 | −6.91
199 | 2-hydroxy-3-methylbenzoic acid | 2.99 | 4.336 | 45.02 | 3.376 | 12.91
200 | 2-iodobenzoic acid | 2.86 | 3.547 | 24.02 | 2.922 | 2.17
201 | 4-methoxybenzoic acid | 4.49 | 3.91 | −12.92 | 4.456 | −0.76
202 | 2-methyl-4-nitrobenzoic acid | 1.86 | 2.746 | 47.63 | 2.968 | 59.57
203 | 2-nitrobenzene-1,4-dicarboxylic acid | 1.73 | 1.981 | 14.51 | 1.953 | 12.89
204 | 2-phenoxybenzoic acid | 3.53 | 3.36 | −4.82 | 3.758 | 6.46
205 | 2,4,6-trihydroxybenzoic acid | 1.68 | 1.793 | 6.73 | 1.793 | 6.73
Prediction set
206 | 3-bromophenol | 9.031 | 8.378 | −7.23 | 9.17 | 1.54
207 | 2,6-di-tert-butylphenol | 11.7 | 11.053 | −5.53 | 11.895 | 1.67
208 | 4-chloro-3-methylphenol | 9.549 | 9.111 | −4.59 | 9.593 | 0.46
209 | p-cresol | 10.26 | 9.774 | −4.74 | 10.216 | −0.43
210 | 2,3-dichlorophenol | 7.44 | 8.196 | 10.16 | 7.827 | 5.20
211 | 3,5-dichlorophenol | 8.179 | 7.873 | −3.74 | 8.198 | 0.23
212 | 1,4-dihydroxybenzene | 9.91 | 9.613 | −3.00 | 10.153 | 2.45
213 | 1,4-dihydroxy-2,3,5,6-tetramethylbenzene | 11.25 | 10.3 | −8.44 | 10.723 | −4.68

Table 3. Continued

| No. | Compound | Exp. | MLR | IPDMLR | ANN | IPDANN |
|---|---|---|---|---|---|---|
| 214 | 2,5-dimethylphenol | 10.22 | 10.115 | −1.03 | 10.31 | 0.88 |
| 215 | 3,4-dinitrophenol | 5.424 | 7.121 | 31.29 | 5.319 | −1.94 |
| 216 | 4-ethylphenol | 10.0 | 10.064 | 0.64 | 10.293 | 2.93 |
| 217 | 2-hydroxybenzaldehyde | 8.34 | 9.833 | 17.90 | 8.155 | −2.22 |
| 218 | 4-hydroxybenzonitrile | 7.95 | 7.911 | −0.49 | 8.166 | 2.72 |
| 219 | 4-hydroxy-3-methoxybenzaldehyde | 7.396 | 7.974 | 7.82 | 7.896 | 6.76 |
| 220 | 3-iodophenol | 8.879 | 8.099 | −8.78 | 8.921 | 0.47 |
| 221 | 4-methoxyphenol | 10.20 | 9.587 | −6.01 | 10.282 | 0.80 |
| 222 | 4-methylsulfonylphenol | 7.83 | 7.936 | 1.35 | 7.647 | −2.34 |
| 223 | 4-nitrophenol | 7.150 | 7.232 | 1.15 | 7.219 | 0.97 |
| 224 | 3-phenylphenol | 9.63 | 8.485 | −11.89 | 9.671 | 0.43 |
| 225 | 1,3,5-trihydroxybenzene | 8.45 | 8.929 | 5.67 | 8.107 | −4.06 |
| 226 | 3-acetoxybenzoic acid | 4.00 | 3.47 | −13.25 | 3.822 | −4.45 |
| 227 | anthracene-2-carboxylic acid | 4.18 | 2.148 | −48.61 | 4.186 | 0.14 |
| 228 | 1,2,3,5-benzenetetracarboxylic acid | 2.38 | 1.625 | −31.72 | 2.379 | −0.04 |
| 229 | 2-benzoylbenzoic acid | 3.54 | 3.223 | −8.95 | 3.185 | −10.03 |
| 230 | 2-bromo-6-nitrobenzoic acid | 1.37 | 2.004 | 46.28 | 0.957 | −30.15 |
| 231 | 2-chloro-3-nitrobenzoic acid | 2.02 | 2.266 | 12.18 | 2.536 | 25.54 |
| 232 | 4-cyanobenzoic acid | 3.55 | 2.619 | −26.23 | 3.873 | 9.10 |
| 233 | 2,6-dihydroxybenzoic acid | 1.30 | 2.864 | 120.31 | 1.084 | −16.62 |
| 234 | 3,4-dimethylbenzoic acid | 4.41 | 4.471 | 1.38 | 4.255 | −3.51 |
| 235 | 3,4-dinitrobenzoic acid | 2.82 | 2.251 | −20.18 | 2.738 | −2.91 |
| 236 | 2-hydroxybenzoic acid | 2.98 | 4.091 | 37.28 | 3.313 | 11.17 |
| 237 | 2-hydroxy-4-methylbenzoic acid | 3.17 | 4.308 | 35.90 | 3.128 | −1.32 |
| 238 | 3-iodobenzoic acid | 3.86 | 2.771 | −28.21 | 3.529 | −8.58 |
| 239 | 2-methylbenzoic acid | 3.90 | 4.357 | 11.72 | 3.749 | −3.87 |
| 240 | 2-methyl-6-nitrobenzoic acid | 1.87 | 4.44 | 137.43 | 1.939 | 3.69 |
| 241 | 3-nitrobenzene-1,2-dicarboxylic acid | 1.88 | 2.334 | 24.15 | 1.872 | −0.43 |
| 242 | 4-phenoxybenzoic acid | 4.52 | 3.194 | −29.34 | 3.993 | −11.66 |

a Exp. refers to the experimental values of pKa; MLR and ANN refer to the pKa values calculated by the multi-parameter linear regression and artificial neural network models, respectively.
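The IPD columns of Table 3 can be checked with a few lines of Python. This is a minimal sketch that assumes IPD = 100 × (calculated − experimental) / experimental; that definition is inferred from the tabulated numbers rather than quoted from the text, and the function name `ipd` and the `rows` list are illustrative only.

```python
def ipd(calc, exp):
    """Percent deviation of a calculated pKa from the experimental pKa.

    Assumed definition: 100 * (calc - exp) / exp.
    """
    return 100.0 * (calc - exp) / exp


# (compound, Exp., MLR, ANN) copied from three rows of Table 3.
rows = [
    ("2,5-dimethylphenol", 10.22, 10.115, 10.31),
    ("2,6-dihydroxybenzoic acid", 1.30, 2.864, 1.084),
    ("2-methyl-6-nitrobenzoic acid", 1.87, 4.44, 1.939),
]

# Map each compound to its (IPD for MLR, IPD for ANN) pair.
ipd_values = {
    name: (ipd(mlr, exp), ipd(ann, exp)) for name, exp, mlr, ann in rows
}

for name, (ipd_mlr, ipd_ann) in ipd_values.items():
    print(f"{name}: IPD(MLR) = {ipd_mlr:.2f}, IPD(ANN) = {ipd_ann:.2f}")
```

Under this assumption the three rows give the same percent deviations as the table, including the two largest MLR deviations in the prediction set (120.31 and 137.43), both for ortho-disubstituted benzoic acids.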

Figure 4. Plot of the pKa values calculated by the ANN model versus the experimental values for the training, validation and prediction sets.

Figure 5. Plot of the residuals of the pKa values calculated by the ANN model versus the experimental values for the prediction set.

Table 4. Comparison of statistical parameters obtained by the MLR and ANN models for correlating the acidity constants of phenols and benzoic acids with the molecular descriptors^a

| Model | R2tot | R2train | R2valid | R2pred | RMSEtot | RMSEtrain | RMSEvalid | RMSEpred |
|---|---|---|---|---|---|---|---|---|
| MLR | 0.9266 | 0.9268 | 0.9400 | 0.9147 | 0.8610 | 0.8553 | 0.8034 | 0.9388 |
| ANN | 0.9931 | 0.9926 | 0.9943 | 0.9939 | 0.2648 | 0.2700 | 0.2479 | 0.2575 |

a The subscripts train, valid, pred and tot refer to the training, validation, prediction and total data sets, respectively; R is the correlation coefficient.
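The prediction-set statistics can be recomputed from the Table 3 values. The sketch below rests on two assumptions that are not stated in this section: RMSE is taken as the square root of the mean squared residual (dividing by the number of compounds), and R2 is taken as the squared Pearson correlation between experimental and calculated pKa. The names `PREDICTION_SET`, `rmse` and `r_squared` are illustrative.

```python
import math

# (experimental, MLR, ANN) pKa values for the 37 prediction-set compounds,
# transcribed from Table 3 in table order.
PREDICTION_SET = [
    (9.031, 8.378, 9.17), (11.7, 11.053, 11.895),
    (9.549, 9.111, 9.593), (10.26, 9.774, 10.216),
    (7.44, 8.196, 7.827), (8.179, 7.873, 8.198),
    (9.91, 9.613, 10.153), (11.25, 10.3, 10.723),
    (10.22, 10.115, 10.31), (5.424, 7.121, 5.319),
    (10.0, 10.064, 10.293), (8.34, 9.833, 8.155),
    (7.95, 7.911, 8.166), (7.396, 7.974, 7.896),
    (8.879, 8.099, 8.921), (10.20, 9.587, 10.282),
    (7.83, 7.936, 7.647), (7.150, 7.232, 7.219),
    (9.63, 8.485, 9.671), (8.45, 8.929, 8.107),
    (4.00, 3.47, 3.822), (4.18, 2.148, 4.186),
    (2.38, 1.625, 2.379), (3.54, 3.223, 3.185),
    (1.37, 2.004, 0.957), (2.02, 2.266, 2.536),
    (3.55, 2.619, 3.873), (1.30, 2.864, 1.084),
    (4.41, 4.471, 4.255), (2.82, 2.251, 2.738),
    (2.98, 4.091, 3.313), (3.17, 4.308, 3.128),
    (3.86, 2.771, 3.529), (3.90, 4.357, 3.749),
    (1.87, 4.44, 1.939), (1.88, 2.334, 1.872),
    (4.52, 3.194, 3.993),
]


def rmse(exp, calc):
    """Root mean square error, dividing by the number of compounds (assumed)."""
    return math.sqrt(sum((c - e) ** 2 for e, c in zip(exp, calc)) / len(exp))


def r_squared(exp, calc):
    """Squared Pearson correlation between exp and calc (assumed definition)."""
    n = len(exp)
    mean_e = sum(exp) / n
    mean_c = sum(calc) / n
    cov = sum((e - mean_e) * (c - mean_c) for e, c in zip(exp, calc))
    var_e = sum((e - mean_e) ** 2 for e in exp)
    var_c = sum((c - mean_c) ** 2 for c in calc)
    return cov ** 2 / (var_e * var_c)


exp_vals = [row[0] for row in PREDICTION_SET]
mlr_vals = [row[1] for row in PREDICTION_SET]
ann_vals = [row[2] for row in PREDICTION_SET]

rmse_mlr, rmse_ann = rmse(exp_vals, mlr_vals), rmse(exp_vals, ann_vals)
r2_mlr, r2_ann = r_squared(exp_vals, mlr_vals), r_squared(exp_vals, ann_vals)

print(f"MLR: R2 = {r2_mlr:.4f}, RMSE = {rmse_mlr:.4f}")
print(f"ANN: R2 = {r2_ann:.4f}, RMSE = {rmse_ann:.4f}")
```

With the divide-by-n convention, the two RMSE values agree with the RMSEpred entries of Table 4 (0.9388 and 0.2575) to within rounding, which suggests that this is the convention used. The R2 values depend on the assumed definition and may differ from the table in the last digits.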

Conclusions

Linear and nonlinear QSPR models have been developed for predicting the acidity constant (pKa) of various benzoic acids and phenols in water. Comparison of the RMSE values for the training, validation and prediction sets (and of the other statistical parameters in Table 4) shows that the nonlinear ANN model is superior to the MLR model. The root mean square error for the prediction set is 0.9388 for the MLR model, compared with 0.2575 for the ANN model. Because the improvement obtained with the nonlinear model (ANN) is considerable, it can be concluded that the pKa values of these compounds in water depend on the molecular descriptors in a significantly nonlinear way.

Acknowledgements. The authors wish to acknowledge the Vice-Presidency of Research, University of Mohaghegh Ardebili, for financial support of this work.
