|
Medicinal Chemistry Applet
Quantitative Structure-Activity Relationships (QSAR) |
||||||||||||||||||||||||||||||||||
|
Introduction
In the early 1960s, Corwin Hansch extended the concept of linear free-energy relationships (LFER) to describe the effectiveness of a biologically active molecule. This represented an effort to quantitatively relate the structure of a compound to its activity, and the resulting equations were aptly named quantitative structure-activity relationships (QSAR). Today, these equations are also called quantitative structure-property relationships (QSPR). The properties most commonly correlated to biological activity are electronics and lipophilicity. Electronic effects are measured with Hammett σ values, and lipophilicity is measured with π, a constant developed by Hansch specifically for QSAR. Many other parameters have been investigated in QSAR equations, but none has found the wide acceptance of σ and π. With these two parameters, a typical QSAR equation takes the form of Equation 1. Note that lipophilicity appears as both π and π² because activity tends to follow a parabolic relationship with lipophilicity. Log 1/C is a term representing the concentration of a drug needed to achieve a desired level of effect. k, k', ρ, and k" are all regression coefficients.
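A form of Equation 1 consistent with the description above is sketched below. The negative sign on the π² term is an assumption taken from the usual textbook presentation, where it gives a parabola with a maximum at an optimal lipophilicity; the assignment of k, k', and k" to particular terms is likewise an assumption.

```latex
\log\frac{1}{C} = -k\,\pi^{2} + k'\,\pi + \rho\,\sigma + k''
```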
The electronic parameter, σ, was developed by Hammett in his pioneering work on ionization constants of substituted benzoic acids (Scheme 1). Hammett related the electronic effect of a substituent to the difference between log Ka of the substituted and unsubstituted benzoic acids (Equations 2 and 3). By definition, the σ value of hydrogen is 0. Since the logarithm of an equilibrium constant is proportional to the free energy change of the reaction (ΔG), σ values measure the free energy change caused by a particular substituent (Equation 4).
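The relationships described in this paragraph can be written out as follows. This is a sketch in standard notation rather than a copy of Equations 2 through 4; the subscripts X (substituted acid) and H (unsubstituted benzoic acid) are labels chosen here for illustration.

```latex
\begin{align}
\sigma_X &= \log K_{a,X} - \log K_{a,H} = \log\frac{K_{a,X}}{K_{a,H}} \\
\Delta G^{\circ} &= -2.303\,RT\,\log K_a \\
\sigma_X &= -\frac{\Delta G^{\circ}_X - \Delta G^{\circ}_H}{2.303\,RT}
\end{align}
```

With X = H, the first line gives σ = 0, in agreement with the definition above. An electron-withdrawing substituent strengthens the acid (larger Ka) and therefore has a positive σ.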
Like Hammett values, Hansch's lipophilicity parameter, π, is based on how a substituent affects the position of an equilibrium. Hansch values specifically address the effect of a substituent on the partitioning of a molecule between two solvents, typically water and 1-octanol (Scheme 2). Octanol and water have been found to closely model the membrane-aqueous interface in biological systems. The partitioning of the molecule between the two phases can be measured as an equilibrium constant, P (Equation 5). The difference between the log P values of the substituted and unsubstituted molecules gives the π value for that particular substituent (Equation 6). By definition, the π value for hydrogen is 0. As was demonstrated for Hammett values, π values measure the free energy change caused by a particular substituent.
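The calculation of π can be illustrated with a short Python sketch. The function names are hypothetical, and the log P values for benzene (2.13) and toluene (2.69) are approximate literature values used here only for illustration; P is assumed to be the octanol concentration divided by the water concentration.

```python
import math


def log_p(conc_octanol, conc_water):
    """Return log P, assuming P = [solute in octanol] / [solute in water]."""
    return math.log10(conc_octanol / conc_water)


def hansch_pi(log_p_substituted, log_p_parent):
    """Return pi as log P (substituted) minus log P (unsubstituted)."""
    return log_p_substituted - log_p_parent


# Approximate, illustrative log P values (assumed, not taken from this page).
LOG_P_BENZENE = 2.13
LOG_P_TOLUENE = 2.69

# pi for a methyl group: toluene relative to benzene.
pi_methyl = hansch_pi(LOG_P_TOLUENE, LOG_P_BENZENE)

# pi for hydrogen: the parent compared with itself, zero by definition.
pi_hydrogen = hansch_pi(LOG_P_BENZENE, LOG_P_BENZENE)

print(f"pi(CH3) = {pi_methyl:.2f}, pi(H) = {pi_hydrogen:.2f}")
```

A positive π, as for methyl here, means the substituent shifts the equilibrium toward the octanol phase, that is, it makes the molecule more lipophilic than the parent.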
Because σ and π values are measures of free energy changes, biological activity must also be quantified in a form related to free energy changes before the two can be correlated. Biological activity is often reported as ED50 (dose/concentration required to achieve 50% of maximal response), IC50 (concentration required to achieve 50% inhibition), or LD50 (dose/concentration resulting in death of 50% of a population). Through simple receptor theory, the values ED50, IC50, and LD50 can be shown to be equal to the dissociation equilibrium constant of the drug and its receptor, KD (Equation 7). The logarithm of KD, or equivalently of a value such as IC50, is proportional to the free energy change of drug binding (Equation 8). Values such as log IC50 are therefore suitable for use in Hansch equations (Equation 1), where they are related to free energy parameters such as σ and π.
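As a rough illustration of this conversion, the sketch below turns a concentration such as IC50 into log 1/C and into a binding free energy. The function names, the 1 M standard state, the temperature of 298.15 K, and the 1 µM example value are all assumptions made for the example, and treating IC50 as equal to KD follows the simple receptor model described above.

```python
import math

GAS_CONSTANT = 8.314  # J/(mol K)


def log_inverse_concentration(conc_molar):
    """Return log 1/C for a concentration given in mol/L."""
    return -math.log10(conc_molar)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Return the standard free energy of binding in J/mol.

    Uses delta G = RT ln(K_D), with K_D taken relative to an assumed
    1 M standard state. A smaller K_D gives a more negative value.
    """
    return GAS_CONSTANT * temperature_k * math.log(kd_molar)


# Hypothetical compound with IC50 = 1 micromolar, treated as K_D.
ic50 = 1.0e-6
activity = log_inverse_concentration(ic50)
delta_g = binding_free_energy(ic50)

print(f"log 1/C = {activity:.2f}, delta G = {delta_g / 1000:.1f} kJ/mol")
```

Because log 1/C and the binding free energy differ only by a constant factor and a sign, a more potent compound (smaller C) has both a larger log 1/C and a more negative free energy of binding.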
Using Hansch equations like Equation 1, a formula may be developed to relate biological activity to molecular substitution. Equations of this type are valuable because they allow activity to be predicted without synthesizing and testing a compound. Molecules that are predicted to have low activity may be avoided, and research can focus on compounds with a high probability of showing good activity. In practice, a number of compounds must first be prepared. This series of compounds should have different substituents with a range of properties (electron-donating/withdrawing and lipo/hydrophilic) and measurable biological activity. Linear regression analysis may then be used to determine a best-fit line for the data. The regression, normally performed in either a spreadsheet like Excel or specialized QSAR software, attempts to fit the selected parameters to the experimental biological activity.
The output of the regression is a set of coefficients for the parameters and an r-value for the line. If the parameters are of similar magnitude (true for σ and π), the coefficients give an idea of the relative importance of each parameter. A larger coefficient normally indicates a greater impact for the variable and its related molecular property. The r-value describes the goodness of fit of the line. A poor fit indicates that additional parameters describing other properties may be needed to predict the biological activity more precisely. The r-value always falls in the range of -1 to +1, and for a single-parameter fit its sign matches the sign of the slope of the line. R-values closer to +1 or -1 indicate a better fit than values closer to 0. The r²-value indicates the fraction of data variation that is accounted for by the included parameters. Similarly, r² × 100 gives the percentage of data variation accounted for by the parameters. Note that while r-values may be positive or negative, r² is never negative.
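The regression step can be sketched in plain Python. This is a minimal least-squares fit through the normal equations, one possible approach and not the method of any particular spreadsheet or QSAR package. The function names are hypothetical, and the data are synthetic: the activities are generated from chosen coefficients (-0.5, 1.0, 0.8, 3.0) so that the fit can be checked against a known answer.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_hansch(pi_vals, sigma_vals, activity):
    """Fit activity = a*pi^2 + b*pi + rho*sigma + c; return (coeffs, r^2)."""
    rows = [[p * p, p, s, 1.0] for p, s in zip(pi_vals, sigma_vals)]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, activity)) for i in range(k)]
    coeffs = solve_linear_system(xtx, xty)
    predicted = [sum(c * v for c, v in zip(coeffs, r)) for r in rows]
    mean_y = sum(activity) / len(activity)
    ss_res = sum((y - f) ** 2 for y, f in zip(activity, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in activity)
    return coeffs, 1.0 - ss_res / ss_tot


# Synthetic substituent parameters (arbitrary values, not measured data).
pi_vals = [-1.2, -0.6, 0.0, 0.5, 0.9, 1.4, 2.0]
sigma_vals = [-0.6, 0.4, 0.0, -0.2, 0.8, 0.2, -0.3]
# Synthetic activities generated from known coefficients, without noise.
activity = [-0.5 * p * p + 1.0 * p + 0.8 * s + 3.0
            for p, s in zip(pi_vals, sigma_vals)]

coeffs, r_squared = fit_hansch(pi_vals, sigma_vals, activity)
print("coefficients:", [round(c, 3) for c in coeffs])
print("r^2:", round(r_squared, 4))
```

Because the synthetic data contain no noise, the fit should recover the generating coefficients and give an r² very close to 1. Real activity data scatter around the fitted curve, and r² then reports how much of that variation the chosen parameters account for.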
Generating useful Hansch equations can be very challenging, and even a good Hansch equation will not give perfect predictions of activity. For this reason, newer methods have partly replaced traditional Hansch analysis. In the late 1980s and early 1990s, combinatorial chemistry emerged and diminished the importance of QSAR. Since large libraries of compounds bearing varying substituents could be easily prepared, being able to predict activity was no longer necessary. One could simply make all the compounds one could imagine and test them in high-throughput screens. Since the mid-1990s, a technique called Comparative Molecular Field Analysis (CoMFA) has emerged. This method uses highly complicated statistical analysis with large numbers of variables to correlate spatial molecular properties to activity. In practice, more variables than compounds may be studied, and the possibility of false correlations must be recognized. Techniques like CoMFA have become practical as the cost of computer processing power has continued to decline. |
||||||||||||||||||||||||||||||||||
|
Applet
This applet accepts x,y-coordinate data, performs a linear regression on the points, and displays the best-fit slope and y-intercept with the correlation coefficient (as r²). The best-fit line is also plotted. Only one independent variable is possible in this exercise. In terms of QSAR, this limits the study to one parameter, such as only σ or only π. Furthermore, this exercise accommodates at most ten coordinate pairs of data. The script reads in data from the table on the right until the end of the form, a blank field, or a non-numeric input is encountered. Any data points entered after a blank field or non-numeric input are ignored. The final line is printed in the format below.
log 1/C = (slope)*sigma/pi + (y-intercept) r^2 = value |
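The behavior described for the applet can be approximated with the following Python sketch. It is not the applet's own script: the function names, the representation of the form as a list of text pairs, and the three-decimal output are assumptions, and the sample data are made up so that the result is easy to verify by hand.

```python
def read_pairs(rows, max_pairs=10):
    """Collect (x, y) pairs until a blank or non-numeric field is found.

    Rows after the first unreadable field are ignored, and at most
    max_pairs rows are considered.
    """
    pairs = []
    for x_text, y_text in rows[:max_pairs]:
        try:
            pairs.append((float(x_text), float(y_text)))
        except ValueError:
            break
    return pairs


def linear_fit(pairs):
    """Return slope, y-intercept, and r^2 of the least-squares line."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy * sxy / (sxx * syy)
    return slope, intercept, r_squared


def format_result(slope, intercept, r_squared, parameter="sigma"):
    """Format the fit in the style of the applet's final line."""
    return (f"log 1/C = {slope:.3f}*{parameter} + {intercept:.3f}"
            f"  r^2 = {r_squared:.3f}")


# Made-up form contents: the blank row stops the reading, so the
# last pair is ignored.
rows = [("0.0", "1.0"), ("1.0", "3.0"), ("2.0", "5.0"),
        ("", ""), ("3.0", "100.0")]

pairs = read_pairs(rows)
slope, intercept, r_squared = linear_fit(pairs)
summary = format_result(slope, intercept, r_squared)
print(summary)
```

The three points read from the form lie exactly on the line y = 2x + 1, so the fit returns a slope of 2, an intercept of 1, and an r² of 1. At least two pairs with different x values, and some variation in y, are needed for the fit to be defined.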
||||||||||||||||||||||||||||||||||
|
Problem information
The mutagenicity of certain aniline mustards (1) has been correlated with both the lipophilicity and electronic effects of substituent groups in the 4-position. Table 1 shows the σ and π values for seven substituents with the experimental log 1/(B + 100) value, a measure of mutagenicity.
|
||||||||||||||||||||||||||||||||||
|
Problems
|
||||||||||||||||||||||||||||||||||
|
References
|