Medicinal Chemistry Applet
Quantitative Structure-Activity Relationships (QSAR)
Introduction
While QSAR analysis as described by Hansch can be used to analyze the effect of a single independent variable, several independent variables are typically investigated simultaneously. The so-called "Hansch equation" (Equation 1) illustrates this point, as it involves three independent variables. In this equation the three variables, $\sigma$, $\pi$, and $\pi^2$, relate to electronic effects ($\sigma$) and lipophilicity ($\pi$ and $\pi^2$); $k$, $k'$, $\rho$, and $k''$ are all regression coefficients. Whether one or several independent variables are involved, the theory behind QSAR analysis is the same (see QSAR applet).

$$\log(1/C) = -k\pi + k'\pi^2 + \rho\sigma + k'' \tag{1}$$

Linear regressions on simple x,y-data (one independent variable) are trivial and may be readily solved algebraically. Regressions with several independent variables are traditionally solved with matrices and linear algebra. At a minimum, the number of data points (observations) must be one larger than the number of independent variables. The extra data point is required to accommodate the constant $k''$ term.

Multiple linear regressions are often described with Equation 2. Each x-variable contributes to a greater or lesser degree to the y-value, and the degree of contribution is measured by the coefficient on that x-variable ($a_1$ through $a_n$). In QSAR, $y$ corresponds to $\log(1/C)$ and the x-variables are the parameters (for example $\pi$, $\pi^2$, and $\sigma$ in Equation 1).

$$y = a_0 + a_1x_1 + a_2x_2 + \dots + a_nx_n \tag{2}$$

Determining the values of the a-coefficients in Equation 2 requires assembling matrices that are filled with the values of the x-variables and the corresponding y-values. The equation ultimately required is Equation 3. $X$ is a matrix that contains the x-variable data, one row per observation, with an extra column of ones for the $a_0$ term. $X^{t}$ is the transpose of $X$. $Y$ is a one-column matrix containing all the y-values. $\beta$ is a one-column matrix containing the values of $a_0$ through $a_n$.

$$\beta = (X^{t}X)^{-1}X^{t}Y \tag{3}$$

The matrix operations yield the coefficients but do not directly afford a best-fit line for the data. Once the coefficients have been determined, they are used to compute calculated y-values. Plotting the calculated y-values against the experimental y-values gives a scatter plot, and these data points can then be fit to a line. A perfect fit would give an $r^2$ of 1.0. The $r^2$ value is a crude measure of the goodness of the fit; it roughly gives the fraction of the variance in the y-data that is explained by the included x-variables. For example, if a line has an $r^2$ of 0.75, then the included x-variables account for 75% of the variance in the y-data.
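The procedure described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the applet's own code: the function names (`fit`, `predict`, `r_squared`) and the data are made up for the example, and the system (XtX)β = XtY is solved by Gauss-Jordan elimination rather than by forming the inverse explicitly, which gives the same β as Equation 3. The r2 value is computed as 1 - SSres/SStot, which for a least-squares fit with a constant term equals the r2 of the line through the calculated-versus-experimental plot.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(a, b):
    """Solve a @ x = b (b is a one-column matrix) by Gauss-Jordan elimination."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("XtX is singular; the x-variables are not independent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def fit(xs, ys):
    """Return [a0, a1, ..., an] satisfying (XtX) beta = XtY (Equation 3)."""
    X = [[1.0] + list(row) for row in xs]  # extra column of ones for a0
    Y = [[y] for y in ys]
    Xt = transpose(X)
    return solve(matmul(Xt, X), matmul(Xt, Y))

def predict(beta, xs):
    """Calculated y-values from Equation 2."""
    return [beta[0] + sum(a * x for a, x in zip(beta[1:], row)) for row in xs]

def r_squared(ys, calc):
    mean = sum(ys) / len(ys)
    ss_res = sum((y - c) ** 2 for y, c in zip(ys, calc))
    ss_tot = sum((y - mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Made-up data generated from y = 1 + 2*x1 - 3*x2 (not experimental values).
xs = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
ys = [1, 3, -2, 0, 2]
beta = fit(xs, ys)
print(beta, r_squared(ys, predict(beta, xs)))
```

Because the example data lie exactly on a plane, the recovered coefficients reproduce the generating equation and the fit is perfect up to rounding. In practice a numerical library routine for least squares would usually be preferred over hand-written elimination.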
Applet
This applet accepts x,y-coordinate data with up to four independent variables ($x_0$ through $x_4$) and up to ten data points. The regression is performed through matrix operations, and the resulting x-variable coefficients are placed in the table to the right of the graph. A best-fit line is then determined, and its $r^2$ value is also placed in the table. The calculated (theoretical) points are plotted together with the best-fit line.
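The input limits mentioned here, together with the requirement that the number of data points exceed the number of independent variables by at least one, can be checked before any regression is attempted. The sketch below is a hypothetical helper, not part of the applet; the name `check_dataset` and the constants are assumptions introduced for illustration.

```python
MAX_VARIABLES = 4   # limit stated for the applet
MAX_POINTS = 10     # limit stated for the applet

def check_dataset(xs, ys):
    """Return (n_variables, n_points) or raise ValueError if the data are unusable.

    xs holds one list of x-values per observation; ys holds one y-value per observation.
    """
    if not xs:
        raise ValueError("no data points")
    if len(xs) != len(ys):
        raise ValueError("each observation needs exactly one y-value")
    n_vars = len(xs[0])
    if any(len(row) != n_vars for row in xs):
        raise ValueError("every observation must have the same number of x-values")
    if not 1 <= n_vars <= MAX_VARIABLES:
        raise ValueError("between 1 and %d independent variables are allowed" % MAX_VARIABLES)
    if len(xs) > MAX_POINTS:
        raise ValueError("at most %d data points are allowed" % MAX_POINTS)
    # One extra observation is needed for the constant term.
    if len(xs) < n_vars + 1:
        raise ValueError("need at least %d data points for %d variables" % (n_vars + 1, n_vars))
    return n_vars, len(xs)
```

For instance, three observations of two x-variables pass the check, while two observations of two x-variables are rejected because the constant term cannot be determined.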
Problem information
The antibacterial activity of a series of compounds (1) against Staphylococcus aureus has been reported by Hansch. The values of the parameters $\pi$, $\pi^2$, and $\sigma$ are given for six compounds in the series. Also included are the log A values (log A is simply a measure of activity).
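With six compounds and three parameters, the data set has more than the four observations needed to determine three coefficients plus a constant term. As a minimal sketch of how such data might be arranged for a regression, the hypothetical helper below builds one row of x-values per compound from its π and σ values; the function name `hansch_rows` and the numbers in the usage lines are placeholders, not the values reported by Hansch.

```python
def hansch_rows(pi_values, sigma_values):
    """Return one [pi, pi**2, sigma] row per compound."""
    if len(pi_values) != len(sigma_values):
        raise ValueError("each compound needs both a pi and a sigma value")
    return [[p, p * p, s] for p, s in zip(pi_values, sigma_values)]

# Placeholder numbers for illustration only (not experimental data).
example_pi = [0.0, 0.5, 1.0]
example_sigma = [0.0, -0.25, 0.25]
for row in hansch_rows(example_pi, example_sigma):
    print(row)
```

Each row supplies the x-variables for one observation, and the corresponding log A value would serve as the y-value.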
Problems