Say we have a set of data, (x_i, y_i), shown at the left. If we have reason to believe that there exists a linear relationship between the variables x and y, we can plot the data and draw a "best-fit" straight line through the data. Of course, this relationship is governed by the familiar equation y = mx + b. We can then find the slope, m, and y-intercept, b, for the data, which are shown in the figure below.
Enter your data as we did in columns B and C. The reason for this is strictly cosmetic as you will soon see.
Linear regression equations.
If we expect a set of data to have a linear correlation, it is not necessary for us to plot the data in order to determine the constants m (slope) and b (y-intercept) of the equation y = mx + b. Instead, we can apply a statistical treatment known as linear regression to the data and determine these constants.
Given a set of data (x_i, y_i) with n data points, the slope, y-intercept and correlation coefficient, r, can be determined using the following:
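In their standard least-squares form, written with bare summation signs, the three quantities are given below. This is a reconstruction that assumes the usual computational formulas are the ones intended; each sum runs over all n data points.

```latex
m = \frac{n \sum xy - \sum x \sum y}{n \sum x^{2} - \left( \sum x \right)^{2}}

b = \frac{\sum y - m \sum x}{n}

r = \frac{n \sum xy - \sum x \sum y}
         {\sqrt{\left[ n \sum x^{2} - \left( \sum x \right)^{2} \right]
                \left[ n \sum y^{2} - \left( \sum y \right)^{2} \right]}}
```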
(Note that the limits of the summation, which run from i = 1 to n, and the summation indices on x and y have been omitted.)
Implicitly applying regression to the sample data.
The above equations may appear quite complicated; on inspection, however, we see that their components are nothing more than simple algebraic manipulations of the raw data. We can expand our spreadsheet to include these components.
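The same column-by-column bookkeeping can be sketched outside Excel. The Python example below is a minimal illustration, assuming the standard least-squares formulas for m, b and r. The data are hypothetical values chosen for easy arithmetic, not the tutorial's data set, and every variable name is invented for this example.

```python
import math

# Hypothetical sample data (NOT the tutorial's data set).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# One list per extra spreadsheet column: xy, x^2 and y^2.
xy = [xi * yi for xi, yi in zip(x, y)]
x2 = [xi ** 2 for xi in x]
y2 = [yi ** 2 for yi in y]

# Column totals, i.e. the sums that appear in the regression equations.
sum_x, sum_y = sum(x), sum(y)
sum_xy, sum_x2, sum_y2 = sum(xy), sum(x2), sum(y2)

# Slope, y-intercept and correlation coefficient from the sums.
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = (sum_y - m * sum_x) / n
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print(f"m = {m:.4f}, b = {b:.4f}, r = {r:.4f}, r^2 = {r * r:.4f}")
# prints: m = 0.6000, b = 2.2000, r = 0.7746, r^2 = 0.6000
```

Each list corresponds to one added column in the spreadsheet, and each `sum_...` variable corresponds to the total at the bottom of that column.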
Linear regression with built-in functions.
It is plain to see that the slope and y-intercept values that were calculated using linear regression techniques are identical to the values of the more familiar trendline from the graph in the first section; namely m = 0.5842 and b = 1.6842. In addition, Excel can be used to display the R-squared value. Again, R² = r². From the graph, we see that R² = 0.9488. From our linear regression analysis, we find that r = 0.9741, therefore r² = 0.9488, which agrees with the graph.
You should now see that the Excel graphing routine uses linear regression to calculate the slope, y-intercept and correlation coefficient.
Excel has four built-in functions that provide a third method for determining the slope, y-intercept, correlation coefficient, and R-squared values of a set of data. The functions are SLOPE(), INTERCEPT(), CORREL() and RSQ(), and are also covered in the statistics section of this tutorial.
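The syntax for each is as follows:
Slope, m: =SLOPE(known_y's, known_x's)
y-intercept, b: =INTERCEPT(known_y's, known_x's)
Correlation coefficient, r: =CORREL(array1, array2)
R-squared, r²: =RSQ(known_y's, known_x's)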
Here is how we would analyze our data using these built-in Excel functions. Again, the equations for each calculation are highlighted in yellow.
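So, to reiterate, we can determine the slope, y-intercept and correlation coefficient of any set of data using three Excel methods: plotting the data and adding a trendline to the graph, applying the linear regression equations directly in the spreadsheet, and using the built-in functions SLOPE(), INTERCEPT(), CORREL() and RSQ().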
Copyright © 2000, Clemson University. All Rights Reserved.