What is linear regression?
Linear regression is a statistical method that models the linear relationship between an independent variable and a dependent variable. It is often depicted as a scatterplot of the data with a fitted straight line, which shows the strength of the relationship and the spread of the results, all for the purpose of explaining the behavior of the dependent variable.
Suppose we wanted to test the strength of the relationship between the amount of ice cream eaten and obesity. We would take the independent variable, amount of ice cream, and relate it to the dependent variable, obesity, to see whether there is a relationship. The less the data points vary around the regression line, the tighter the fit and the stronger the relationship.
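As a rough illustration of what "fitting" the regression line means, here is a minimal ordinary least squares sketch. The data are made-up numbers standing in for servings of ice cream (x) and an obesity measure (y), not real measurements, and the function name `fit_line` is hypothetical. The slope is the covariance of x and y divided by the variance of x, and the line passes through the point of means.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_line(xs, ys)
print(f"y = {intercept:.3f} + {slope:.3f} x")  # y = 0.050 + 1.990 x
```

Excel's SLOPE and INTERCEPT worksheet functions return the same two values for a single independent variable.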
Important points to remember
- Linear regression models the relationship between a dependent variable and one or more independent variables.
- Regression analysis can be performed when the observations are independent, there is no heteroscedasticity, and the error terms are uncorrelated.
- Modeling linear regression in Excel is easier with the Data Analysis ToolPak.
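There are a few key assumptions about your data set that must be true in order to perform regression analysis:
- The observations must be independent of one another.
- The error terms must have constant variance; unequal variance is called heteroscedasticity.
- The error terms must be uncorrelated with one another; correlated errors are known as serial correlation (autocorrelation).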
If these three conditions sound complicated, they are. However, if any of them does not hold, the regression results can be misleading: the estimates may be biased or their standard errors unreliable. Essentially, you would be distorting the relationship you are measuring.
Generating a regression in Excel
The first step in performing a regression analysis in Excel is to verify that the free Data Analysis ToolPak add-in is installed. The ToolPak makes it easy to calculate a range of statistics. It is not required to draw a linear regression line, but it makes it easier to create statistical tables. To check whether it is installed, select the “Data” tab on the ribbon. If “Data Analysis” appears as an option, the add-in is installed and ready to use. If it does not, click the “Office” button and select “Excel Options” (“File” > “Options” in later versions), choose “Add-ins”, and enable “Analysis ToolPak”.
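With the Data Analysis ToolPak, it only takes a few clicks to create regression output: select “Data” > “Data Analysis” > “Regression”. The dependent variable goes in the Input Y Range and the independent variable goes in the Input X Range.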
Let’s say we want to know whether S&P 500 returns help explain Visa’s (V) stock returns, and how strongly. Visa’s (V) stock returns fill column 1 as the dependent variable (Y), and S&P 500 returns fill column 2 as the independent variable (X).
Interpret the results
Using this data (the same data as in our R-squared article), we get the following table:
The R² value, also called the coefficient of determination, measures the proportion of the variation in the dependent variable that is explained by the independent variable, or how well the regression model fits the data. R² ranges from 0 to 1, and a higher value indicates a better fit. The p-value, or probability value, also ranges from 0 to 1 and indicates whether the result is statistically significant: it is the probability of observing a relationship at least as strong as this one if there were actually no relationship. In contrast to R², a smaller p-value is favorable, because it suggests that the relationship between the dependent and independent variables is unlikely to be due to chance alone.
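To see where R² and the slope's p-value come from, the sketch below computes them directly. It is a from-scratch illustration under stated assumptions, not Excel's internal code: R² is 1 minus the ratio of the residual sum of squares to the total sum of squares, the t-statistic is the slope divided by its standard error, and the two-sided p-value comes from a Student's t distribution with n - 2 degrees of freedom, integrated numerically with Simpson's rule. The function names and data are hypothetical.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) = 1 - 2 * integral of the density from 0 to |t|.
    The integral uses Simpson's rule; steps must be even."""
    b = abs(t)
    if math.isinf(b):
        return 0.0
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return min(1.0, max(0.0, 1 - 2 * area))

def regression_stats(xs, ys):
    """Return (R^2, t-statistic of the slope, two-sided p-value) for y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    se_slope = math.sqrt(ss_res / (n - 2) / sxx)  # standard error of the slope
    t_stat = slope / se_slope if se_slope > 0 else math.copysign(math.inf, slope)
    return r_squared, t_stat, two_sided_p(t_stat, n - 2)

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical data
r2, t_stat, p = regression_stats(xs, ys)
print(f"R^2 = {r2:.4f}, t = {t_stat:.2f}, p = {p:.6f}")
```

In Excel, the RSQ worksheet function returns R², and T.DIST.2T(ABS(t), n-2) returns this kind of two-sided p-value for a given t-statistic.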
Draw a regression in Excel
We can plot a regression in Excel by highlighting the data and plotting it as a scatterplot. To add a regression line, choose Layout from the Chart Tools menu, select Trendline, and then select Linear Trendline. To display the R² value, choose More Trendline Options from the Trendline menu and select “Display R-squared value on chart”. The chart conveys the strength of the relationship visually, but it provides less detail than the table above.