How to create a linear regression model in Excel

What is linear regression?

Linear regression is a statistical method that models the linear relationship between an independent variable and a dependent variable, usually shown as a straight line fitted through a plot of the data. It is typically used to show the strength of the relationship and the spread of the results, all for the purpose of explaining the behavior of the dependent variable.

Suppose we wanted to test the strength of the relationship between the amount of ice cream consumed and obesity. We would take the independent variable, amount of ice cream, and relate it to the dependent variable, obesity, to see if there is a relationship. When the data are plotted against the fitted regression line, the lower the variability in the data, the stronger the relationship and the tighter the fit to the line.
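In symbols, the one-variable case is commonly written as follows. This is the standard textbook notation for ordinary least squares, not something taken from this article: x is the independent variable, y the dependent variable, and the hatted terms are the fitted intercept and slope.

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```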

Important points to remember

  • Linear regression models the relationship between a dependent variable and one or more independent variables.
  • A regression analysis can be performed when the variables are independent, there is no heteroscedasticity, and the error terms of the variables are uncorrelated.
  • Modeling linear regression in Excel is easier with the Data Analysis ToolPak.

Important considerations

There are a few key assumptions about your data set that must be true in order to perform regression analysis:

  • The variables must be truly independent (this can be checked with a chi-square test).
  • The data should not have different error variances (unequal error variances are called heteroskedasticity, also spelled heteroscedasticity).
  • The error terms of each variable should be uncorrelated. Otherwise, it means that the variables are serially correlated.

If these three things sound complicated, they are. However, if any of these assumptions does not hold, the regression estimates can be biased or their reported significance unreliable. Essentially, you would be distorting the relationship you are measuring.
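One common way to check the third assumption is the Durbin-Watson statistic, which the article does not mention; it is swapped in here purely as an illustration. The sketch below assumes you already have the residuals from a fitted regression, in time order. The function name and the sample residuals are hypothetical.

```python
def durbin_watson(residuals):
    """Return the Durbin-Watson statistic for a sequence of residuals.

    The statistic lies between 0 and 4. Values near 2 suggest little
    first-order serial correlation, values well below 2 suggest positive
    correlation, and values well above 2 suggest negative correlation.
    """
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Sum of squared differences between consecutive residuals.
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    # Sum of squared residuals.
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("residuals are all zero")
    return numerator / denominator


# Made-up residuals, for illustration only.
alternating = [1.0, -1.0, 1.0, -1.0]   # sign flips every step
clustered = [1.0, 1.0, -1.0, -1.0]     # neighbours tend to agree

print(durbin_watson(alternating))  # above 2: negative serial correlation
print(durbin_watson(clustered))    # below 2: positive serial correlation
```

A value far from 2 does not say how to fix the model, only that the uncorrelated-errors assumption deserves a closer look.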

    Generating a regression in Excel

    The first step to performing a regression analysis in Excel is to verify that the free Excel plugin Data Analysis ToolPak is installed. This plugin makes it very easy to calculate a range of statistics. It is not required to draw a linear regression line, but it makes statistical tables easier to create. To check if it is installed, select “Data” from the toolbar. If “Data Analysis” is an option, the feature is installed and ready to use. If it is not installed, you can enable it by clicking the “Office” button, selecting “Excel Options”, and turning on the ToolPak under Add-Ins.

    With the Data Analysis ToolPak, it only takes a few clicks to create regression output.

    The independent variable goes in the Input X Range, and the dependent variable goes in the Input Y Range.

    Let’s say we want to know whether the returns of the S&P 500 can explain Visa’s (V) stock returns, and how strong that relationship is. Visa (V) stock return data fills column 1 as the dependent variable. S&P 500 return data fills column 2 as the independent variable.

  • Choose “Data” from the toolbar. The Data menu is displayed.
  • Select “Data Analysis”. The Data Analysis – Analysis Tools dialog box appears.
  • Choose Regression from the menu and click OK.
  • In the Regression dialog box, click the Input Y Range box and select the data for the dependent variable (Visa (V) stock returns).
  • Click the Input X Range box and select the data for the independent variable (S&P 500 returns).
  • Click OK to run the analysis and produce the results.
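For readers who want to see what the steps above compute, here is a minimal sketch of the one-variable least-squares fit behind the intercept and slope coefficients. It is an illustration only, not a description of how Excel is implemented; the function name and the return figures are made up.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Spread of x around its mean, and co-movement of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values are all identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical percentage returns, not real S&P 500 or Visa data.
market_returns = [1.0, 2.0, 3.0, 4.0]   # independent variable (Input X Range)
stock_returns = [3.0, 5.0, 7.0, 9.0]    # dependent variable (Input Y Range)

intercept, slope = fit_line(market_returns, stock_returns)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")  # intercept=1.00, slope=2.00
```

The argument order mirrors the dialog box: the independent variable plays the role of the X range and the dependent variable the Y range.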

    Interpret the results

    Using this data (the same as in our R-squared article) we get the following table:

    The R2 value, also called the coefficient of determination, measures the proportion of variation in the dependent variable that is explained by the independent variable, or how well the regression model fits the data. The R2 value ranges from 0 to 1, and a higher value indicates a better fit. The p-value, or probability value, also ranges from 0 to 1 and indicates whether the estimated relationship is statistically significant. In contrast to the R2 value, a smaller p-value is favorable because it means the observed relationship between the dependent and independent variables is less likely to be due to chance alone.
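The R2 definition above can be made concrete with a short sketch. It assumes you already have the observed values and the values predicted by a fitted line; both lists here are made up, and the function name is hypothetical.

```python
def r_squared(observed, predicted):
    """Return 1 - SS_res / SS_tot, the share of variation explained by the fit."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Variation left unexplained by the fitted values.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total variation of the observed values around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are all identical")
    return 1 - ss_res / ss_tot


# Made-up numbers: the fit misses each point by 0.5.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]

print(round(r_squared(observed, predicted), 3))  # 0.8
```

Here the unexplained variation is 1.0 out of a total of 5.0, so the fit accounts for about 80 percent of the variation in the observed values.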

    Draw a regression in Excel

    We can plot a regression in Excel by highlighting the data and plotting it as a scatterplot. To add a regression line, choose Layout from the Chart Tools menu. Then select Trendline, followed by Linear Trendline. To display the R2 value alongside the regression line, choose More Trendline Options from the Trendline menu. Finally, select “Display R-squared value on chart”. The visual result conveys the strength of the relationship, though with less detail than the table above.