Regression in Excel is a way to automate the statistical process of comparing multiple sets of information to see how changes in independent variables affect changes in dependent variables. If you’ve always wanted to find a correlation between two things, regression analysis in Excel is one of the best ways to do it.
The instructions in this article apply to Excel 2019, Excel 2016, Excel 2013, and Excel 2010.
What does regression mean?
Regression is a statistical modeling approach that analysts use to determine relationships among multiple variables.
Regression analysis starts with a single variable that you are trying to analyze (the dependent variable) and one or more independent variables that you are testing to see if they affect it. The analysis examines changes in the independent variables and attempts to correlate those changes with the resulting changes in the dependent variable.
It may sound like advanced statistics, but Excel makes this complex analysis available to everyone.
Perform linear regression in Excel
The simplest form of regression analysis is linear regression. Simple linear regression examines the relationship between just two variables.
For example, the table below shows data containing the number of calories a person has burned each day and their weight on that day.
Because this table contains two columns of data, and one variable could potentially affect the other, you can use Excel to perform a regression analysis on this data.
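To make the idea concrete outside of Excel, here is a minimal Python sketch of the least-squares fit that simple linear regression performs. The sample numbers and the helper name `fit_line` are hypothetical and chosen only for illustration; they are not the values from the article's worksheet.

```python
# Hypothetical sample data (not from the article's worksheet):
# calories burned per day and body weight on that day.
calories = [2100, 2300, 2500, 2700, 2900]
weight = [82.0, 81.2, 80.1, 79.5, 78.2]


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(calories, weight)
print(f"weight = {slope:.5f} * calories + {intercept:.2f}")
```

The slope and intercept computed this way correspond to the coefficients that a regression tool reports for the X variable and the intercept.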
Enable the Analysis ToolPak add-in
Before you can use Excel’s regression analysis feature, you must enable the Analysis ToolPak add-in in the Excel Options screen.
- In Excel, select the File tab and choose Options.
- Select Add-ins in the left navigation menu. Then make sure Excel Add-ins is selected in the Manage field.
- Finally, select the Go button.
- In the Add-Ins pop-up window, activate Analysis ToolPak by selecting the box in front of it to add a check mark, then select OK.
Now that the Analysis ToolPak is activated, you can start performing regression analysis in Excel.
Using the weight and calories worksheet as an example, you can perform a linear regression analysis in Excel as follows.
- Select the Data menu. Then, in the Analyze group, select Data Analysis.
- In the Data Analysis window, choose Regression from the list and select OK.
- The Input Y Range is the range of cells that contains the dependent variable. In this example, it is the weight column. The Input X Range is the range of cells that contains the independent variable. In this example, it is the calories column.
- Select Labels if your ranges include header cells, then select New Worksheet Ply to send the results to a new worksheet. Select OK to have Excel run the analysis and send the results to a new sheet.
- Examine the new worksheet. The analysis output has a number of values that you need to understand in order to interpret the results.
Each of these numbers has the following meaning:
- Multiple R: The correlation coefficient, which measures the strength of the linear relationship. A value close to 1 indicates a strong relationship between the two variables, while 0 means no correlation. Excel reports this value as a non-negative number, so check the sign of the coefficient for the X variable to see whether the relationship is positive or negative.
- R Square: The coefficient of determination, which indicates what proportion of the variation in the dependent variable is explained by the regression line. Statistically, it is 1 minus the ratio of the residual sum of squares to the total sum of squares.
- Adjusted R Square: A version of R Square that is adjusted according to the number of independent variables you choose.
- Standard Error: The precision of the regression analysis results, measured as the typical distance between the observed values and the regression line. When this error is small, the regression results are more precise.
- Observations: The number of observations in your regression model.
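The summary values listed above can be reproduced by hand, which helps clarify what each one measures. The following Python sketch computes them for a simple (one-variable) regression using the standard textbook formulas; the function name `regression_statistics` and the sample data are hypothetical, and the dictionary keys simply mirror the labels Excel uses.

```python
import math


def regression_statistics(x, y):
    """Compute the summary statistics of a simple linear regression of y on x."""
    n = len(x)
    k = 1  # number of independent variables in a simple regression
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Total variation in y, and the part left unexplained by the fitted line.
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

    r_square = 1 - ss_residual / ss_total
    return {
        # Clamp at zero to guard against tiny negative rounding errors.
        "Multiple R": math.sqrt(max(0.0, r_square)),
        "R Square": r_square,
        "Adjusted R Square": 1 - (1 - r_square) * (n - 1) / (n - k - 1),
        "Standard Error": math.sqrt(ss_residual / (n - k - 1)),
        "Observations": n,
    }


# Hypothetical data for illustration only.
stats = regression_statistics([1, 2, 3, 4], [2, 4, 5, 8])
for name, value in stats.items():
    print(f"{name}: {value}")
```

Note that Multiple R is computed here as the square root of R Square, so it is never negative; the direction of the relationship comes from the sign of the slope.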
The remaining values in the regression output give you details about the smaller components of the regression analysis.
- df: The degrees of freedom associated with each source of variance.
- SS: Sum of squares. The ratio of the residual sum of squares to the total SS should be small if most of your data fits the regression line.
- MS: Mean square, which is each sum of squares divided by its degrees of freedom.
- F: The F statistic (F-test) for the null hypothesis. It tests the overall significance of the regression model.
- Significance F: The P-value of the F statistic.
If you don’t understand statistics and how regression models are calculated, the values at the end of the summary won’t make much sense. However, Multiple R and R Square are the two most important.
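As a rough illustration of how the variance-table values relate to each other, the sketch below computes the degrees of freedom, sums of squares, mean squares, and F statistic for a simple regression. The function name `anova_table` and the sample data are hypothetical. The P-value of F is left out because it requires the F-distribution, which Python's standard library does not provide.

```python
def anova_table(x, y):
    """Return df, SS, MS, and F for a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_regression = ss_total - ss_residual

    # One independent variable, so the regression has 1 degree of freedom.
    df_regression = 1
    df_residual = n - 2
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual

    return {
        "df": {"Regression": df_regression, "Residual": df_residual, "Total": n - 1},
        "SS": {"Regression": ss_regression, "Residual": ss_residual, "Total": ss_total},
        "MS": {"Regression": ms_regression, "Residual": ms_residual},
        "F": ms_regression / ms_residual,
    }


# Hypothetical data for illustration only.
table = anova_table([1, 2, 3, 4], [2, 4, 5, 8])
print(table["df"])
print(table["SS"])
print(table["F"])
```

A large F value relative to its degrees of freedom suggests that the regression explains much more variation than would be expected by chance.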
As you can see, calories correlate strongly with total weight in this example.
Multiple linear regression analysis in Excel
To perform the same linear regression but with multiple independent variables, select the entire range (multiple columns and rows) for the Input X Range.
If you select multiple independent variables, you are less likely to find such a strong correlation because there are many variables.
However, regression analysis in Excel can help you find correlations with one or more of these variables that you may not know existed just by looking at the data manually.
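For readers curious about what a multiple regression computes, here is a minimal pure-Python sketch that fits several independent variables at once by solving the normal equations. This is one common textbook approach and is not necessarily how Excel implements it; the function names `solve_linear_system` and `fit_multiple` are hypothetical, and the data is synthetic, generated from y = 1 + 2*x1 + 3*x2 so the expected coefficients are known.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest pivot into place for stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_multiple(rows, y):
    """Return [intercept, b1, b2, ...] for a least-squares fit of y on the columns of rows."""
    # Prepend a constant 1.0 to each row so the first coefficient is the intercept.
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Synthetic data: each row holds two independent variables (x1, x2),
# and y was generated as 1 + 2*x1 + 3*x2.
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coefficients = fit_multiple(rows, y)
print([round(c, 6) for c in coefficients])
```

Each row plays the role of one worksheet row in the selected X range, and each returned coefficient corresponds to one column of that range, preceded by the intercept.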