- #Excel linear regression explained how to
- #Excel linear regression explained manual
- #Excel linear regression explained full
- #Excel linear regression explained code
With the information in this paper, you'll be in a position to avoid that particular LINEST()-well, call it a feature. But the way that the solution is manifested in the results of LINEST() since Excel 2003 is potentially disastrous. ( Multi-collinearity is just a hifalutin word for two or more predictor variables that are perfectly correlated, or virtually so.) Microsoft deserves kudos for recognizing that the problem existed.
#Excel linear regression explained code
Microsoft has also included in the code for LINEST() a method for dealing with severe multi-collinearity in the X matrix. Even if you're using a version subsequent to Excel 2003, the problems still show up in the R 2 values associated with chart trendlines. This is necessary information for anyone needing to migrate a regression analysis from, say, Excel 2002 to Excel 2010, or to understand how Excel 2002's results can be so different from Excel 2010's. You will see in a subsequent paper how Microsoft has changed its algorithm to avoid returning a negative R 2, and how it came about in the first place.
Other definitions of sums of squares can lead to strange results like negative squared multiple correlations." Alas, Microsoft's code developers were not expert in statistical theory, any more than were the other developers Wilkinson was referring to. It is no longer centered about the mean of the dependent variable.
#Excel linear regression explained manual
In 1986, well before LINEST() came along, Lee Wilkinson wrote in the manual for Systat, in its discussion of the MGLH program, "The total sum of squares must be redefined for a regression model with zero intercept. In fairness, I should note that Microsoft was in good company. That option calculates regression statistics "without the constant," also known as "forcing the intercept through zero." While the associated problems have been fixed, anyone who is still using a version of Excel prior to 2003 is in trouble if that option is selected, whether in LINEST(), TREND(), or the Regression tool in the Data Analysis add-in.
#Excel linear regression explained how to
Once you've seen how to replicate the LINEST() results using straightforward matrix algebra, you'll be in a position to see how Microsoft got it badly wrong when it offered LINEST()’s third option, const. A little matrix algebra is needed and it will be necessary for you to be familiar with the concepts behind the worksheet functions MMULT(), MINVERSE(), and TRANSPOSE().
This paper continues with a discussion of how the results provided by LINEST() can be calculated, and how you can replicate those results using Excel's native worksheet functions. The issues that I'm going to discuss in subsequent papers are more serious, particularly if you're still using a version of Excel prior to 2003. Nevertheless, this is principally a matter of convenience.
#Excel linear regression explained full
I recognize that one could use the TREND() function instead of assembling the regression formula, coefficient by coefficient and variable by variable, but there are often times when you need to see the result of modifying one variable or coefficient the only way to do that is to call them out separately in the full equation. With 20 of each, it's tedious and error-prone.Īnd there is absolutely no good reason for it-statistical, theoretical or programmatic. To complete the regression equation, you need to proceed left-to-right for the variables and right-to-left for the coefficients. But with 5, 10, perhaps 20 variables, it becomes exasperating. With just two variables, this is a really minor issue. Instead of the more natural and more easily interpreted: So if you wanted to use the regression equation to estimate the income of the first person in Row 2, you would need to use this formula (parentheses included for clarity only): (The intercept, in cell G5 in Figure 1, always appears rightmost in the LINEST() results.) But in the underlying data set, the Education data (column A) precedes the Age data (column B). The problem is that the regression coefficient for Age is in cell E5, and the coefficient for Education is in cell F5: in left-to-right order, the coefficient for Age comes before the coefficient for Education. Only by setting the third argument to FALSE can you force LINEST() to remove the constant from the regression equation. You can also omit the argument and Excel regards that as setting it to TRUE: LINEST()'s third argument, called const, is set to TRUE in the example just given.