R Programming 104 - Correlation and Regression
Getting the coefficients
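coef(model_var)           # estimated intercept and slope(s)
summary(model_var)        # coefficient table, standard errors, R-squared
fitted.values(model_var)  # fitted value for each observation
residuals(model_var)      # residual for each observation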
\[e + \hat{y} = y \] For every observation, the residual plus the fitted value equals the observed value of the response.
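A quick way to check this identity is to add the two vectors and compare them with the response. This is only a sketch: the `mtcars` data set and the formula `mpg ~ wt` are stand-ins chosen for illustration, not the model these notes were written for.

```r
# Illustrative model on the built-in mtcars data (an assumption, not from the notes)
model_var <- lm(mpg ~ wt, data = mtcars)

e     <- residuals(model_var)      # e
y_hat <- fitted.values(model_var)  # y-hat
y     <- mtcars$mpg                # y

# TRUE if residual + fitted value reproduces the observed response
identity_holds <- isTRUE(all.equal(unname(e + y_hat), y))
print(identity_holds)
```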
Coefficient of determination
\[R^{2}=1-\frac{SSE}{SST}=1-\frac{\operatorname{Var}(e)}{\operatorname{Var}(y)}\] This makes it possible to compare how well different models fit. It measures how much the model improves on the null model, \[ \hat{y} = \bar{y} \] which always predicts the mean of the response, whatever the input is.
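Both forms of the formula can be computed by hand and compared with the value reported by `summary()`. The sketch below again assumes an illustrative `mpg ~ wt` model on `mtcars`; the equality of the two forms relies on the model containing an intercept, so that the residuals average to zero.

```r
# Illustrative model (assumption): simple regression with an intercept
model_var <- lm(mpg ~ wt, data = mtcars)

e <- residuals(model_var)
y <- mtcars$mpg

sse <- sum(e^2)              # sum of squared errors
sst <- sum((y - mean(y))^2)  # total sum of squares

r2_ss      <- 1 - sse / sst        # 1 - SSE / SST
r2_var     <- 1 - var(e) / var(y)  # 1 - Var(e) / Var(y)
r2_summary <- summary(model_var)$r.squared

# The null model has only an intercept, so it predicts mean(y) everywhere
null_model <- lm(mpg ~ 1, data = mtcars)
null_fit   <- unname(fitted.values(null_model))

print(c(r2_ss = r2_ss, r2_var = r2_var, r2_summary = r2_summary))
```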
Leverage
\[h_{i}=\frac{1}{n}+\frac{\left(x_{i}-\bar{x}\right)^{2}}{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}}\] Leverage measures how far a point's x value lies from the mean of x, and therefore how much that point can pull the regression line towards itself. A high-leverage point can noticeably change the line fitted to all the other points.
augment(model_var)$.hat  # augment() comes from the broom package
The .hat variable obtained from the augment function is the leverage value.
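The leverage formula can be checked directly against R. The sketch below uses base R's `hatvalues()`, which should give the same numbers as the `.hat` column; the `mpg ~ wt` model on `mtcars` is an illustrative assumption.

```r
# Illustrative model (assumption): one predictor, so the formula above applies
model_var <- lm(mpg ~ wt, data = mtcars)

x <- mtcars$wt
n <- length(x)

# Leverage computed from the formula
h_manual <- 1 / n + (x - mean(x))^2 / sum((x - mean(x))^2)

# Leverage as computed by base R
h_builtin <- unname(hatvalues(model_var))

leverage_matches <- isTRUE(all.equal(h_manual, h_builtin))
print(leverage_matches)
```

For a model with an intercept and one slope, the leverages add up to 2, the number of estimated coefficients.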
Influence
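Leverage alone may not be enough to tell how influential a point is: a point influences the fit only if it has both high leverage and a large residual. Cook’s distance combines the two into one measure of overall influence, and augment() returns it as the .cooksd variable.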
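library(broom)  # augment()
library(dplyr)  # %>%, arrange(), desc()
model_var %>%
  augment() %>%
  arrange(desc(.cooksd))  # most influential observations first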
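To see how Cook’s distance combines residual and leverage, it can be rebuilt from the usual textbook formula, D_i = e_i^2 / (p s^2) * h_i / (1 - h_i)^2, and compared with base R's `cooks.distance()`. This is a sketch under assumptions: the `mpg ~ wt` model on `mtcars` is illustrative, `p` is the number of estimated coefficients, and `s2` is the residual variance estimate.

```r
# Illustrative model (assumption)
model_var <- lm(mpg ~ wt, data = mtcars)

e  <- residuals(model_var)
h  <- hatvalues(model_var)
p  <- length(coef(model_var))             # number of coefficients
s2 <- sum(e^2) / df.residual(model_var)   # residual variance estimate

# Cook's distance from the formula: large residual AND high leverage
d_manual  <- unname((e^2 / (p * s2)) * h / (1 - h)^2)
d_builtin <- unname(cooks.distance(model_var))

cooks_matches <- isTRUE(all.equal(d_manual, d_builtin))
print(cooks_matches)
```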
Removing outliers
Don’t remove a point just because it looks like an outlier. Justify why the case is not representative of the rest of the data and needs to be removed, and check how the scope of inference changes when that point is dropped.
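One way to see what a removal would change is to refit the model without the candidate point and put the coefficients side by side. The sketch below is illustrative only: it assumes an `mpg ~ wt` model on `mtcars` and picks the point with the largest Cook’s distance as the candidate, which is not a justification for removing it.

```r
# Illustrative model (assumption)
model_var <- lm(mpg ~ wt, data = mtcars)

# Candidate: the observation with the largest Cook's distance
worst <- which.max(cooks.distance(model_var))

# Refit without that single observation
model_without <- lm(mpg ~ wt, data = mtcars[-worst, ])

# Compare the fitted coefficients with and without the point
comparison <- rbind(
  all_points        = coef(model_var),
  without_candidate = coef(model_without)
)
print(names(worst))
print(comparison)
```

If the coefficients barely move, the point has little effect on the conclusions and there is less reason to argue about removing it.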
General Use Cases
- Find points with high leverage but low Cook’s distance to see which points look influential but are not.
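The use case above can be sketched with a simple filter. The cut-offs used here (leverage above twice its mean, Cook’s distance below 4/n) are common rules of thumb chosen as assumptions, not values from these notes, and the `mpg ~ wt` model on `mtcars` is again only illustrative. The base R functions `hatvalues()` and `cooks.distance()` stand in for the `.hat` and `.cooksd` columns.

```r
# Illustrative model (assumption)
model_var <- lm(mpg ~ wt, data = mtcars)

diagnostics <- data.frame(
  car    = rownames(mtcars),
  hat    = unname(hatvalues(model_var)),
  cooksd = unname(cooks.distance(model_var))
)

# Rule-of-thumb cut-offs (assumptions, adjust for your data)
high_leverage <- diagnostics$hat > 2 * mean(diagnostics$hat)
low_influence <- diagnostics$cooksd < 4 / nrow(diagnostics)

# Points that sit far out in x but still agree with the fitted line
looks_influential_but_is_not <- diagnostics[high_leverage & low_influence, ]
print(looks_influential_but_is_not)
```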
2020-10-06 00:00 +0545