R Programming 104 - Correlation and Regression

Getting the coefficients


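coef(model_var) # intercept and slope estimates
summary(model_var) # coefficients with standard errors, t values, R-squared
fitted.values(model_var) # fitted value for each observation
residuals(model_var) # residual (observed minus fitted) for each observation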
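A minimal runnable sketch of the calls above. The note does not say which data `model_var` was fitted on, so this assumes the built-in `mtcars` data set and a simple regression of `mpg` on `wt` purely for illustration:

```r
# Hypothetical example model: fuel efficiency (mpg) explained by car weight (wt)
model_var <- lm(mpg ~ wt, data = mtcars)

coef(model_var)                # named vector: (Intercept) and wt
summary(model_var)             # full coefficient table plus R-squared
head(fitted.values(model_var)) # first few fitted values, one per car
head(residuals(model_var))     # first few residuals, observed minus fitted
```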

\[e + \hat{y} = y \] The residual plus the fitted value equals the observed value.
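The identity can be checked numerically. This sketch again assumes the hypothetical `lm(mpg ~ wt, data = mtcars)` model; `unname()` is needed because the fitted values carry car names while `mtcars$mpg` does not:

```r
model_var <- lm(mpg ~ wt, data = mtcars)  # hypothetical example model

# Fitted value plus residual reconstructs each observed response
reconstructed <- fitted.values(model_var) + residuals(model_var)
all.equal(unname(reconstructed), mtcars$mpg)
```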

Coefficient of determination

\[R^{2}=1-\frac{\mathrm{SSE}}{\mathrm{SST}}=1-\frac{\operatorname{Var}(e)}{\operatorname{Var}(y)}\] R-squared measures how much better the model fits than the null model \[ \hat{y} = \bar{y} \] which always predicts the mean of the response, whatever the input is. The second form holds for least-squares fits with an intercept, where the residuals average to zero. This makes R-squared useful for comparing how well different models fit the same data.
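Both forms of the formula can be computed by hand and compared with the value `summary()` reports. A sketch, assuming the same hypothetical `mtcars` model; `sse`, `sst`, `r2_ss` and `r2_var` are illustrative names:

```r
model_var <- lm(mpg ~ wt, data = mtcars)  # hypothetical example model
y <- mtcars$mpg
e <- residuals(model_var)

sse <- sum(e^2)              # sum of squared errors of the fitted model
sst <- sum((y - mean(y))^2)  # squared errors of the null model that predicts mean(y)

r2_ss  <- 1 - sse / sst
r2_var <- 1 - var(e) / var(y)  # equal, because mean(e) is zero with an intercept

c(r2_ss, r2_var, summary(model_var)$r.squared)
```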

Leverage

\[h_{i}=\frac{1}{n}+\frac{\left(x_{i}-\bar{x}\right)^{2}}{\sum_{j=1}^{n}\left(x_{j}-\bar{x}\right)^{2}}\] In simple linear regression, leverage measures how far a point's x value lies from the mean of x, and therefore how much potential the point has to pull the fitted line toward itself. A high-leverage point can substantially change the overall fit.
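The formula can be evaluated directly and checked against base R's `hatvalues()`, which should give the same numbers as the `.hat` column from `augment()`. A sketch assuming the hypothetical `mtcars` model with a single predictor `wt`:

```r
model_var <- lm(mpg ~ wt, data = mtcars)  # hypothetical example model
x <- mtcars$wt
n <- length(x)

# Leverage from the formula above (valid for one explanatory variable)
h_manual <- 1 / n + (x - mean(x))^2 / sum((x - mean(x))^2)

# Diagonal of the hat matrix as computed by base R
h_builtin <- hatvalues(model_var)
all.equal(h_manual, unname(h_builtin))

# Leverages sum to the number of coefficients (intercept + slope = 2)
sum(h_manual)
```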

augment(model_var)$.hat # augment() comes from the broom package

The .hat column returned by augment() holds each observation's leverage value.

Influence

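Leverage alone is not enough to tell how influential a point is: a high-leverage point that lies close to the fitted line barely moves it. Cook's distance, returned by augment() as the .cooksd column, combines leverage with the size of the residual to measure a point's overall influence.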

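library(broom) # augment()
library(dplyr) # %>%, arrange(), desc()
model_var %>%
  augment() %>%
  arrange(desc(.cooksd)) # most influential observations first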
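To see how leverage and residual size combine, Cook's distance can be computed from its usual definition, \(D_i = \frac{e_i^2}{p\,s^2}\cdot\frac{h_i}{(1-h_i)^2}\), and compared with base R's `cooks.distance()`. This sketch assumes the hypothetical `mtcars` model and uses only base R, so the last line is a base counterpart of the `arrange(desc(.cooksd))` pipeline rather than the broom/dplyr version itself:

```r
model_var <- lm(mpg ~ wt, data = mtcars)  # hypothetical example model
e  <- residuals(model_var)
h  <- hatvalues(model_var)
p  <- length(coef(model_var))             # number of estimated coefficients
s2 <- sum(e^2) / df.residual(model_var)   # residual variance estimate

# Large when a point has both a large residual and high leverage
d_manual <- (e^2 / (p * s2)) * h / (1 - h)^2
all.equal(d_manual, cooks.distance(model_var))

# Three most influential observations, largest Cook's distance first
head(sort(d_manual, decreasing = TRUE), 3)
```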

Removing outliers

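Don't remove a point just because it looks like an outlier. Justify why the case is not representative of what is being modeled and therefore needs to be removed, and check how removing it changes the scope of inference (for example, the range of x values over which the model's conclusions apply).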
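One way to check what a removal does is to refit without the point and compare. This is a sketch under assumptions: it uses the hypothetical `mtcars` model, picks the single point with the largest Cook's distance only as a candidate to inspect (not as a rule for removal), and `model_drop` and `worst` are illustrative names:

```r
model_var <- lm(mpg ~ wt, data = mtcars)  # hypothetical example model

# Candidate point: the observation with the largest Cook's distance
worst <- which.max(cooks.distance(model_var))

# Refit without that single observation
model_drop <- lm(mpg ~ wt, data = mtcars[-worst, ])

# Side-by-side coefficients show how much this one point moves the fit
rbind(all_data = coef(model_var), without_point = coef(model_drop))

# Scope of inference: range of wt covered by the reduced data
range(mtcars$wt[-worst])
```

If the coefficients barely change, removing the point buys little; if they change a lot, the justification for removing it matters even more.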

General Use Cases

Sort points by leverage and compare them with Cook's distance: points with high leverage but low Cook's distance look as though they should be influential but do not actually change the fit much.
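A base R sketch of this check, assuming the hypothetical `mtcars` model. The "top 5 by leverage" and "below the median Cook's distance" cut-offs are arbitrary illustrative choices, not standard thresholds, and `diag_tbl`, `by_leverage` and `top` are hypothetical names:

```r
model_var <- lm(mpg ~ wt, data = mtcars)  # hypothetical example model

diag_tbl <- data.frame(
  leverage = hatvalues(model_var),
  cooksd   = cooks.distance(model_var)
)

# Highest-leverage points first
by_leverage <- diag_tbl[order(diag_tbl$leverage, decreasing = TRUE), ]
head(by_leverage, 5)

# Among the 5 highest-leverage points, keep those with a Cook's distance
# below the median: far from mean(wt) but close to the fitted line
top <- head(by_leverage, 5)
top[top$cooksd < median(diag_tbl$cooksd), ]
```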
