Here we use the white wine quality data (https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv) to demonstrate the breakDown package for lm models.

First, let’s download the data from the URL above.
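A minimal sketch of the download step. It assumes the file is still served at the URL above and, as in the UCI release, is semicolon-separated with a header row; read.table() turns header names such as "fixed acidity" into syntactic names like fixed.acidity, which match the variable names in the output further down.

```r
# white wine quality data from the UCI Machine Learning Repository
url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv"
# the file is semicolon-separated and its first row holds the column names
wine <- read.table(url, header = TRUE, sep = ";")
head(wine, 3)
```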

Now let’s fit a linear model for quality.
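The breakDown output below lists all eleven physicochemical variables, so a model that uses every remaining column of wine as a predictor matches that output. The exact formula here is our assumption, not necessarily the one used to produce the printed results.

```r
# regress quality on all other columns of `wine` (the 11 physicochemical predictors)
model <- lm(quality ~ ., data = wine)
summary(model)
```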

The common goodness-of-fit measures for lm models are R^2, adjusted R^2, AIC and BIC.
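All four measures are available in base R for the fitted model. This short sketch assumes the model object used throughout this vignette:

```r
fit_summary <- summary(model)
# overall goodness-of-fit measures for the whole model
c(r_squared     = fit_summary$r.squared,
  adj_r_squared = fit_summary$adj.r.squared,
  aic           = AIC(model),
  bic           = BIC(model))
```

Each of these is a single number for the whole model. None of them says anything about an individual prediction.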

These measures assess the overall quality of the fit. But how can we understand the factors that drive the prediction for a single observation?

With the breakDown package!

library(breakDown)
library(ggplot2)

new_observation <- wine[1,]
br <- broken(model, new_observation)
br
#>                            contribution
#> (Intercept)                       5.878
#> residual.sugar = 20.7             1.166
#> density = 1.001                  -1.048
#> alcohol = 8.8                    -0.332
#> pH = 3                           -0.129
#> free.sulfur.dioxide = 45          0.036
#> sulphates = 0.45                 -0.025
#> volatile.acidity = 0.27           0.015
#> fixed.acidity = 7                 0.010
#> total.sulfur.dioxide = 170       -0.009
#> citric.acid = 0.36                0.001
#> chlorides = 0.045                 0.000
#> final_prognosis                   5.563
#> baseline:  0
# print with a different number of digits and rounding function
print(br, digits = 2, rounding_function = signif)
#>                            contribution
#> (Intercept)                     5.90000
#> residual.sugar = 20.7           1.20000
#> density = 1.001                -1.00000
#> alcohol = 8.8                  -0.33000
#> pH = 3                         -0.13000
#> free.sulfur.dioxide = 45        0.03600
#> sulphates = 0.45               -0.02500
#> volatile.acidity = 0.27         0.01500
#> fixed.acidity = 7               0.00950
#> total.sulfur.dioxide = 170     -0.00900
#> citric.acid = 0.36              0.00057
#> chlorides = 0.045               0.00019
#> final_prognosis                 5.60000
#> baseline:  0
print(br, digits = 6, rounding_function = round)
#>                            contribution
#> (Intercept)                    5.877909
#> residual.sugar = 20.7          1.165904
#> density = 1.001               -1.047875
#> alcohol = 8.8                 -0.331669
#> pH = 3                        -0.129216
#> free.sulfur.dioxide = 45       0.036178
#> sulphates = 0.45              -0.025162
#> volatile.acidity = 0.27        0.015355
#> fixed.acidity = 7              0.009514
#> total.sulfur.dioxide = 170    -0.009041
#> citric.acid = 0.36             0.000570
#> chlorides = 0.045              0.000191
#> final_prognosis                5.562658
#> baseline:  0
plot(br) + ggtitle("breakDown plot for predicted quality of a wine")
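The contributions are additive. Starting from the (Intercept) row and adding every other row gives final_prognosis (5.562658 at six digits). The sketch below reproduces this with base R only. It assumes, but does not confirm, that breakDown's decomposition for lm models agrees with predict(..., type = "terms"), which returns each term's contribution centered at the predictor means together with a "constant" attribute.

```r
# per-term contributions for the first wine, centered at the predictor means
terms_contrib <- predict(model, newdata = new_observation, type = "terms")
# the "constant" attribute is the average prediction (the (Intercept) row above)
const <- attr(terms_contrib, "constant")
# the constant plus all term contributions reproduces the ordinary prediction
total <- const + sum(terms_contrib)
all.equal(total, unname(predict(model, newdata = new_observation)))
```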

Use the baseline argument to set the origin of the plot.
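A hedged sketch follows. The printed output shows baseline: 0, so broken() appears to take a baseline argument. We assume it also accepts the value "Intercept", which would start the bars at the average prediction instead of 0. Check ?broken for the values your installed version accepts.

```r
library(breakDown)
library(ggplot2)

# assumption: baseline = "Intercept" moves the plot origin to the intercept
br_intercept <- broken(model, new_observation, baseline = "Intercept")
plot(br_intercept) + ggtitle("breakDown plot with the baseline at the intercept")
```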

It works for models with interactions as well.
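An illustrative sketch of this. The interaction between residual.sugar and density is a hypothetical choice made only for demonstration, and model_int and br_int are names introduced here. The interaction term should appear as a separate row in the breakdown.

```r
library(breakDown)
library(ggplot2)

# hypothetical model: all main effects plus one illustrative interaction
model_int <- lm(quality ~ . + residual.sugar:density, data = wine)
br_int <- broken(model_int, new_observation)
br_int
plot(br_int) + ggtitle("breakDown plot for a model with an interaction")
```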