4.2 Prediction breakDown

Does your ML algorithm learn from mistakes? Understanding what causes a model's wrong predictions helps to improve the model itself.

Many arguments in favor of such explainers can be found in the article by Ribeiro, Singh, and Guestrin (2016). This approach is implemented in the live package (Staniak and Biecek 2017), which may be seen as an extension of the LIME method.

In this section we present another method for explaining model predictions, namely the one implemented in the breakDown package (Biecek 2017). The function single_prediction() is a wrapper around this package.

Model predictions are visualized with Break Down Plots, which were inspired by waterfall plots such as those in the xgboostExplainer package. Break Down Plots show the contribution of every variable present in the model.

The function single_prediction() generates variable attributions for a selected prediction. The generic plot() function visualizes these attributions.
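A minimal sketch of this workflow, assuming an explainer for the random forest model (here called explainer_rf) was created as in the previous sections from the apartments data shipped with DALEX; the variable names are illustrative:

```r
library("DALEX")

# pick a single observation to explain
new_apartment <- apartmentsTest[1, ]

# compute variable attributions for this prediction
sp_rf <- single_prediction(explainer_rf, observation = new_apartment)

# print the attribution table and draw the Break Down Plot
sp_rf
plot(sp_rf)
```

Printing the object yields a table of contributions like the one below, while plot() renders the corresponding waterfall-style chart.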

##                            contribution
## (Intercept)                       0.000
## + district = Srodmiescie       1042.059
## + surface = 22                  364.385
## + floor = 1                     279.526
## + no.rooms = 2                  279.070
## + construction.year = 2005      -54.566
## final_prognosis                1910.474
## baseline:  3505.971

Figure: Break Down Plot for prediction from the random forest model

Both the plot and the table confirm that all four variables (district, surface, floor, no.rooms) have positive effects, as expected. Still, these effects are too small: the final prediction (baseline 3505 plus attributions of 1910, i.e. about 5416) is much smaller than the real price of 6679 per square meter. Let’s see how the linear model behaves for this observation.
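To compare the two models, the same attribution can be computed for the linear model and both explanations passed to a single plot() call. A sketch, assuming an analogous explainer for the linear model (here called explainer_lm) and the new_apartment observation from above:

```r
# variable attributions for the linear model's prediction
sp_lm <- single_prediction(explainer_lm, observation = new_apartment)

# plot() accepts several explanations and draws them side by side
plot(sp_rf, sp_lm)
```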

Figure: Break Down Plots that compare the linear model and the random forest model

The prediction from the linear model is much closer to the real price per square meter for this apartment.