Chapter 6 Epilogue

Let’s summarize the findings of the previous sections.

  • Section 2.2 shows two models with nearly equal performance on the apartments dataset.
  • Section 3.1 shows that, in general, the random forest model has smaller residuals than the linear model, but a small fraction of its residuals are very large.
  • Section 4.1 shows that the random forest model under-predicts expensive apartments, so it is not a model we would like to employ.
  • Section 3.2 shows that construction_year is an important variable for the random forest model.
  • Section 3.3 shows that the relation between construction_year and the price per square meter is non-linear.

In this section we show how to improve the basic linear model by engineering a feature from construction_year. The findings from the random forest model guide the construction of this new feature for the linear model.
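The idea behind this feature-engineering step can be illustrated with a minimal sketch. It is written in Python with numpy rather than in R, it uses synthetic data instead of the real apartments dataset, and the engineered feature (a flag for very old or very new buildings, with hypothetical cut-offs 1935 and 1995) is an assumption chosen only to mimic a U-shaped, non-linear effect of construction_year. It is not necessarily the feature used to build apartments_lm_model_improved.

```python
import numpy as np

# Synthetic stand-in for the apartments data (NOT the real dataset):
# the price per square meter depends non-linearly (U-shaped) on construction_year.
rng = np.random.default_rng(0)
n = 1000
construction_year = rng.integers(1920, 2011, size=n)  # years 1920..2010
surface = rng.uniform(20, 150, size=n)

# Hypothetical data-generating process: very old and very new buildings are pricier.
m2_price = (
    5000
    - 10 * surface
    + 800 * ((construction_year < 1935) | (construction_year > 1995))
    + rng.normal(0, 150, size=n)
)


def fit_rmse(X, y):
    """Fit ordinary least squares with an intercept and return the residual RMSE."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    residuals = y - X1 @ coef
    return float(np.sqrt(np.mean(residuals ** 2)))


# Baseline: construction_year enters linearly, so the U-shape cannot be captured.
rmse_basic = fit_rmse(np.column_stack([surface, construction_year]), m2_price)

# Engineered feature: flag very old or very new buildings
# (cut-offs 1935 and 1995 are hypothetical, matching this synthetic data only).
old_or_new = ((construction_year < 1935) | (construction_year > 1995)).astype(float)
rmse_improved = fit_rmse(
    np.column_stack([surface, construction_year, old_or_new]), m2_price
)

print(f"basic RMSE: {rmse_basic:.1f}, improved RMSE: {rmse_improved:.1f}")
```

Running it prints the residual RMSE of both fits. On this synthetic data the model with the extra flag should have a clearly smaller RMSE, because a single linear term in construction_year cannot represent a U-shaped effect, while the flag lets an otherwise linear model capture it.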

Figure: Distribution of residuals for the new improved linear model

In conclusion, the results presented above show that the apartments_lm_model_improved model performs much better than the two initial models introduced in Chapter 3.

In this use-case we showed that the explainers implemented in DALEX help us better understand a model, and that this knowledge can be used to build a better final model.

Find more examples, vignettes and cheatsheets on the DALEX website.