Chapter 1 Introduction

Machine learning (ML) models have a wide range of applications in classification and regression problems. As computing power grows and data sources become more complex, ML models are becoming more and more sophisticated. Models created with techniques such as boosting or bagging of neural networks are parametrized by thousands of coefficients. They are opaque: it is hard to trace the link between input variables and model outcomes, so in practice they are treated as black boxes. They are used because of their flexibility and high performance, but their lack of interpretability is one of their main weaknesses.

In many applications we need to know, understand or prove how the input variables are used in the model. We need to know the impact of particular variables on the final model predictions. Thus we need tools that extract useful information from thousands of model parameters.

DALEX (see Biecek 2018) is an R (R Core Team 2018) library with such tools. DALEX helps to understand the way complex models work. In this document we show two typical use-cases for DALEX: one case will increase our understanding of a model, while the other will increase our understanding of predictions for particular data points.

Biecek, Przemyslaw. 2018. DALEX: Descriptive mAchine Learning Explanations. https://pbiecek.github.io/DALEX/.

R Core Team. 2018. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Figure 1.1. Workflow of a typical machine learning modeling process.
A) Modeling is a process in which domain knowledge and data are turned into models.
B) Models are used to generate predictions.
C) Understanding of a model structure may increase our knowledge, and in consequence it may lead to a better model. DALEX helps here.
D) Understanding of the drivers behind a particular model’s predictions may help to correct wrong decisions, and in consequence it may lead to a better model. DALEX helps here.
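
The four steps of this workflow can be sketched in a few lines of R. This is only a minimal illustration, not the setup used in later chapters. It assumes a recent DALEX release (1.0 or later), in which the model-level and prediction-level explainers are called `model_parts()` and `predict_parts()`; the 2018 releases exposed similar functionality as `variable_importance()` and `prediction_breakdown()`. The choice of a linear model and of the `apartments` data shipped with DALEX is an assumption made for brevity.

```r
library(DALEX)

# Names of the explanatory variables (everything except the target m2.price).
features <- setdiff(colnames(apartments_test), "m2.price")

# A) Modeling: data (and domain knowledge) are turned into a model.
#    A linear model is used only to keep the sketch short.
model <- lm(m2.price ~ ., data = apartments)

# B) Predictions: the model is applied to a new observation.
new_apartment <- apartments_test[1, features]
predict(model, new_apartment)

# Wrap the model, validation data and true responses in an explainer,
# the common interface used by the DALEX functions below.
explainer <- explain(
  model,
  data  = apartments_test[, features],
  y     = apartments_test$m2.price,
  label = "linear model"
)

# C) Model understanding: permutation-based importance of each variable.
importance <- model_parts(explainer)
plot(importance)

# D) Prediction understanding: contributions of each variable
#    to the prediction for the single new observation.
breakdown <- predict_parts(explainer, new_observation = new_apartment)
plot(breakdown)
```

The same explainer object feeds both the model-level step (C) and the prediction-level step (D), so a different model can be examined by changing only the call that fits it.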