2.1 The explain() function

DALEX is designed to work with various black-box models, such as tree ensembles, linear models, and neural networks. Unfortunately, the R packages that create such models are very inconsistent: different tools use different interfaces to train, validate, and use models. The two most popular frameworks for machine learning are mlr (Bischl et al. 2016; “mlr: Machine Learning in R,” Journal of Machine Learning Research 17 (170): 1–5, http://jmlr.org/papers/v17/15-066.html) and caret (Jed Wing et al. 2016; Caret: Classification and Regression Training, https://CRAN.R-project.org/package=caret). Apart from them, dozens of other R packages may be used for modeling.

This is why, as the first step, DALEX wraps the black-box model with metadata that unifies the way the model is accessed.
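The idea of wrapping can be illustrated with a minimal base-R sketch. It is not how DALEX is implemented; `wrap_model()` is a hypothetical helper introduced here only for illustration, and the two models fitted on the built-in mtcars data are arbitrary examples. The point is that `lm` and `glm` need different `predict()` arguments to return predictions on the scale of the observed outcome, and a wrapper can hide that difference.

```r
# Two base-R models whose predict() calls need different arguments
# to return predictions on the scale of the observed outcome.
fit_lm  <- lm(mpg ~ wt + hp, data = mtcars)
fit_glm <- glm(am ~ wt + hp, data = mtcars, family = binomial())

# Hypothetical helper (not part of DALEX): bundle a model with the data,
# the true labels and a prediction function that share one calling convention.
wrap_model <- function(model, data, y, predict_function, label) {
  list(model = model, data = data, y = y,
       predict_function = predict_function, label = label)
}

wrapped <- list(
  wrap_model(fit_lm, mtcars, mtcars$mpg,
             function(m, d) as.numeric(predict(m, newdata = d)),
             "lm"),
  wrap_model(fit_glm, mtcars, mtcars$am,
             # type = "response" returns probabilities instead of log-odds
             function(m, d) as.numeric(predict(m, newdata = d, type = "response")),
             "glm")
)

# The same loop now works for every wrapped model.
for (w in wrapped) {
  pred <- w$predict_function(w$model, w$data)
  cat(w$label, "RMSE:", sqrt(mean((w$y - pred)^2)), "\n")
}
```

Once every model exposes the same `predict_function(model, data)` call, downstream code no longer needs to know which package produced the model.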

Below is a list of the arguments of the explain() function. Not every argument is needed by every explainer.

explain(model, data, y, predict_function, link, ..., label)
  • model - an R object, a model to be explained. Required by: all explainers.
  • data - a data.frame or matrix, a dataset that will be used for model validation. It should have the same structure as the dataset used for training. Required by: model performance, variable importance. Default: extracted from the model object, if possible.
  • y - a numeric vector with true labels paired with observations in data. Required by: variable importance. Default: no default.
  • predict_function - a function that takes two arguments, model and data, and returns a numeric vector with predictions. Predictions should be calculated on the same scale as the y labels. Required by: all explainers. Default: the generic predict() function.
  • link - a transformation (link) function that is applied to model predictions. Required by: variable effect. Default: the identity I() function.
  • label - a character, a name of the model that will be used in plots. Required by: plots. Default: extracted from the class attribute of the model.
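As a minimal sketch of how these arguments fit together, the call below wraps a linear regression model. It assumes the DALEX package is installed; the model, the built-in mtcars data, the helper name `predict_lm`, and the label are illustrative choices rather than anything prescribed by the package. The link argument is left at its default.

```r
library(DALEX)  # assumed to be installed from CRAN

# Illustrative model: linear regression on the built-in mtcars data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Custom prediction function with the (model, data) signature described above.
# predict_lm is a name chosen for this example.
predict_lm <- function(model, data) {
  as.numeric(predict(model, newdata = data))
}

explainer_lm <- DALEX::explain(
  model = fit,
  data = mtcars[, c("wt", "hp")],   # validation data, same structure as training
  y = mtcars$mpg,                   # true values paired with rows of data
  predict_function = predict_lm,
  label = "lm: mpg ~ wt + hp"
)

# The wrapper keeps the pieces it was given, so they can be reused uniformly.
explainer_lm$label
head(explainer_lm$predict_function(explainer_lm$model, explainer_lm$data))
```

Passing predict_function explicitly is optional for a model like lm, but it makes the prediction scale visible and shows where a call such as predict(..., type = "response") would go for other model classes.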

Figure 2.2. The explain() function embeds the model, the validation data, and the y labels in a container. The model is accessed via a universal interface specified by predict_function() and link_function(). The label field contains a unique name of the model.

The next section introduces use cases for regression. It shows how to use the explain() function and for what purposes. The same functions may be used for binary classification.