
The `explain()` function

DALEX is designed to work with various black-box models, such as tree ensembles, linear models, and neural networks. Unfortunately, R packages that create such models are very inconsistent: different tools use different interfaces to train, validate, and use models. The two most popular frameworks for machine learning are `mlr` (Bischl et al. 2016) and `caret` (Jed Wing et al. 2016). Apart from them, dozens of R packages may be used for modeling.

This is why, as a first step, DALEX wraps the black-box model with metadata that unifies model interfacing.

Bischl, Bernd, Michel Lang, Lars Kotthoff, Julia Schiffner, Jakob Richter, Erich Studerus, Giuseppe Casalicchio, and Zachary M. Jones. 2016. “mlr: Machine Learning in R.” *Journal of Machine Learning Research* 17 (170):1–5. http://jmlr.org/papers/v17/15-066.html.

Jed Wing, Max Kuhn. Contributions from, Steve Weston, Andre Williams, Chris Keefer, Allan Engelhardt, Tony Cooper, Zachary Mayer, et al. 2016. *Caret: Classification and Regression Training*. https://CRAN.R-project.org/package=caret.
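To make the inconsistency concrete, here is a minimal sketch in base R (DALEX is not needed to run it). The two models, the use of the built-in `mtcars` data, and the helper names `predict_lin` and `predict_log` are illustrative assumptions, not taken from the text. The sketch shows that even two closely related model classes return predictions on different scales by default, and how small wrappers with a common `(model, data)` signature can hide that difference.

```r
# Two models from base R: a linear regression and a logistic regression.
model_lin <- lm(mpg ~ wt, data = mtcars)
model_log <- glm(am ~ wt, data = mtcars, family = binomial())

# predict() on a glm returns values on the link (log-odds) scale by default;
# type = "response" is required to obtain probabilities.
head(predict(model_log, newdata = mtcars))
head(predict(model_log, newdata = mtcars, type = "response"))

# Hypothetical wrappers sharing one signature: function(model, newdata).
# Each returns a plain numeric vector on the same scale as the target.
predict_lin <- function(model, newdata) {
  as.numeric(predict(model, newdata = newdata))
}
predict_log <- function(model, newdata) {
  as.numeric(predict(model, newdata = newdata, type = "response"))
}

# Downstream code can now call both models in exactly the same way.
summary(predict_lin(model_lin, mtcars))
summary(predict_log(model_log, mtcars))
```

Once every model is reachable through the same two-argument call, code that consumes predictions no longer needs to know which package produced the model.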

Below is a list of arguments required by the `explain()` function.

```
explain(model, data, y, predict_function,
link, ..., label)
```

- `model` - an R object, a model to be explained. *Required by*: all explainers.
- `data` - a `data.frame` or `matrix`, a set that will be used for model validation. It should have the same structure as the dataset used for training. *Required by*: model performance, variable importance. *Default*: if possible, it should be extracted from the `model` object.
- `y` - a numeric vector with true labels paired with observations in `data`. *Required by*: variable importance. *Default*: no default.
- `predict_function` - a function that takes two arguments, model and data, and returns a numeric vector with predictions. Predictions should be calculated in the same scale as the `y` labels. *Required by*: all explainers. *Default*: the generic `predict()` function.
- `link_function` - a transformation/link function that is applied to model predictions. *Required by*: variable effect. *Default*: the identity `I()` function.
- `label` - a character, a name of the model that will be used in plots. *Required by*: plots. *Default*: extracted from the `class` attribute of the `model`.
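As an illustration of how these arguments fit together, the sketch below wraps a linear model fitted on the built-in `mtcars` data. The dataset, the formula, the helper name `predict_lm`, and the label text are assumptions made for this example; the call itself uses only the `model`, `data`, `y`, `predict_function`, and `label` arguments listed above. The `explain()` call is guarded so that the snippet still runs when DALEX is not installed.

```r
# Fit a simple regression model on a built-in dataset.
model_lm <- lm(mpg ~ wt + hp, data = mtcars)

# Prediction wrapper with the (model, data) signature described above.
# It returns a numeric vector on the same scale as the y labels (mpg).
predict_lm <- function(model, newdata) {
  as.numeric(predict(model, newdata = newdata))
}

# Wrap the model only if DALEX is available (an assumption about the
# environment, not a requirement stated in the text).
if (requireNamespace("DALEX", quietly = TRUE)) {
  explainer_lm <- DALEX::explain(
    model_lm,
    data = mtcars[, c("wt", "hp")],   # validation data: predictors only
    y = mtcars$mpg,                   # true labels paired with rows of data
    predict_function = predict_lm,
    label = "lm: mpg ~ wt + hp"       # name shown in plots
  )
  print(class(explainer_lm))
}
```

Passing `predict_function` and `label` explicitly is optional for a model such as `lm`, since both have defaults, but doing so makes the scale of the predictions and the name used in plots unambiguous.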

Figure 2.2. The `explain()` function embeds the `model`, validation `data`, and `y` labels in a container. The model is accessed via a universal interface specified by `predict_function()` and `link_function()`. The `label` field contains a unique name of the model.
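The container idea in the figure can be sketched in a few lines of base R. This is not how DALEX is implemented; `wrap_model()` and `predict_wrapped()` are hypothetical helpers written for this illustration, and the two models are arbitrary choices. The point is that code consuming the container touches only its fields, so it works unchanged for any wrapped model.

```r
# Hypothetical helper (not part of DALEX): bundle a model with its metadata.
wrap_model <- function(model, data, y, predict_function,
                       link_function = identity,
                       label = class(model)[1]) {
  list(model = model, data = data, y = y,
       predict_function = predict_function,
       link_function = link_function, label = label)
}

# Universal access: predictions go through predict_function,
# then through link_function (identity unless specified otherwise).
predict_wrapped <- function(wrapped, newdata = wrapped$data) {
  raw <- wrapped$predict_function(wrapped$model, newdata)
  wrapped$link_function(raw)
}

# Two different model classes wrapped the same way.
wrapped_lm <- wrap_model(
  lm(mpg ~ wt + hp, data = mtcars), mtcars, mtcars$mpg,
  predict_function = function(m, d) as.numeric(predict(m, newdata = d))
)
wrapped_glm <- wrap_model(
  glm(mpg ~ wt + hp, data = mtcars, family = Gamma(link = "log")),
  mtcars, mtcars$mpg,
  predict_function = function(m, d) {
    as.numeric(predict(m, newdata = d, type = "response"))
  }
)

# Model-agnostic consumer: root mean squared error for each container.
for (w in list(wrapped_lm, wrapped_glm)) {
  rmse <- sqrt(mean((w$y - predict_wrapped(w))^2))
  cat(w$label, "RMSE:", round(rmse, 2), "\n")
}
```

The default `label` here mirrors the description above by taking the first element of the model's `class` attribute.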

The next section introduces regression use cases. It will help you understand how to use the `explain()` function and for what purposes. The same functions may be used for binary classification.