# Chapter 6 Ceteris-paribus Profiles and What-If Analysis

## 6.1 Introduction

*Ceteris paribus* is a Latin phrase meaning “other things held constant” or “all else unchanged.” In this chapter, we introduce a technique for model exploration based on the *ceteris-paribus* principle. In particular, we examine the influence of each explanatory variable, assuming that the effects of all other variables are unchanged. The main goal is to understand how changes in a single explanatory variable affect model predictions.

Explanation tools (explainers) presented in this chapter are linked to the second law introduced in Section 1.3, i.e. the law of “Prediction’s speculation.” This is why the tools are also known as *What-If model analysis* or *Individual Conditional Expectations* (Goldstein et al. 2015a). It appears that it is easier to understand how a black-box model is working if we can explore the model by investigating the influence of explanatory variables separately, changing one at a time.

## 6.2 Intuition

Panel A of Figure 6.1 presents the response (prediction) surface for the `titanic_lmr_v6` model for two explanatory variables, *age* and *class*, from the *titanic* dataset (see Section 4.1). We are interested in the change of the model prediction induced by each of the variables. Toward this end, we may want to explore the curvature of the response surface around a single point with *age* equal to 47 and *class* equal to “1st,” indicated in the plot. Ceteris-paribus (CP) profiles are one-dimensional profiles that examine the curvature across each dimension, i.e., for each variable. Panel B of Figure 6.1 presents the profiles corresponding to *age* and *class*. Note that, in the CP profile for *age*, the point of interest is indicated by the black dot. In essence, a CP profile shows the conditional expectation of the dependent variable (response) as a function of the particular explanatory variable.

The CP technique is similar to the LIME method (see Chapter 12): both examine the curvature of the response surface of a model. The difference between the two methods lies in the fact that LIME approximates the black-box model of interest locally with a simpler, glass-box model. Usually, the LIME model is sparse, i.e., it contains fewer explanatory variables, so one needs to investigate a plot across a smaller number of dimensions. CP profiles, on the other hand, present conditional predictions for every variable and, in most cases, are easier to interpret.

## 6.3 Method

In this section, we introduce more formally one-dimensional CP profiles.

In predictive modeling, we are interested in a distribution of a dependent variable \(Y\) given vector \(x_*\). The latter contains values of explanatory variables. In the ideal world, we would like to know the conditional distribution of \(Y\) given \(x_*\). In practical applications, however, we usually do not predict the entire distribution, but just some of its characteristics like the expected (mean) value, a quantile, or variance. Without loss of generality we will assume that we model the conditional expected value \(E_Y(Y | x_*)\).

Assume that we have a model \(f()\), for which \(f(x_*)\) is an approximation of \(E_Y(Y | x_*)\), i.e., \(E_Y(Y | x_*) \approx f(x_*)\). Note that we do not assume that it is a “good” model, nor that the approximation is precise. We simply assume that we have a model that is used to estimate the conditional expected value and to form predictions of the values of the dependent variable. Our interest lies in the evaluation of the quality of these predictions. If the model offers a “good” approximation of the conditional expected value, this should be reflected in its satisfactory predictive performance.

Recall (see Section 1.8) that we use \(x_i\) to refer to the vector corresponding to the \(i\)-th observation in a dataset. Let \(x^{j}_{*}\) denote the \(j\)-th element of \(x_{*}\), i.e., the \(j\)-th explanatory variable. We use \(x^{-j}_{*}\) to refer to a vector resulting from removing the \(j\)-th element from \(x_{*}\). By \(x^{j|=z}_{*}\), we denote a vector resulting from changing the value of the \(j\)-th element of \(x_{*}\) to (a scalar) \(z\).

We define a one-dimensional CP profile \(h()\) for model \(f()\), the \(j\)-th explanatory variable, and point \(x_*\) as follows:

\[ h^{f,j}_{x_*}(z) \equiv f(x_*^{j|=z}). \]

A CP profile is thus a function that describes the dependence of the approximated expected value (prediction) of \(Y\) on the value \(z\) of the \(j\)-th explanatory variable. Note that, in practice, \(z\) goes through the entire range of values typical for the variable, while the values of all other explanatory variables are kept fixed at the values specified by \(x_*\).

Note that, when only a single model is considered, we will skip the model index and denote the CP profile for the \(j\)-th explanatory variable and the point of interest \(x_*\) by \(h^{j}_{x_*}(z)\).
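To make the definition concrete, the following base-R sketch computes a CP profile by hand for a toy model. The model `f()`, the instance `x_star`, and all coefficients are invented for illustration only and do not come from the book.

```r
# Toy model: a logistic-type prediction based on age and class (made up for illustration)
f <- function(x) plogis(2 - 0.05 * x[["age"]] + 0.3 * (x[["class"]] == "1st"))

# Instance of interest, analogous to the point explored in Section 6.2
x_star <- data.frame(age = 47, class = "1st")

# h^{f,j}_{x_star}(z) = f(x_star with the j-th variable replaced by z)
cp_profile <- function(f, x_star, j, z_grid) {
  sapply(z_grid, function(z) {
    x_mod <- x_star
    x_mod[[j]] <- z   # substitute z for the j-th variable, keep the rest fixed
    f(x_mod)
  })
}

# Profile for age over a grid of typical values
age_grid <- seq(0, 80, by = 1)
profile_age <- cp_profile(f, x_star, "age", age_grid)
plot(age_grid, profile_age, type = "l", xlab = "age", ylab = "prediction")
```

This is exactly the operation that the `ceteris_paribus()` function from the `ingredients` package (Section 6.6) automates for arbitrary models and sets of variables.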

## 6.4 Example: Titanic

For continuous explanatory variables, a natural way to represent the CP function is to use a profile plot similar to the ones presented in Figure 6.2. In the figure, the dot on the curves marks an instance prediction, i.e., prediction \(f(x_*)\) for a single observation \(x_*\). The curve itself shows how the prediction would change if the value of a particular explanatory variable changed.

Figure 6.2 presents CP profiles for the *age* variable in the logistic regression and random forest models for the Titanic dataset (see Sections 4.1.2 and 4.1.3, respectively). It is worth observing that the profile for the logistic regression model is smooth, while the one for the random forest model shows more variability. For this instance (observation), the prediction of the logistic regression model would increase substantially if the value of *age* became lower than 20. For the random forest model, a substantial increase would be obtained if *age* became lower than 13 or so.

For a categorical explanatory variable, a natural way to represent the CP function is to use a barplot similar to the ones presented in Figure 6.3. The barplots in Figure 6.3 present CP profiles for the *class* variable in the logistic regression and random forest models for the Titanic dataset (see Sections 4.1.2 and 4.1.3, respectively). For this instance (observation), the predicted probability for the logistic regression model would decrease substantially if the value of *class* changed to “2nd”. On the other hand, for the random forest model, the largest change would be observed if *class* changed to “restaurant staff”.

Usually, black-box models contain a large number of explanatory variables. However, CP profiles remain legible even in tiny subplots, created with techniques like sparklines or small multiples (Tufte 1986). In this way, we can display a large number of profiles at the same time, keeping profiles for consecutive variables in separate panels, as shown in Figure 6.4 for the random forest model for the Titanic dataset. It helps if the panels are ordered so that the most important profiles are listed first. We discuss a method to assess the importance of CP profiles in the next chapter.

## 6.5 Pros and cons

One-dimensional CP profiles, as presented in this chapter, offer a uniform, easy-to-communicate, and extendable approach to model exploration. Their graphical representation is easy to understand and explain. It is possible to show profiles for many variables or models in a single plot. CP profiles are easy to compare; thus, we can juxtapose two or more models to better understand differences between them. We can also compare two or more instances to better understand model stability. CP profiles are also a useful tool for sensitivity analysis.

There are several issues related to the use of CP profiles. If explanatory variables are correlated, then changing one variable implies a change in the other. In such a case, the application of the *ceteris-paribus* principle may lead to unrealistic settings, as it is not possible to keep one variable fixed while varying the other one. For example, in apartment-price prediction, features like surface area and number of rooms are correlated; thus, it is unrealistic to consider very small apartments with an extremely large number of rooms. A special case are interactions, which require the use of two-dimensional CP profiles that are more complex than one-dimensional ones. Also, in the case of a model with hundreds or thousands of variables, the number of plots to inspect may be daunting. Finally, while barplots allow visualization of CP profiles for factors (categorical explanatory variables), their use becomes less trivial in the case of factors with many nominal (unordered) categories (like, for example, a ZIP code).

## 6.6 Code snippets for R

In this section, we present key features of the R package `ingredients` (Biecek 2019a), which is a part of the `DrWhy.AI` universe and covers all methods presented in this chapter. More details and examples can be found at https://modeloriented.github.io/ingredients/.

Note that there are also other R packages that offer similar functionality, like `condvis` (O’Connell, Hurley, and Domijan 2017), `pdp` (Greenwell 2017a), `ICEbox` (Goldstein et al. 2015b), `ALEPlot` (Apley 2018a), and `iml` (Molnar, Bischl, and Casalicchio 2018a).

For illustration, we use two classification models developed in Chapter 4.1, namely the logistic regression model `titanic_lmr_v6` (Section 4.1.2) and the random forest model `titanic_rf_v6` (Section 4.1.3). They are developed to predict the probability of survival after the sinking of the Titanic. Instance-level explanations are calculated for a single observation, `henry`, a 47-year-old male passenger who travelled in the 1st class.

`DALEX` explainers for both models and the `henry` data frame are retrieved via the `archivist` hooks, as listed in Section 4.1.7.

```
library("rms")
explain_lmr_v6 <- archivist::aread("pbiecek/models/2b9b6")
library("randomForest")
explain_rf_v6 <- archivist::aread("pbiecek/models/9b971")
library("DALEX")
henry <- archivist::aread("pbiecek/models/a6538")
henry
```

```
## class gender age sibsp parch fare embarked
## 1 1st male 47 0 0 25 Cherbourg
```

### 6.6.1 Basic use of the `ceteris_paribus` function

The easiest way to create and plot CP profiles is to call the `ceteris_paribus()` function and then the generic `plot()` function. By default, profiles for all variables are calculated and all numeric features are plotted. One can limit the set of variables that should be considered with the `variables` argument.

To obtain CP profiles, the `ceteris_paribus()` function requires the explainer-object and the instance data frame as arguments. As a result, the function yields an object of the class `ceteris_paribus_explainer`. It is a data frame with model predictions.
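For example, the printed output shown below can be obtained by constructing the CP explainer for the random forest model and `henry`, and then printing the resulting object (the same `ceteris_paribus()` call is used again in Section 6.6.3):

```r
# Construct CP profiles for henry for the random forest model and print them
library("ingredients")
cp_titanic_rf <- ceteris_paribus(explain_rf_v6, henry)
cp_titanic_rf
```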

```
## Top profiles :
## class gender age sibsp parch fare embarked _yhat_ _vname_
## 1 3rd male 47 0 0 25 Cherbourg 0.100 class
## 1.1 2nd male 47 0 0 25 Cherbourg 0.054 class
## 1.2 1st male 47 0 0 25 Cherbourg 0.246 class
## 1.3 engineering crew male 47 0 0 25 Cherbourg 0.096 class
## 1.4 victualling crew male 47 0 0 25 Cherbourg 0.098 class
## 1.5 restaurant staff male 47 0 0 25 Cherbourg 0.092 class
## _ids_ _label_
## 1 1 Random Forest v6
## 1.1 1 Random Forest v6
## 1.2 1 Random Forest v6
## 1.3 1 Random Forest v6
## 1.4 1 Random Forest v6
## 1.5 1 Random Forest v6
##
##
## Top observations:
## class gender age sibsp parch fare embarked _yhat_ _label_
## 1 1st male 47 0 0 25 Cherbourg 0.246 Random Forest v6
## _ids_
## 1 1
```

To obtain a graphical representation of CP profiles, the generic `plot()` function can be applied to the data frame returned by the `ceteris_paribus()` function. It returns a `ggplot2` object that can be processed further if needed. In the examples below, we use `ggplot2` functions, like `ggtitle()` or `ylim()`, to modify the plot’s title or the range of the Y-axis.

The resulting plot can be enriched with additional data by applying the functions `ingredients::show_rugs()` (adds rugs for the selected points), `ingredients::show_observations()` (adds dots that show observations), or `ingredients::show_aggregated_profiles()`. All these functions can take additional arguments to modify the size, color, or linetype.

Below we show an R snippet that can be used to replicate plots presented in the upper part of Figure 6.4.

```
library("ggplot2")
library("ingredients")
plot(cp_titanic_rf, variables = c("age", "fare")) +
  show_observations(cp_titanic_rf, variables = c("age", "fare")) +
  ggtitle("Ceteris Paribus Profiles", "For the random forest model and the Titanic dataset")
```

By default, all numerical variables are plotted. To plot CP profiles for categorical variables, we have to add the `only_numerical = FALSE` argument to the `plot()` function. The code below can be used to recreate the right-hand-side plot from Figure 6.3.

```
plot(cp_titanic_rf, variables = c("class", "embarked"), only_numerical = FALSE) +
  ggtitle("Ceteris Paribus Profiles", "For the random forest model and the Titanic dataset")
```

### 6.6.2 Advanced use of the `ceteris_paribus` function

The `ceteris_paribus()` function is very flexible. To better understand how it can be used, we briefly review its arguments:

- `x`, `data`, `predict_function`, `label` - information about a model. If `x` is created with the `DALEX::explain()` function, then the other arguments are extracted from `x`; this is how we use the function in this chapter. Otherwise, we have to specify directly the model, the validation data, the predict function, and the model label.
- `new_observation` - an instance (one or more), for which we want to calculate CP profiles. It should be a data frame with the same variables as in the validation data.
- `y` - the observed value of the dependent variable for `new_observation`. The use of this argument is illustrated in Section 8.1.
- `variables` - names of explanatory variables, for which CP profiles are to be calculated. By default, the profiles are constructed for all variables, which may be time-consuming.
- `variable_splits` - a list of values for which CP profiles are to be calculated. By default, these are all values for categorical variables. For continuous variables, uniformly-placed values are selected; one can specify the number of these values with the `grid_points` argument (the default is 101).
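As a sketch of the first group of arguments, a call that bypasses a `DALEX` explainer and supplies the model information directly might look as follows. Here `titanic_rf` (a fitted model) and `titanic_data` (the validation data) are hypothetical object names, and the predict function has to match the model at hand:

```r
library("ingredients")
# Hypothetical objects: titanic_rf (a fitted model), titanic_data (validation data)
cp_manual <- ceteris_paribus(
  x = titanic_rf,                 # the model itself
  data = titanic_data,            # validation data
  predict_function = function(m, d) predict(m, d, type = "prob")[, 2],
  label = "Random Forest v6",     # model label used in plots
  new_observation = henry,        # instance(s) for which profiles are calculated
  grid_points = 101               # number of grid values for continuous variables
)
```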

The code below allows us to obtain the plots in the upper part of Figure 6.4. The argument `variable_splits` specifies the variables (`age` and `fare`) for which CP profiles are to be calculated, together with the lists of values at which the profiles are to be evaluated.

```
cp_titanic_rf <- ceteris_paribus(explain_rf_v6, henry,
  variable_splits = list(age = seq(0, 70, 0.1),
                         fare = seq(0, 100, 0.1)))
```

```
plot(cp_titanic_rf) +
  show_observations(cp_titanic_rf, variables = c("age", "fare"), size = 5) +
  ylim(0, 1) +
  ggtitle("Ceteris Paribus Profiles", "For the random forest model and titanic dataset")
```

To enhance the plot, additional functions can be used. The generic `plot()` function creates a `ggplot2` object with a single `geom_line` layer. The function `show_observations()` adds a `geom_point` layer, `show_rugs()` adds a `geom_rug` layer, while `show_profiles()` adds another `geom_line`. All these functions take, as the first argument, an object created with the `ceteris_paribus()` function. They can be freely combined to superpose profiles for different models or observations.

In the example below, we present the code to create CP profiles for two passengers, `henry` and `johny_d`. Their profiles are included in the plot presented in Figure 6.8. We use the `scale_color_manual()` function to add the names of the passengers to the plot, and to control colors and positions.

```
johny_d <- archivist::aread("pbiecek/models/e3596")
cp_titanic_rf2 <- ceteris_paribus(explain_rf_v6, rbind(henry, johny_d))
```

```
plot(cp_titanic_rf2, color = "_ids_") +
  show_observations(cp_titanic_rf2, size = 5, variables = c("age", "fare")) +
  show_rugs(cp_titanic_rf2, sides = "bl", variables = c("age", "fare")) +
  scale_color_manual(name = "Passenger:", breaks = 1:2,
                     values = c("#4378bf", "#8bdcbe"),
                     labels = c("henry", "johny_d")) +
  ggtitle("Ceteris Paribus Profiles", "For the random forest model and the Titanic dataset")
```

### 6.6.3 Champion-challenger analysis

One of the most interesting uses of the explainers is the comparison of CP profiles for two or more models.

To illustrate this possibility, we first have to construct profiles for the models. In our illustration, for the sake of clarity, we limit ourselves to just two models: the logistic regression and random forest models for the Titanic data. Moreover, we only consider the `age` and `fare` variables. We use `henry` as the instance for which predictions are of interest.

```
cp_titanic_rf <- ceteris_paribus(explain_rf_v6, henry)
cp_titanic_lmr <- ceteris_paribus(explain_lmr_v6, henry)
```

Subsequently, we construct the plot. The result is shown in Figure 6.9. The predictions for `henry` are slightly different: in this case, the logistic regression model returns higher predictions than the random forest model. For the `age` variable, the profiles of the two models are similar; in both, we see a decreasing dependency. For `fare`, however, the slope of the profile for the logistic regression model is slightly positive, while for the random forest model it is negative; the larger the `fare`, the larger the difference between the models. Such an analysis helps us to assess to what degree different models agree on what-if scenarios.

Note that every `plot()` and `show_*()` function can take a collection of explainers as arguments. Profiles for different models are then included in a single plot. In the presented R snippet, models are color-coded with the help of the argument `color = "_label_"`, where `_label_` refers to the name of the column in the CP explainer that contains the model label.

```
plot(cp_titanic_rf, cp_titanic_lmr, color = "_label_") +
  show_observations(cp_titanic_rf, cp_titanic_lmr, color = "black",
                    variables = c("age", "fare"), size = 5) +
  scale_color_discrete(name = "Selected models:") + ylim(0, 1) +
  ggtitle("Ceteris Paribus Profiles for Henry")
```

### References

Goldstein, Alex, Adam Kapelner, Justin Bleich, and Emil Pitkin. 2015a. “Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation.” *Journal of Computational and Graphical Statistics* 24 (1): 44–65. https://doi.org/10.1080/10618600.2014.907095.

Tufte, Edward R. 1986. *The Visual Display of Quantitative Information*. Cheshire, CT, USA: Graphics Press.

Biecek, Przemyslaw. 2019a. *Ingredients: Effects and Importances of Model Ingredients*. https://ModelOriented.github.io/ingredients/.

O’Connell, Mark, Catherine Hurley, and Katarina Domijan. 2017. “Conditional Visualization for Statistical Models: An Introduction to the Condvis Package in R.” *Journal of Statistical Software, Articles* 81 (5): 1–20. https://doi.org/10.18637/jss.v081.i05.

Greenwell, Brandon M. 2017a. “Pdp: An R Package for Constructing Partial Dependence Plots.” *The R Journal* 9 (1): 421–36. https://journal.r-project.org/archive/2017/RJ-2017-016/index.html.

Goldstein, Alex, Adam Kapelner, Justin Bleich, and Emil Pitkin. 2015b. “Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation.” *Journal of Computational and Graphical Statistics* 24 (1): 44–65. https://doi.org/10.1080/10618600.2014.907095.

Apley, Dan. 2018a. *ALEPlot: Accumulated Local Effects (Ale) Plots and Partial Dependence (Pd) Plots*. https://CRAN.R-project.org/package=ALEPlot.

Molnar, Christoph, Bernd Bischl, and Giuseppe Casalicchio. 2018a. “Iml: An R Package for Interpretable Machine Learning.” *JOSS* 3 (26). Journal of Open Source Software: 786. https://doi.org/10.21105/joss.00786.