# 11 Ceteris-paribus Profiles

## 11.1 Introduction

Chapters 7–10 focus on methods that quantify the importance of explanatory variables in the context of a single-instance prediction. They decompose a single prediction into components that can be attributed to particular variables. In this chapter, we focus on a method that evaluates the effect of a selected explanatory variable in terms of changes in the model’s prediction induced by changes in the variable’s values. The method is based on the *ceteris paribus* principle. *“Ceteris paribus”* is a Latin phrase meaning “other things held constant” or “all else unchanged”. The method examines the influence of an explanatory variable by assuming that the values of all other variables do not change. The main goal is to understand how changes in the values of the variable affect the model’s predictions.

Explanation tools (explainers) presented in this chapter are linked to the second law introduced in Section 1.3, i.e., the law of “Prediction’s speculation”. This is why the tools are also known as *What-if model analysis* or *Individual Conditional Expectations* (Goldstein et al. 2015). It appears that it is easier to understand how a black-box model is working if we can explore the model by investigating the influence of explanatory variables separately, changing one at a time.

## 11.2 Intuition

Ceteris-paribus (CP) profiles show how the model’s prediction would change if the value of a single explanatory variable changed. In essence, a CP profile shows a conditional expectation of the dependent variable (response) for the particular explanatory variable. For example, panel A of Figure 11.1 presents the response (prediction) surface for two explanatory variables, *age* and *class*, for the logistic-regression model `titanic_lmr` (see Section 5.2.1) for the Titanic dataset (see Section 5.1). We are interested in the change of the model’s prediction for passenger Henry (see Section 5.2.5) induced by each of the variables. Toward this end, we may want to explore the curvature of the response surface around a single point with *age* equal to 47 and *class* equal to “1st,” indicated in the plot. CP profiles are one-dimensional profiles that examine the curvature across each dimension, i.e., for each variable. Panel B of Figure 11.1 presents CP profiles for *age* and *class*. Note that, in the CP profile for *age*, the point of interest is indicated by the dot. The plots for both variables suggest that the predicted probability of survival varies considerably for different ages and classes.

## 11.3 Method

In this section, we introduce more formally one-dimensional CP profiles. Recall (see Section 2.3) that we use \(\underline{x}_i\) to refer to the vector of values of explanatory variables corresponding to the \(i\)-th observation in a dataset. A vector with arbitrary values (not linked to any particular observation in the dataset) is denoted by \(\underline{x}_*\). Let \(x^{j}_{*}\) denote the \(j\)-th element of \(\underline{x}_{*}\), i.e., the value of the \(j\)-th explanatory variable. We use \(\underline{x}^{-j}_{*}\) to refer to a vector resulting from removing the \(j\)-th element from \(\underline{x}_{*}\). By \(\underline{x}^{j|=z}_{*}\), we denote a vector resulting from changing the value of the \(j\)-th element of \(\underline{x}_{*}\) to (a scalar) \(z\).

We define a one-dimensional CP profile \(h()\) for model \(f()\), the \(j\)-th explanatory variable, and point of interest \(\underline{x}_*\) as follows:

\[\begin{equation} h^{f,j}_{\underline{x}_*}(z) = f\left(\underline{x}_*^{j|=z}\right). \tag{11.1} \end{equation}\]

A CP profile is a function that describes the dependence of the approximated expected value (prediction) of \(Y\) on the value \(z\) of the \(j\)-th explanatory variable. Note that, in practice, \(z\) assumes values from the entire observed range for the variable, while values of all other explanatory variables are kept fixed at the values specified by \(\underline{x}_*\).

Note that in the situation when only a single model is considered, we will skip the model index and we will denote the CP profile for the \(j\)-th explanatory variable and the point of interest \(\underline{x}_*\) by \(h^{j}_{\underline{x}_*}(z)\).
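As an illustration of definition (11.1), the following minimal Python sketch computes a one-dimensional CP profile by copying the point of interest, replacing the value of a single variable, and calling the model. The function `toy_model()`, its coefficients, and the helper `cp_profile()` are hypothetical and serve only this illustration; they are not the Titanic models used in this book.

```python
import math

def toy_model(x):
    """Hypothetical logistic model of survival based on age and class."""
    class_effect = {"1st": 1.0, "2nd": 0.0, "3rd": -1.0}
    score = 0.5 - 0.04 * x["age"] + class_effect[x["class"]]
    return 1.0 / (1.0 + math.exp(-score))

def cp_profile(f, x_star, j, grid):
    """Return a list of pairs (z, f(x_star with variable j set to z))."""
    profile = []
    for z in grid:
        x_mod = dict(x_star)  # copy, so that x_star itself is not modified
        x_mod[j] = z          # corresponds to x_*^{j|=z} in (11.1)
        profile.append((z, f(x_mod)))
    return profile

# Point of interest and a grid of values for the variable age
x_star = {"age": 47, "class": "1st"}
age_grid = list(range(0, 81, 10))

profile_age = cp_profile(toy_model, x_star, "age", age_grid)
for z, yhat in profile_age:
    print(f"age = {z:2d}  prediction = {yhat:.3f}")
```

All variables other than the selected one keep the values from `x_star`, so the profile evaluated at the instance's own value reproduces the prediction for the instance.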

## 11.4 Example: Titanic data

For continuous explanatory variables, a natural way to represent the CP function is to use a profile plot similar to one of those presented in Figure 11.3. In the figure, the dot on the curves marks the instance-prediction of interest, i.e., prediction \(f(\underline{x}_*)\) for a single observation \(\underline{x}_*\). The curve itself shows how the prediction would change if the value of a particular explanatory variable changed.

Figure 11.3 presents CP profiles for the *age* variable in the logistic-regression model `titanic_lmr` and random-forest model `titanic_rf` for the Titanic dataset (see Sections 5.2.1 and 5.2.2, respectively). The instance of interest is passenger Henry, a 47-year-old man who travelled in the first class (see Section 5.2.5). It is worth observing that the profile for the logistic-regression model is smooth, while the one for the random-forest model is a step function with some variability. For Henry, the shape of the CP profiles is similar. If Henry were a newborn, with all other values kept unchanged, the prediction of both models would increase by about 40 percentage points. And if Henry were 100 years old, the prediction of both models would decrease by more than 10 percentage points.

For a categorical explanatory variable, a natural way to represent the CP function is to use a barplot similar to one of those presented in Figure 11.4. In particular, the figure presents CP profiles for the *class* variable in the logistic-regression and random-forest models for the Titanic dataset (see Sections 5.2.1 and 5.2.2, respectively). For this instance (observation), passenger Henry, the predicted probability for the logistic-regression model would decrease substantially if the value of *class* changed to “2nd” or “3rd”. On the other hand, for the random-forest model, the largest change would be marked if *class* changed to “deck crew”.
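For a categorical variable, the CP profile reduces to one prediction per category, and a barplot can show the change relative to the prediction for the instance of interest. The sketch below illustrates the idea in Python with a crude text barplot; the model `toy_model()`, its coefficients, and the list of categories are hypothetical.

```python
import math

def toy_model(x):
    """Hypothetical logistic model of survival based on age and class."""
    class_effect = {"1st": 1.0, "2nd": 0.2, "3rd": -0.6, "deck crew": 0.4}
    score = -0.3 - 0.02 * x["age"] + class_effect[x["class"]]
    return 1.0 / (1.0 + math.exp(-score))

x_star = {"age": 47, "class": "1st"}
levels = ["1st", "2nd", "3rd", "deck crew"]

# Prediction for the instance of interest serves as the reference value
baseline = toy_model(x_star)

# Change in prediction when class is set to each category in turn
changes = {}
for level in levels:
    x_mod = dict(x_star)
    x_mod["class"] = level
    changes[level] = toy_model(x_mod) - baseline

# One bar per category; bar length is proportional to the absolute change
for level, delta in changes.items():
    bar = "#" * round(abs(delta) * 100)
    print(f"{level:<10} {delta:+.3f} {bar}")
```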

Usually, black-box models contain a large number of explanatory variables. However, CP profiles are legible even for tiny subplots, created with techniques like sparklines or small multiples (Tufte 1986). By using these techniques, we can display a large number of profiles, while at the same time keeping profiles for consecutive variables in separate panels, as shown in Figure 11.5 for the random-forest model for the Titanic dataset. It helps if the panels are ordered so that the most important profiles are listed first. A method to assess the importance of CP profiles is discussed in the next chapter.

## 11.5 Pros and cons

One-dimensional CP profiles, as presented in this chapter, offer a uniform, easy to communicate, and extendable approach to model exploration. Their graphical representation is easy to understand and explain. It is possible to show profiles for many variables or models in a single plot. CP profiles are easy to compare, as we can overlay profiles for two or more models to better understand differences between the models. We can also compare two or more instances to better understand model-prediction’s stability. CP profiles are also a useful tool for sensitivity analysis.

However, there are several issues related to the use of the CP profiles. One of the most important ones is related to the presence of correlated explanatory variables. For such variables, the application of the *ceteris paribus* principle may lead to unrealistic settings and misleading results, as it is not possible to keep one variable fixed while varying the other one. For example, variables like surface and number of rooms, which can be used in prediction of an apartment’s price, are usually correlated. Thus, it is unrealistic to consider very small apartments with a large number of rooms. In fact, in a training dataset, there may be no such combinations. Yet, as implied by (11.1), to compute a CP profile for the number-of-rooms variable for a particular instance of a small-surface apartment, we should consider the model’s predictions \(f\left(\underline{x}_*^{j|=z}\right)\) for all values of \(z\) (i.e., numbers of rooms) observed in the training dataset, including large ones. This means that, especially for flexible models like, for example, regression trees, predictions for a large number of rooms \(z\) may have to be obtained by extrapolating the results obtained for large-surface apartments. Needless to say, such extrapolation may be problematic. We will come back to this issue in Chapters 18 and 19.
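To make the extrapolation issue more tangible, the Python sketch below takes a small, entirely hypothetical set of apartments and checks, for each number of rooms on the CP grid, whether the surface of the instance of interest lies within the range of surfaces observed for apartments with that number of rooms. The data and the range-based check are assumptions made for this illustration only; they are not a formal diagnostic.

```python
# Hypothetical training data: (surface in square metres, number of rooms)
training = [(25, 1), (30, 1), (45, 2), (50, 2),
            (70, 3), (85, 4), (120, 5), (140, 6)]

# Instance of interest: a small apartment with one room
x_star = {"surface": 30, "rooms": 1}

# The CP grid for rooms: all values observed in the training data
rooms_grid = sorted({rooms for _, rooms in training})

def observed_surface_range(rooms):
    """Smallest and largest surface observed for the given number of rooms."""
    surfaces = [s for s, r in training if r == rooms]
    return min(surfaces), max(surfaces)

# Grid values for which the model would be queried outside the observed data
unsupported = []
for z in rooms_grid:
    low, high = observed_surface_range(z)
    supported = low <= x_star["surface"] <= high
    if not supported:
        unsupported.append(z)
    print(f"rooms = {z}: observed surface {low}-{high} m2, "
          f"{'supported' if supported else 'extrapolation'}")
```

In this toy example, only the instance's own number of rooms is supported by the data; every other point of the profile requires the model to extrapolate.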

A somewhat similar issue is related to the presence of interactions in a model, as they imply the dependence of the effect of one variable on other one(s). Pairwise interactions require the use of two-dimensional CP profiles that are more complex than one-dimensional ones. Needless to say, interactions of higher orders pose an even greater challenge.
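A two-dimensional CP profile can be seen as a direct extension of (11.1) in which two elements of \(\underline{x}_*\) are changed at the same time. The Python sketch below evaluates such a profile on a small grid for a hypothetical model with an interaction term; the model `toy_model()`, the variable names `x1` and `x2`, and the helper `cp_profile_2d()` are assumptions made for this illustration.

```python
def toy_model(x):
    """Hypothetical model with an interaction between x1 and x2."""
    return 1.0 + 2.0 * x["x1"] - 1.0 * x["x2"] + 3.0 * x["x1"] * x["x2"]

def cp_profile_2d(f, x_star, j, k, grid_j, grid_k):
    """Predictions for x_star with variable j set to z1 and k set to z2."""
    surface = {}
    for z1 in grid_j:
        for z2 in grid_k:
            x_mod = dict(x_star)  # copy the point of interest
            x_mod[j] = z1
            x_mod[k] = z2
            surface[(z1, z2)] = f(x_mod)
    return surface

x_star = {"x1": 0.5, "x2": 0.0}
grid = [0.0, 0.5, 1.0]
surface = cp_profile_2d(toy_model, x_star, "x1", "x2", grid, grid)

# Change in prediction when x1 goes from 0 to 1, for each fixed value of x2
slopes = [surface[(1.0, z2)] - surface[(0.0, z2)] for z2 in grid]
for z2, slope in zip(grid, slopes):
    print(f"x2 = {z2}: change in prediction for x1 from 0 to 1 = {slope}")
```

Because of the interaction, the effect of `x1` differs across the values of `x2`, which a single one-dimensional profile for `x1` cannot show.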

A practical issue is that, in case of a model with hundreds or thousands of variables, the number of plots to inspect may be daunting.

Finally, while barplots allow visualization of CP profiles for factors (categorical explanatory variables), their use becomes less trivial in case of factors with many nominal (unordered) categories (like, for example, a ZIP-code).

## 11.6 Code snippets for R

In this section, we present CP profiles as implemented in the `DALEX` package for R. Note that the presented functions are, in fact, wrappers to package `ingredients` (Biecek et al. 2019) with a simplified interface. There are also other R packages that offer similar functionalities, like `condvis` (O’Connell, Hurley, and Domijan 2017), `pdp` (Greenwell 2017), `ICEbox` (Goldstein et al. 2015), `ALEPlot` (Apley 2018), or `iml` (Molnar, Bischl, and Casalicchio 2018).

For illustration, we use two classification models developed in Chapter 5, namely the logistic-regression model `titanic_lmr` (Section 5.2.1) and the random-forest model `titanic_rf` (Section 5.2.2). They are developed to predict the probability of survival after the sinking of the Titanic. Instance-level explanations are calculated for Henry, a 47-year-old male passenger who travelled in the first class (see Section 5.2.5).

We first retrieve the `titanic_lmr` and `titanic_rf` model-objects and the data frame for Henry via the `archivist` hooks, as listed in Section 5.2.7. We also retrieve the version of the `titanic` data with imputed missing values.

```
titanic_imputed <- archivist::aread("pbiecek/models/27e5c")
titanic_lmr <- archivist::aread("pbiecek/models/58b24")
titanic_rf <- archivist::aread("pbiecek/models/4e0fc")
(henry <- archivist::aread("pbiecek/models/a6538"))
```

```
class gender age sibsp parch fare embarked
1 1st male 47 0 0 25 Cherbourg
```

Then we construct the explainers for the models by using function `explain()` from the `DALEX` package (see Section 5.2.6). We also load the `rms` and `randomForest` packages, as the models were fitted by using functions from those packages and it is important to have the corresponding `predict()` functions available.

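```
library("DALEX")
library("rms")
# Explainer for the logistic-regression model;
# column 9 is dropped from the data passed to the explainer
explain_lmr <- explain(model = titanic_lmr,
                       data  = titanic_imputed[, -9],
                       y     = titanic_imputed$survived == "yes",
                       label = "Logistic Regression")
# Set the model type explicitly for the logistic-regression explainer
explain_lmr$model_info$type <- "classification"

library("randomForest")
# Explainer for the random-forest model
explain_rf <- explain(model = titanic_rf,
                      data  = titanic_imputed[, -9],
                      y     = titanic_imputed$survived == "yes",
                      label = "Random Forest")
```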

### 11.6.1 Basic use of the `predict_profile()` function

The easiest way to create and plot CP profiles is to use the `predict_profile()` function and then apply the generic `plot()` function to the resulting object. By default, profiles for all explanatory variables are calculated, while profiles for all numeric (continuous) variables are plotted. One can limit the number of variables for which calculations and/or plots are necessary by using the `variables` argument.

To compute the CP profiles, the `predict_profile()` function requires arguments `explainer`, which specifies the name of the explainer-object, and `new_observation`, which specifies the name of the data frame for the instance for which prediction is of interest. As a result, the function returns an object of class `ceteris_paribus_explainer`. It is a data frame with the model’s predictions. Below we illustrate the use of the function for the random-forest model.

```
# Compute CP profiles for Henry and print the resulting object
cp_titanic_rf <- predict_profile(explainer = explain_rf,
                                 new_observation = henry)
cp_titanic_rf
## Top profiles :
##                class gender age sibsp parch fare  embarked _yhat_ _vname_ _ids_
## 1                1st   male  47     0     0   25 Cherbourg  0.246   class     1
## 1.1              2nd   male  47     0     0   25 Cherbourg  0.054   class     1
## 1.2              3rd   male  47     0     0   25 Cherbourg  0.100   class     1
## 1.3        deck crew   male  47     0     0   25 Cherbourg  0.454   class     1
## 1.4 engineering crew   male  47     0     0   25 Cherbourg  0.096   class     1
## 1.5 restaurant staff   male  47     0     0   25 Cherbourg  0.092   class     1
##           _label_
## 1   Random Forest
## 1.1 Random Forest
## 1.2 Random Forest
## 1.3 Random Forest
## 1.4 Random Forest
## 1.5 Random Forest
##
##
## Top observations:
##   class gender age sibsp parch fare  embarked _yhat_       _label_ _ids_
## 1   1st   male  47     0     0   25 Cherbourg  0.246 Random Forest     1
```
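The long format of the printed data frame, with one row per variable and grid value, can be mimicked in a few lines of Python. In the sketch below, the column names `_yhat_`, `_vname_`, `_ids_`, and `_label_` follow the output shown above, while the function `toy_model()`, its coefficients, and the helper `profile_rows()` are hypothetical and do not reproduce the package's implementation.

```python
def toy_model(x):
    """Hypothetical scoring rule that reads only 'age' and 'class'."""
    class_effect = {"1st": 0.25, "2nd": 0.05, "3rd": 0.10}
    return round(class_effect[x["class"]] + 0.002 * (50 - x["age"]), 3)

def profile_rows(f, x_star, splits, obs_id, label):
    """One row per (variable, grid value), with the prediction and metadata."""
    rows = []
    for vname, grid in splits.items():
        for z in grid:
            row = dict(x_star)      # all variables as in the instance ...
            row[vname] = z          # ... except the one being varied
            row["_yhat_"] = f(row)  # prediction for the modified instance
            row["_vname_"] = vname  # which variable was varied
            row["_ids_"] = obs_id   # identifier of the instance
            row["_label_"] = label  # name of the model
            rows.append(row)
    return rows

henry = {"class": "1st", "age": 47}
splits = {"class": ["1st", "2nd", "3rd"], "age": [0, 20, 40, 60]}

rows = profile_rows(toy_model, henry, splits, obs_id=1, label="Toy model")
for row in rows:
    print(row)
```

Rows that share the same `_vname_` form the CP profile for that variable, which is why plotting functions can select profiles by variable name.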

To obtain a graphical representation of CP profiles, the generic `plot()` function can be applied to the data frame returned by the `predict_profile()` function. It returns a `ggplot2` object that can be processed further if needed. In the examples below, we use the `ggplot2` functions, like `ggtitle()` or `ylim()`, to modify the plot’s title or the range of the y-axis.

Below we show the code that can be used to create plots similar to those presented in the upper part of Figure 11.5. By default, the `plot()` function provides a graph with plots for all numerical variables. To limit the display to variables *age* and *fare*, the names of the variables are provided in the `variables` argument. The resulting plot is shown in Figure 11.6.

```
library("ggplot2")
plot(cp_titanic_rf, variables = c("age", "fare")) +
  ggtitle("Ceteris-paribus profile", "") + ylim(0, 0.8)
```

To plot CP profiles for categorical variables, we have to add the `variable_type = "categorical"` argument to the `plot()` function. In the code below, we use argument `variables` to indicate that we want to create plots for *class* and *embarked* variables. The resulting plot is shown in Figure 11.6.

```
plot(cp_titanic_rf, variables = c("class", "embarked"),
     variable_type = "categorical", categorical_type = "bars") +
  ggtitle("Ceteris-paribus profile", "")
```

### 11.6.2 Advanced use of the `predict_profile()` function

The `predict_profile()` function is very flexible. To better understand how it can be used, we briefly review its arguments:

- `explainer`, `data`, `predict_function`, `label` - they provide information about the model. If the object provided in the `explainer` argument has been created with the `DALEX::explain()` function, then values of the other arguments are extracted from the object; this is how we use the function in this chapter. Otherwise, we have to specify directly the model-object, the data frame used for fitting the model, the function that should be used to compute predictions, and the model label.
- `new_observation` - a data frame with data for instance(s), for which we want to calculate CP profiles, with the same variables as in the data used to fit the model. Note, however, that it is best not to include the dependent variable in the data frame.
- `y` - the observed values of the dependent variable corresponding to `new_observation`. The use of this argument is illustrated in Section 13.1.
- `variables` - names of explanatory variables, for which CP profiles are to be calculated. By default, `variables = NULL` and the profiles are constructed for all variables, which may be time consuming.
- `variable_splits` - a list of values for which CP profiles are to be calculated. By default, `variable_splits = NULL` and the list includes all values for categorical variables and uniformly-placed values for continuous variables; for the latter, one can specify the number of the values with the `grid_points` argument (by default, `grid_points = 101`).
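To illustrate what a default list of splits might look like, the Python sketch below builds an equally spaced grid for numeric variables and uses all observed categories for the remaining ones. This is only one reading of "uniformly-placed values"; the helpers `uniform_grid()` and `default_splits()` and the example data are hypothetical, and the rule actually applied by the package may differ (for example, it could be based on quantiles).

```python
def uniform_grid(values, grid_points=101):
    """Equally spaced values between the observed minimum and maximum."""
    low, high = min(values), max(values)
    step = (high - low) / (grid_points - 1)
    return [low + i * step for i in range(grid_points)]

def default_splits(data, grid_points=101):
    """Grid for every variable: equally spaced for numeric, all levels otherwise."""
    splits = {}
    for name, column in data.items():
        if all(isinstance(v, (int, float)) for v in column):
            splits[name] = uniform_grid(column, grid_points)
        else:
            splits[name] = sorted(set(column))
    return splits

# Hypothetical data with one numeric and one categorical variable
data = {"age": [2, 47, 8, 70],
        "class": ["1st", "3rd", "2nd", "3rd"]}

splits = default_splits(data)
coarse = default_splits(data, grid_points=5)
print(len(splits["age"]), splits["class"])
print(coarse["age"])
```

A denser grid gives a smoother-looking profile at the cost of more model evaluations, which is why the number of grid points is exposed as an argument.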

The code below uses argument `variable_splits` to specify that CP profiles are to be calculated for *age* and *fare*, together with the list of values at which the profiles are to be evaluated.

```
# Values of age and fare at which the profiles are evaluated
variable_splits <- list(age = seq(0, 70, 0.1),
                        fare = seq(0, 100, 0.1))
cp_titanic_rf <- predict_profile(explainer = explain_rf,
                                 new_observation = henry,
                                 variable_splits = variable_splits)
```

Subsequently, to replicate the plots presented in the upper part of Figure 11.5, a call to function `plot()` can be used as below. The resulting plot is shown in Figure 11.6.

```
plot(cp_titanic_rf, variables = c("age", "fare")) +
  ylim(0, 1) +
  ggtitle("Ceteris-paribus profile", "")
```

In the example below, we present the code to create CP profiles for two passengers, Henry and Johnny D (see Section 5.2.5), for the random-forest model `titanic_rf` (Section 5.2.2). Toward this end, we first retrieve the `johnny_d` data frame via the `archivist` hook, as listed in Section 5.2.7. We then apply the `predict_profile()` function with the explainer-object `explain_rf` specified in the `explainer` argument and the combined data frame for Henry and Johnny D used in the `new_observation` argument. We apply argument `variable_splits` to specify that CP profiles are to be calculated for *age* and *fare*, together with the list of values at which the profiles are to be evaluated.

```
## class gender age sibsp parch fare embarked
## 1 1st male 8 0 0 72 Southampton
```

```
# CP profiles for two instances at once: Henry and Johnny D
cp_titanic_rf2 <- predict_profile(explainer = explain_rf,
                                  new_observation = rbind(henry, johnny_d),
                                  variable_splits = variable_splits)
```

To create the plots of CP profiles, we apply the `plot()` function. We use the `scale_color_manual()` function to add names of passengers to the plot, and to control colors and positions.

```
library("ingredients")
plot(cp_titanic_rf2, color = "_ids_", variables = c("age", "fare")) +
  scale_color_manual(name = "Passenger:", breaks = 1:2,
                     values = c("#4378bf", "#8bdcbe"),
                     labels = c("henry", "johnny_d")) +
  ggtitle("Ceteris-paribus profile", "")
```

The resulting graph, which includes CP profiles for Henry and Johnny D, is presented in Figure 11.9. For Henry, the predicted probability of survival is smaller than for Johnny D, as seen from the location of the large dots on the profiles.

The profiles for *age* indicate a somewhat larger effect of the variable for Henry, as the predicted probability, in general, decreases from about 0.6 to 0.1 with increasing values of the variable. For Johnny D, the probability changes from about 0.45 to about 0.05, with a somewhat less monotonic pattern. For *fare*, the effect is smaller for both passengers, as the probability changes within a smaller range of about 0.2. For Henry, the changes are approximately limited to the interval [0.1, 0.3], while for Johnny D they are limited to the interval [0.4, 0.6].

### 11.6.3 Comparison of models (challenger-champion analysis)

One of the most interesting uses of the CP profiles is the comparison of two or more models.

To illustrate this possibility, we first have to construct profiles for the models. In our illustration, for the sake of clarity, we limit ourselves to the logistic-regression and random-forest models for the Titanic data. Moreover, we use Henry as the instance for which predictions are of interest. We use the `predict_profile()` function to compute the CP profiles for the two models.

```
cp_titanic_rf <- predict_profile(explain_rf, henry, variable_splits = variable_splits)
cp_titanic_lmr <- predict_profile(explain_lmr, henry, variable_splits = variable_splits)
```

Subsequently, we construct the plot with the help of the `plot()` function. Note that, for the sake of brevity, we use the `variables` argument to limit the plot only to profiles for variables *age* and *fare*. Every `plot()` function can take a collection of explainers as arguments. In such a case, profiles for different models are combined in a single plot. In the code presented below, argument `color = "_label_"` is used to specify that models are to be color-coded. The `_label_` refers to the name of the column in the CP explainer that contains the model name.

```
plot(cp_titanic_rf, cp_titanic_lmr, color = "_label_",
     variables = c("age", "fare")) +
  ggtitle("Ceteris-paribus profiles for Henry", "")
```

The result is shown in Figure 11.10. For Henry, the predicted probability of survival is higher for the logistic-regression model than for the random-forest model. CP profiles for *age* show a similar shape, however, and indicate decreasing probability with age. For *fare*, the profile for the logistic-regression model suggests a slight increase of the probability, while for the random-forest model a decreasing trend can be inferred. The difference between the values of the CP profiles for *fare* increases with the increasing values of the variable. We can only speculate about the reason for the difference. Perhaps the cause is the correlation between the ticket *fare* and *class*. The logistic-regression model handles the dependency of variables differently than the random-forest model.
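The comparison of profiles can also be carried out numerically, by evaluating the profiles for two models on a common grid and inspecting the difference. The Python sketch below does this for two hypothetical stand-ins: `model_a()`, a smooth logistic-type function, and `model_b()`, a step function loosely resembling a tree-based model. Both functions and their coefficients are assumptions for this illustration and do not reproduce `titanic_lmr` or `titanic_rf`.

```python
import math

def model_a(x):
    """Hypothetical smooth model: prediction increases slowly with fare."""
    score = -0.5 - 0.02 * x["age"] + 0.005 * x["fare"]
    return 1.0 / (1.0 + math.exp(-score))

def model_b(x):
    """Hypothetical step-function model: prediction drops for high fares."""
    base = 0.3 if x["age"] < 18 else 0.2
    return base - (0.05 if x["fare"] > 50 else 0.0)

def cp_profile(f, x_star, j, grid):
    """Predictions of f for x_star with variable j set to each grid value."""
    values = []
    for z in grid:
        x_mod = dict(x_star)
        x_mod[j] = z
        values.append(f(x_mod))
    return values

henry = {"age": 47, "fare": 25}
fare_grid = [0, 25, 50, 75, 100]

profile_a = cp_profile(model_a, henry, "fare", fare_grid)
profile_b = cp_profile(model_b, henry, "fare", fare_grid)

# Difference between the two profiles at each grid value
gaps = [a - b for a, b in zip(profile_a, profile_b)]
for z, a, b, gap in zip(fare_grid, profile_a, profile_b, gaps):
    print(f"fare = {z:3d}  model A = {a:.3f}  model B = {b:.3f}  gap = {gap:+.3f}")
```

In this toy setting, the gap between the two profiles widens with increasing *fare*, which is the kind of pattern that an overlaid plot makes visible at a glance.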

## 11.7 Code snippets for Python

In this section, we use the `dalex` library for Python. The package covers all methods presented in this chapter. It is available on `pip` and `GitHub`.

For illustration purposes, we use the `titanic_rf` random-forest model for the Titanic data developed in Section 5.3.2. Recall that the model is developed to predict the probability of survival for passengers of the Titanic. Instance-level explanations are calculated for Henry, a 47-year-old passenger who travelled in the 1st class (see Section 5.3.5).

In the first step, we create an Explainer, an object that will provide a uniform interface for the predictive model. We use the `Explainer` constructor for this purpose.

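```
import pandas as pd
import dalex as dx

# Data for Henry, the instance of interest (see Section 5.3.5)
henry = pd.DataFrame({'gender'  : ['male'],
                      'age'     : [47],
                      'class'   : ['1st'],
                      'embarked': ['Cherbourg'],
                      'fare'    : [25],
                      'sibsp'   : [0],
                      'parch'   : [0]},
                     index = ['Henry'])

# titanic_rf is the model from Section 5.3.2; X and y are assumed to hold
# the explanatory variables and the dependent variable used to fit it
titanic_rf_exp = dx.Explainer(titanic_rf, X, y, label = "Titanic RF Pipeline")
```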

To calculate the CP profile, one can use the `predict_profile` method. The first argument is the observation for which the profile is to be calculated. The resulting object can be visualised with the `plot` method. One can specify the vector of `variables`. By default, all continuous variables are plotted.

If you want to plot categorical variables, it is advised to additionally set `variable_type = 'categorical'`.

### References

Apley, Dan. 2018. *ALEPlot: Accumulated Local Effects (ALE) Plots and Partial Dependence (PD) Plots*. https://CRAN.R-project.org/package=ALEPlot.

Biecek, Przemyslaw, Hubert Baniecki, Adam Izdebski, and Katarzyna Pekala. 2019. *ingredients: Effects and Importances of Model Ingredients*.

Goldstein, Alex, Adam Kapelner, Justin Bleich, and Emil Pitkin. 2015. “Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation.” *Journal of Computational and Graphical Statistics* 24 (1): 44–65. https://doi.org/10.1080/10618600.2014.907095.

Greenwell, Brandon M. 2017. “pdp: An R Package for Constructing Partial Dependence Plots.” *The R Journal* 9 (1): 421–36. https://journal.r-project.org/archive/2017/RJ-2017-016/index.html.

Molnar, Christoph, Bernd Bischl, and Giuseppe Casalicchio. 2018. “iml: An R Package for Interpretable Machine Learning.” *Journal of Open Source Software* 3 (26): 786. https://doi.org/10.21105/joss.00786.

O’Connell, Mark, Catherine Hurley, and Katarina Domijan. 2017. “Conditional Visualization for Statistical Models: An Introduction to the Condvis Package in R.” *Journal of Statistical Software, Articles* 81 (5): 1–20. https://doi.org/10.18637/jss.v081.i05.

Tufte, Edward R. 1986. *The Visual Display of Quantitative Information*. Cheshire, CT, USA: Graphics Press.