# Chapter 24 Ceteris-paribus Two-dimensional Profiles - a Tool for Pairwise Interactions

## 24.1 Introduction

The definition of ceteris-paribus (CP) profiles, given in Section 6, extends easily to two or more explanatory variables. The definition of the variable importance measure \(vip^{CP}_j(x^*)\) also has a straightforward extension to a larger number of variables. These extensions are useful for identifying and visualizing pairwise interactions between explanatory variables.

## 24.2 Intuition

Figure 24.1 presents the response (prediction) surface of the `titanic_lmr_v6` model for two explanatory variables, *age* and *sibsp*, from the *titanic* dataset (see Section 4.1). We are interested in the change in the model prediction induced jointly by the two variables.

## 24.3 Method

The definition of one-dimensional CP profiles (see Section 6.3) may be easily extended to two or more explanatory variables. A two-dimensional CP profile for model \(f()\), explanatory variables \(j\) and \(k\), and point \(x^*\) is defined as follows:

\[ CP^{f, (j,k), x^*}(z_1, z_2) \equiv f(x^*|^{(j,k)} = (z_1,z_2)). \]

Thus, a two-dimensional (2D) CP profile is a function that describes how the model's prediction for the instance depends on the values \(z_1\) and \(z_2\) of the \(j\)-th and \(k\)-th explanatory variables, respectively. The values \(z_1\) and \(z_2\) range over the values typical for the two variables, while all other explanatory variables are kept fixed at the values given by \(x^*\).
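The definition can be illustrated with a minimal base-R sketch. It does not use the Titanic models; instead, it assumes a toy linear model fitted to the built-in `mtcars` data, and the helper `cp_2d()` is a hypothetical name introduced only for this illustration.

```r
# Toy model (an assumption for illustration), fitted to the built-in mtcars data.
model <- lm(mpg ~ wt * hp + qsec, data = mtcars)

# Hypothetical helper: evaluate the 2D CP profile of `model` at `x_star`
# on the grid spanned by `z1` (values for `var1`) and `z2` (values for `var2`).
cp_2d <- function(model, x_star, var1, var2, z1, z2) {
  grid <- expand.grid(z1 = z1, z2 = z2)
  # Replicate x_star once per grid point; all other variables stay fixed.
  new_data <- x_star[rep(1, nrow(grid)), , drop = FALSE]
  new_data[[var1]] <- grid$z1
  new_data[[var2]] <- grid$z2
  # expand.grid() varies z1 fastest, so rows index z1 and columns index z2.
  matrix(predict(model, newdata = new_data),
         nrow = length(z1), ncol = length(z2))
}

# Instance of interest and grids covering the observed ranges of both variables.
x_star <- mtcars["Mazda RX4", ]
z_wt <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 5)
z_hp <- seq(min(mtcars$hp), max(mtcars$hp), length.out = 4)

profile <- cp_2d(model, x_star, "wt", "hp", z_wt, z_hp)
```

Each cell of `profile` is the prediction for a copy of `x_star` in which only `wt` and `hp` have been changed; evaluating the helper at the instance's own values of the two variables returns \(f(x^*)\).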

The corresponding variable importance measure is defined as follows: \[ vip^{CP}_{j,k}(x^*) = \int_{\mathcal R}\int_{\mathcal R} |CP^{f,(j,k),x^*}(z_1,z_2) - f(x^*)| g^{j,k}(z_1,z_2)dz_1dz_2=E_{X_j,X_k}[|CP^{f,(j,k),x^*}(X_j,X_k) - f(x^*)|], \] where \(g^{j,k}\) denotes the joint density of the \(j\)-th and \(k\)-th explanatory variables and the expected value is taken over their joint distribution.
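In practice, the joint distribution is unknown. One possible approximation, shown here only as a sketch, replaces the expected value with an average over the pairs of values observed in a dataset. The toy `mtcars` model and the helper name `vip_cp_2d()` are assumptions made for this illustration.

```r
# Toy model (an assumption for illustration), fitted to the built-in mtcars data.
model <- lm(mpg ~ wt * hp + qsec, data = mtcars)

# Hypothetical helper: empirical estimate of the 2D importance measure.
# The expectation over (X_j, X_k) is approximated by averaging over the
# pairs of values observed in `data`.
vip_cp_2d <- function(model, data, x_star, var1, var2) {
  # One copy of x_star per observation; only var1 and var2 are replaced.
  new_data <- x_star[rep(1, nrow(data)), , drop = FALSE]
  new_data[[var1]] <- data[[var1]]
  new_data[[var2]] <- data[[var2]]
  f_star <- unname(predict(model, newdata = x_star))
  mean(abs(predict(model, newdata = new_data) - f_star))
}

x_star <- mtcars["Mazda RX4", ]
vip_wt_hp   <- vip_cp_2d(model, mtcars, x_star, "wt", "hp")
vip_wt_qsec <- vip_cp_2d(model, mtcars, x_star, "wt", "qsec")
```

Using the observed pairs, rather than a regular grid, weights each combination of values by how often it occurs in the data, which mirrors the role of the joint density in the integral.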

Such multi-dimensional extensions are useful for checking, for instance, whether the model involves interactions. In particular, the presence of pairwise interactions may be detected with 2D CP profiles.
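One way to make this check concrete: if the model is additive in the two variables, the 2D profile is fully determined by the two 1D profiles, so the quantity \(CP(z_1,z_2) - CP(z_1,x^*_k) - CP(x^*_j,z_2) + f(x^*)\) is zero for all \((z_1, z_2)\). The sketch below evaluates this quantity for two toy `mtcars` models, one without and one with an interaction term. The models and the helper `interaction_residual()` are assumptions introduced for this illustration and are not part of any package.

```r
# Two toy models (assumptions for illustration): additive and with interaction.
additive    <- lm(mpg ~ wt + hp, data = mtcars)
interacting <- lm(mpg ~ wt * hp, data = mtcars)

# Hypothetical helper: largest absolute deviation of the 2D CP profile
# from the sum of the two 1D CP profiles, over a grid of (z1, z2) values.
interaction_residual <- function(model, x_star, var1, var2, z1, z2) {
  predict_at <- function(v1, v2) {
    x <- x_star
    x[[var1]] <- v1
    x[[var2]] <- v2
    unname(predict(model, newdata = x))
  }
  x1 <- x_star[[var1]]
  x2 <- x_star[[var2]]
  grid <- expand.grid(z1 = z1, z2 = z2)
  residual <- mapply(function(a, b) {
    predict_at(a, b) - predict_at(a, x2) - predict_at(x1, b) + predict_at(x1, x2)
  }, grid$z1, grid$z2)
  max(abs(residual))
}

x_star <- mtcars["Mazda RX4", ]
z_wt <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 5)
z_hp <- seq(min(mtcars$hp), max(mtcars$hp), length.out = 5)

res_additive    <- interaction_residual(additive, x_star, "wt", "hp", z_wt, z_hp)
res_interacting <- interaction_residual(interacting, x_star, "wt", "hp", z_wt, z_hp)
```

For the additive model the residual vanishes up to numerical error, while for the model with the `wt:hp` term it does not. On a heat map, this shows up as a surface that cannot be obtained by adding a function of one variable to a function of the other.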

## 24.4 Example: Titanic data

A natural way to visualize 2D CP profiles is to use a heat map for all pairs of explanatory variables, as in Figure 24.2.

If the number of pairs of explanatory variables is small or moderate, then it is possible to present 2D CP profiles for all pairs of variables.

If the number of pairs is large, we can use the variable importance measure to order the pairs based on their importance and select the most important pairs for purposes of illustration.
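The selection of the most important pairs might be sketched as follows. The example assumes a toy `mtcars` model and an empirical estimate of the importance measure (an average over the observed pairs of values); the helper `vip_pair()` and the choice of three pairs are hypothetical.

```r
# Toy model and instance of interest (assumptions for illustration).
model <- lm(mpg ~ wt * hp + qsec + drat, data = mtcars)
x_star <- mtcars["Mazda RX4", ]
variables <- c("wt", "hp", "qsec", "drat")

# Hypothetical helper: empirical 2D importance for one pair of variables.
vip_pair <- function(pair) {
  new_data <- x_star[rep(1, nrow(mtcars)), , drop = FALSE]
  for (v in pair) new_data[[v]] <- mtcars[[v]]
  f_star <- unname(predict(model, newdata = x_star))
  mean(abs(predict(model, newdata = new_data) - f_star))
}

# All unordered pairs of the selected variables.
var_pairs <- combn(variables, 2, simplify = FALSE)

importance <- data.frame(
  var1 = sapply(var_pairs, `[`, 1),
  var2 = sapply(var_pairs, `[`, 2),
  vip  = sapply(var_pairs, vip_pair)
)

# Order the pairs from the most to the least important and keep the top three.
ranking <- importance[order(importance$vip, decreasing = TRUE), ]
top_pairs <- head(ranking, 3)
```

Only the pairs in `top_pairs` would then be plotted, which keeps the number of heat maps manageable.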

## 24.5 Pros and cons

Two-dimensional CP profiles can be used to identify the presence and the influence of pairwise interactions in a model. However, for models with a large number of explanatory variables, the number of pairs will be large. Consequently, inspection of all possible 2D CP profiles may be challenging. Moreover, the profiles are more difficult to read and interpret than the 1D CP profiles.

## 24.6 Code snippets for R

In this section, we present key features of the R package `ingredients` (Biecek 2019a), which is a part of `DALEXverse` and covers all methods presented in this chapter. More details and examples can be found at `https://modeloriented.github.io/ingredients/`.

There are also other R packages that offer similar functionality, like `condvis` (O’Connell, Hurley, and Domijan 2017) or `ICEbox` (Goldstein et al. 2015b).

We use the random forest model `titanic_rf_v6` developed for the Titanic dataset (see Section @ref(model_titanic_rf)) as the example. Recall that we deal with a binary classification problem: we want to predict the probability of survival for a selected passenger.

```
titanic <- archivist::aread("pbiecek/models/27e5c")
titanic_rf_v6 <- archivist::aread("pbiecek/models/31570")
```

First, we have to create a wrapper around the model (see Section 6.6).

```
library("DALEX")
library("randomForest")
# Wrap the model; column 9 of the data is dropped from the explanatory variables.
explain_titanic_rf <- explain(model = titanic_rf_v6,
                              data = titanic[, -9],
                              y = titanic$survived == "yes",
                              label = "Random Forest v6")
```

To calculate 2D CP profiles, we first need to select the observation of interest. Let us use `henry` as the instance of interest.

```
henry <- data.frame(
  class = factor("1st", levels = c("1st", "2nd", "3rd", "deck crew", "engineering crew",
                                   "restaurant staff", "victualling crew")),
  gender = factor("male", levels = c("female", "male")),
  age = 8,
  sibsp = 0,
  parch = 0,
  fare = 72,
  embarked = factor("Southampton", levels = c("Belfast", "Cherbourg", "Queenstown", "Southampton"))
)
```

2D profiles are calculated by applying the `ceteris_paribus_2d()` function to the wrapper object. By default, all pairs of continuous explanatory variables are used, but one can limit the set of variables under consideration through the `variables` argument.

```
library("ingredients")
library("ggplot2")
wi_rf_2d <- ceteris_paribus_2d(explain_titanic_rf,
                               observation = henry,
                               variables = c("age", "sibsp", "parch"))
head(wi_rf_2d)
```

As a result, we obtain an object of class `ceteris_paribus_2d_explainer` with overloaded `print()` and `plot()` functions. We can use the latter function to obtain plots of the constructed 2D CP profiles.

```
plot(wi_rf_2d) +
  theme(legend.position = "right", legend.direction = "vertical") +
  ggtitle("Ceteris Paribus 2D Profiles")
```

The plot suggests that *age* and *sibsp* strongly influence the model response.

### References

Biecek, Przemyslaw. 2019a. *Ingredients: Effects and Importances of Model Ingredients*. https://ModelOriented.github.io/ingredients/.

O’Connell, Mark, Catherine Hurley, and Katarina Domijan. 2017. “Conditional Visualization for Statistical Models: An Introduction to the Condvis Package in R.” *Journal of Statistical Software, Articles* 81 (5): 1–20. https://doi.org/10.18637/jss.v081.i05.

Goldstein, Alex, Adam Kapelner, Justin Bleich, and Emil Pitkin. 2015b. “Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation.” *Journal of Computational and Graphical Statistics* 24 (1): 44–65. https://doi.org/10.1080/10618600.2014.907095.