# 12 Ceteris-paribus Oscillations

## 12.1 Introduction

Visual examination of ceteris-paribus (CP) profiles, as illustrated in the previous chapter, is insightful. However, for a model with a large number of explanatory variables, we may end up with an overwhelming number of plots. In such a situation, it might be useful to select the most interesting or important profiles. In this chapter, we describe a measure that can be used for this purpose and that is directly linked to CP profiles. It can be seen as an instance-level variable-importance measure, an alternative to the measures discussed in Chapters 7–10.

## 12.2 Intuition

To assign importance to CP profiles, we can use the concept of profile oscillations. It is worth noting that the larger the influence of an explanatory variable on the prediction for a particular instance, the larger the fluctuations of the corresponding CP profile. For a variable that exercises little or no influence on a model’s prediction, the profile will be flat or will barely change. In other words, the values of the CP profile should be close to the value of the model’s prediction for the particular instance. Consequently, the sum of differences between the profile and the prediction, taken across all possible values of the explanatory variable, should be close to zero. The sum can be graphically depicted by the area between the profile and the horizontal line representing the instance prediction. On the other hand, for an explanatory variable with a large influence on the prediction, the area should be large. Figure 12.1 illustrates the concept based on the CP profiles introduced in the previous chapter in Figure 11.5. The larger the highlighted area in Figure 12.1, the more important the variable is for the particular prediction.

## 12.3 Method

Let us formalize this concept now. Denote by \(g^j(z)\) the probability density function of the distribution of the \(j\)-th explanatory variable. The summary measure of the variable’s importance for the model’s prediction at \(\underline{x}_*\), \(vip_{CP}^{j}(\underline{x}_*)\), is defined by using the variable’s CP profile (11.1) as follows:

\[\begin{equation} vip_{CP}^j(\underline{x}_*) = \int_{\mathcal R} |h^{j}_{\underline{x}_*}(z) - f(\underline{x}_*)| g^j(z)dz=E_{X^j}\left\{|h^{j}_{\underline{x}_*}(X^j) - f(\underline{x}_*)|\right\}. \tag{12.1} \end{equation}\]

Thus, \(vip_{CP}^j(\underline{x}_*)\) is the expected absolute deviation of the CP profile from the model’s prediction at \(\underline{x}_*\) over the distribution \(g^j(z)\) for the \(j\)-th explanatory variable.

The true distribution of the \(j\)-th explanatory variable is, in most cases, unknown. Thus, there are several options for calculating (12.1).

One is to calculate just the area under the CP curve, i.e., to assume that \(g^j(z)\) is a uniform distribution over the range of variable \(X^j\). It follows that a straightforward estimator of \(vip_{CP}^{j}(\underline{x}_*)\) is

\[\begin{equation} \widehat{vip}_{CP}^{j,uni}(\underline{x}_*) = \frac 1k \sum_{l=1}^k |h^{j}_{\underline{x}_*}(z_l) - f(\underline{x}_*)|, \tag{12.2} \end{equation}\]

where \(z_l\) (\(l=1, \ldots, k\)) are the selected values of the \(j\)-th explanatory variable. For instance, one can use all unique values of \(X^{j}\) in the considered dataset. Alternatively, for a continuous variable, one can use an equidistant grid of values.
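To see how estimator (12.2) works outside of any package, the following plain-R sketch computes it by hand. The model `f`, the instance values, and the grid are all illustrative assumptions, not the Titanic model used later in this chapter.

```r
# Toy model (assumed for illustration): a score depending on age and fare
f <- function(age, fare) 0.3 + 0.01 * age - 0.001 * fare

# Instance of interest and the model's prediction for it
x_star <- list(age = 47, fare = 25)
f_star <- f(x_star$age, x_star$fare)

# CP profile for age: vary age, keep fare fixed at the instance value
h_age <- function(z) f(z, x_star$fare)

# Estimator (12.2): mean absolute deviation over an equidistant grid
z_grid <- seq(0, 80, by = 1)
vip_uni_age <- mean(abs(h_age(z_grid) - f_star))
vip_uni_age
```

For this linear toy model the deviation is \(0.01\,|z-47|\), so the estimator simply averages these absolute distances over the grid.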

Another approach is to use the empirical distribution of \(X^{j}\). This leads to the estimator defined as follows:

\[\begin{equation} \widehat{vip}_{CP}^{j,emp}(\underline{x}_*) = \frac 1n \sum_{i=1}^n |h^{j}_{\underline{x}_*}(x^{j}_i) - f(\underline{x}_*)|, \tag{12.3} \end{equation}\]

where index \(i\) runs through all observations in a dataset.

The use of \(\widehat{vip}_{CP}^{j,emp}(\underline{x}_*)\) is preferred when there are enough data to accurately estimate the empirical distribution and when the distribution is not uniform. On the other hand, \(\widehat{vip}_{CP}^{j,uni}(\underline{x}_*)\) is in most cases quicker to compute and, therefore, it is preferred if we look for fast approximations.
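The difference between the two estimators matters mostly for skewed variables. A minimal sketch, with a made-up model and made-up "observed" values (the variable name *sibsp* here only mimics the skewness discussed later, in Section 12.4):

```r
# Toy model (assumed for illustration)
f <- function(age, sibsp) 0.5 - 0.05 * sibsp
x_star <- list(age = 47, sibsp = 0)
f_star <- f(x_star$age, x_star$sibsp)

# CP profile for sibsp: vary sibsp, keep age fixed
h_sibsp <- function(z) f(x_star$age, z)

# Skewed empirical values: most mass at 0, a few large values
sibsp_obs <- c(rep(0, 90), rep(5, 10))

vip_emp <- mean(abs(h_sibsp(sibsp_obs) - f_star))            # estimator (12.3)
vip_uni <- mean(abs(h_sibsp(seq(0, 5, by = 0.5)) - f_star))  # estimator (12.2)
c(empirical = vip_emp, uniform = vip_uni)
```

Because most of the empirical mass sits at zero, where the profile equals the prediction, the empirical estimator is much smaller than the uniform one.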

Note that the local evaluation of a variable’s importance can be very different from the global one. This is well illustrated by the following simple example. Consider a model \[ f(x^1, x^2) = x^1 \cdot x^2, \] where variables \(x^1\) and \(x^2\) take values in \([0,1]\). Consider the prediction for an observation described by vector \(\underline{x}_* = (0,1)\). In that case, the importance of \(X^1\) is larger than that of \(X^2\). This is because the CP profile \(h^1_{\underline{x}_*}(z) = z\), while \(h^2_{\underline{x}_*}(z) = 0\). Thus, we observe oscillations for the first variable, but no oscillations for the second. Globally, both variables are equally important because the model is symmetric, but the local behaviour is different.
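This local behaviour can be checked numerically with a few lines of plain R, applying the uniform estimator (12.2) over a grid on \([0,1]\):

```r
# Model from the example above: f(x1, x2) = x1 * x2 on [0, 1] x [0, 1]
f <- function(x1, x2) x1 * x2

# Instance of interest: x_* = (0, 1)
f_star <- f(0, 1)                      # prediction is 0

z <- seq(0, 1, by = 0.01)              # equidistant grid over [0, 1]
vip_x1 <- mean(abs(f(z, 1) - f_star))  # CP profile h1(z) = z: large oscillations
vip_x2 <- mean(abs(f(0, z) - f_star))  # CP profile h2(z) = 0: flat profile
c(x1 = vip_x1, x2 = vip_x2)
```

At \(\underline{x}_* = (0,1)\), the measure for \(x^1\) is about 0.5, while the measure for \(x^2\) is exactly zero, in line with the discussion above.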

## 12.4 Example: Titanic data

Figure 12.2 shows a barplot of the size of oscillations of explanatory variables for the random-forest model `titanic_rf` (see Section 5.2.2) for Henry, a 47-year-old man who travelled in the first class (see Section 5.2.5).
In the left panel, the measures were computed by using estimator \(\widehat{vip}_{CP}^{j,uni}(\underline{x}_*)\), given in (12.2), with an equidistant grid of values.
In the right panel, the measures were computed by using estimator \(\widehat{vip}_{CP}^{j,emp}(\underline{x}_*)\), given in (12.3), with the empirical distribution of the explanatory variables.
The longer the bar, the larger the CP-profile oscillations for a particular explanatory variable.

Both approaches presented in Figure 12.2 agree that the most important variables for prediction for Henry are *gender* and *age*, followed by *class*.
There is an interesting difference between the two approaches regarding the *sibsp* variable. From Figure 5.3 we recall that this variable has a very skewed distribution: a significant mass is concentrated at zero, while only a few individuals have high values. In such situations, the empirical density is very different from a uniform distribution.

Note that, while the variable-importance plot in Figure 12.2 does indicate which explanatory variables are important, it does not describe how the variables influence the prediction. In that respect, the CP profile for *age* for Henry (see Figure 11.5) suggested that, if Henry were older, this would significantly lower his probability of survival. On the other hand, the CP profile for *sibsp* (see Figure 11.5) indicated that, were Henry not travelling alone, this would increase his chances. Thus, the variable-importance plots should always be accompanied by plots of the relevant CP profiles.

## 12.5 Pros and cons

Oscillations of CP profiles are easy to interpret and understand. By using the average of the oscillations, it is possible to select the most important variables for an instance prediction. The method can easily be extended to two or more variables. In such cases, one needs to integrate equation (12.1) over a larger number of variables.
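The multi-variable extension can be sketched as follows: vary a pair of variables jointly over a grid and average the absolute deviations of the resulting two-variable profile from the instance prediction. Everything below (the model, the instance, the grid) is an illustrative assumption, not a feature of any particular package.

```r
# Toy model (assumed for illustration)
f <- function(x1, x2) x1 * x2
x_star <- c(x1 = 0.5, x2 = 0.5)
f_star <- f(x_star[1], x_star[2])

# Joint equidistant grid over both variables, i.e., uniform on [0,1] x [0,1]
grid <- expand.grid(z1 = seq(0, 1, by = 0.05),
                    z2 = seq(0, 1, by = 0.05))

# Two-variable oscillation measure: average absolute deviation of the
# two-variable CP profile f(z1, z2) from the instance prediction
vip_pair <- mean(abs(f(grid$z1, grid$z2) - f_star))
vip_pair
```

The resulting number quantifies the joint importance of the pair \((x^1, x^2)\) for the prediction at \(\underline{x}_*\), in the same units as the single-variable measure.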

There are several issues related to the use of the CP oscillations, though. For example, the oscillations may not be of help in situations when the use of CP profiles may itself be problematic (e.g., in the case of correlated explanatory variables or interactions – see Section 11.5). An important issue is that the CP-based variable-importance measures (12.1) do not fulfil the local accuracy condition (see Section 9.2), i.e., they do not sum up to the instance prediction for which they are calculated, unlike the break-down attributions (see Section 7.2) or Shapley values (see Section 9.1).

## 12.6 Code snippets for R

In this section, we present ceteris-paribus oscillations as implemented in the `DALEX` package for R.
For illustration, we use the random-forest model `titanic_rf` (Section 5.2.2). The model was developed to predict the probability of survival after the sinking of the Titanic. Instance-level explanations are calculated for Henry, a 47-year-old male passenger who travelled in the first class (see Section 5.2.5).

We first retrieve the `titanic_rf` model-object and the data frame for Henry via the `archivist` hooks, as listed in Section 5.2.7. We also retrieve the version of the `titanic` data with imputed missing values.

```
titanic_imputed <- archivist::aread("pbiecek/models/27e5c")
titanic_rf <- archivist::aread("pbiecek/models/4e0fc")
(henry <- archivist::aread("pbiecek/models/a6538"))
```

```
class gender age sibsp parch fare embarked
1 1st male 47 0 0 25 Cherbourg
```

Then we construct the explainer for the model by using the function `explain()` from the `DALEX` package (see Section 5.2.6). We also load the `randomForest` package, as the model was fitted by using function `randomForest()` from this package (see Section 5.2.2) and it is important to have the corresponding `predict()` function available. The model’s prediction for Henry is obtained with the help of that function.

```
library("randomForest")
library("DALEX")
explain_rf <- DALEX::explain(model = titanic_rf,
                             data = titanic_imputed[, -9],
                             y = titanic_imputed$survived == "yes",
                             label = "Random Forest")
predict(explain_rf, henry)
```

`[1] 0.246`

### 12.6.1 Basic use of the `predict_parts()` function

To calculate CP-profile oscillations, we will again use the `predict_parts()` function, but this time with `type="oscillations_uni"` for estimator \(\widehat{vip}_{CP}^{j,uni}(\underline{x}_*)\) or `type="oscillations_emp"` for estimator \(\widehat{vip}_{CP}^{j,emp}(\underline{x}_*)\). The function was already introduced in Section 7.6.

In particular, we apply it to the explainer `explain_rf` for the random-forest model `titanic_rf` and the data frame for the instance of interest, i.e., `henry`. Additionally, we specify the `type="oscillations_uni"` argument to indicate that we want to compute CP-profile oscillations and the estimated value of the variable-importance measure as defined in Equation (12.2). Note that one needs to specify `type="oscillations_emp"` to get oscillations estimated with the empirical density, as defined in Equation (12.3). By default, oscillations are calculated for all explanatory variables. The `variables` argument can be used to specify a subset of variables.

```
oscillations_uniform <- predict_parts(explainer = explain_rf,
                                      new_observation = henry,
                                      type = "oscillations_uni")
oscillations_uniform
```

```
## _vname_ _ids_ oscillations
## 2 gender 1 0.33700000
## 4 sibsp 1 0.16859406
## 3 age 1 0.16744554
## 1 class 1 0.14257143
## 6 fare 1 0.09942574
## 7 embarked 1 0.02400000
## 5 parch 1 0.01031683
```

The resulting object is of class “ceteris_paribus_oscillations”, which is a data frame with three variables: `_vname_`, `_ids_`, and `oscillations`. They provide, respectively, the name of the variable, the identifier of the instance, and the estimated value of the variable-importance measure. Additionally, the object has got an overloaded `plot()` function. We can use the latter to plot the estimated values of the variable-importance measure for the instance of interest. In the code below, before creating the plot, we make the identifier for Henry more explicit. The resulting graph is shown in Figure 12.3.

```
oscillations_uniform$`_ids_` <- "Henry"
plot(oscillations_uniform) +
  ggtitle("Ceteris-paribus Oscillations",
          "Expectation over uniform distribution (unique values)")
```

### 12.6.2 Advanced use of the `predict_parts()` function

As mentioned in the previous section, the `predict_parts()` function with `type = "oscillations_uni"` calculates oscillations according to estimator \(\widehat{vip}_{CP}^{j,uni}(\underline{x}_*)\), and with `type = "oscillations_emp"` according to estimator \(\widehat{vip}_{CP}^{j,emp}(\underline{x}_*)\).

However, other distributions can also be used. For instance, for a continuous explanatory variable, one could apply the same estimator, but with a predefined grid of values. Toward this aim, we can use the `variable_splits` argument to explicitly specify the values for which the profiles are to be evaluated. Its application is illustrated in the code below.

```
oscillations_equidist <- predict_parts(explain_rf, henry,
    variable_splits = list(age = seq(0, 65, 0.1),
                           fare = seq(0, 200, 0.1),
                           gender = unique(titanic_imputed$gender),
                           class = unique(titanic_imputed$class)),
    type = "oscillations")
oscillations_equidist
```

```
## _vname_ _ids_ oscillations
## 3 gender 1 0.3370000
## 1 age 1 0.1677235
## 4 class 1 0.1425714
## 2 fare 1 0.1040790
```

We can use the `plot()` function to create a barplot of the estimated values. In the code below, before creating the plot, we make the identifier for Henry more explicit. The resulting graph is shown in Figure 12.2.

```
oscillations_equidist$`_ids_` <- "Henry"
plot(oscillations_equidist) +
  ggtitle("Ceteris-paribus Oscillations",
          "Expectation over specified grid of points")
```