Chapter 6 Ceteris-paribus Profiles and What-If Analysis

6.1 Introduction

Ceteris paribus is a Latin phrase meaning “other things held constant” or “all else unchanged.” In this chapter, we introduce a technique for model exploration based on the Ceteris paribus principle. In particular, we examine the influence of each explanatory variable, assuming that the effects of all other variables are unchanged. The main goal is to understand how changes in a single explanatory variable affect model predictions.

Explanation tools (explainers) presented in this chapter are linked to the second law introduced in Section 1.3, i.e., the law of “Prediction’s speculation.” This is why the tools are also known as What-If model analysis or Individual Conditional Expectations (Goldstein et al. 2015a). It appears that it is easier to understand how a black-box model works if we can explore it by investigating the influence of explanatory variables separately, changing one at a time.

6.2 Intuition

Panel A of Figure 6.1 presents the response (prediction) surface for the titanic_lmr_v6 model for two explanatory variables, age and class, from the titanic dataset (see Section 4.1). We are interested in the change of the model prediction induced by each of the variables. To this end, we may want to explore the curvature of the response surface around a single point, with age equal to 47 and class equal to “1st,” indicated in the plot. Ceteris-paribus (CP) profiles are one-dimensional profiles that examine the curvature across each dimension, i.e., for each variable. Panel B of Figure 6.1 presents the profiles corresponding to age and class. Note that, in the CP profile for age, the point of interest is indicated by the black dot. In essence, a CP profile shows the conditional expectation of the dependent variable (response) as a function of the particular explanatory variable.


Figure 6.1: A) Model response (prediction) surface. Ceteris-paribus (CP) profiles, marked with black curves, help to understand the curvature of the surface while changing only a single explanatory variable. B) CP profiles for individual variables: age (continuous) and class (categorical).

The CP technique is similar to the LIME method (see Chapter 12): both examine the curvature of a model’s response surface. The difference between the two methods lies in the fact that LIME approximates the black-box model of interest locally with a simpler, glass-box model. Usually, the LIME model is sparse, i.e., it contains fewer explanatory variables, so one needs to investigate a plot across a smaller number of dimensions. CP profiles, on the other hand, present conditional predictions for every variable and, in most cases, are easier to interpret.

6.3 Method

In this section, we formally introduce one-dimensional CP profiles.

In predictive modeling, we are interested in a distribution of a dependent variable \(Y\) given vector \(x_*\). The latter contains values of explanatory variables. In the ideal world, we would like to know the conditional distribution of \(Y\) given \(x_*\). In practical applications, however, we usually do not predict the entire distribution, but just some of its characteristics like the expected (mean) value, a quantile, or variance. Without loss of generality we will assume that we model the conditional expected value \(E_Y(Y | x_*)\).

Assume that we have got model \(f()\), for which \(f(x_*)\) is an approximation of \(E_Y(Y | x_*)\), i.e., \(E_Y(Y | x_*) \approx f(x_*)\). Note that we do not assume that it is a “good” model, nor that the approximation is precise. We simply assume that we have got a model that is used to estimate the conditional expected value and to form predictions of the values of the dependent variable. Our interest lies in the evaluation of the quality of the predictions. If the model offers a “good” approximation of the conditional expected value, it should be reflected in its satisfactory predictive performance.

Recall (see Section 1.8) that we use \(x_i\) to refer to the vector corresponding to the \(i\)-th observation in a dataset. Let \(x^{j}_{*}\) denote the \(j\)-th element of \(x_{*}\), i.e., the \(j\)-th explanatory variable. We use \(x^{-j}_{*}\) to refer to a vector resulting from removing the \(j\)-th element from \(x_{*}\). By \(x^{j|=z}_{*}\), we denote a vector resulting from changing the value of the \(j\)-th element of \(x_{*}\) to (a scalar) \(z\).

We define a one-dimensional CP profile \(h()\) for model \(f()\), the \(j\)-th explanatory variable, and point \(x_*\) as follows:

\[ h^{f,j}_{x_*}(z) \equiv f(x_*^{j|=z}). \]

A CP profile is thus a function that describes the dependence of the approximated expected value (prediction) of \(Y\) on the value \(z\) of the \(j\)-th explanatory variable. Note that, in practice, \(z\) is taken to go through the entire range of values typical for the variable, while the values of all other explanatory variables are kept fixed at the values specified by \(x_*\).

When only a single model is considered, we will skip the model index and denote the CP profile for the \(j\)-th explanatory variable and the point of interest \(x_*\) by \(h^{j}_{x_*}(z)\).
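
The definition translates directly into code. Below is a minimal sketch (a hypothetical helper, not part of any package), in which f is a prediction function, x_star is a one-row data frame, j is the name of the variable of interest, and z_grid is the grid of values for \(z\):

```r
# h^{f,j}_{x_star}(z) = f(x^{j|=z}): vary the j-th variable over z_grid,
# keeping all other variables fixed at the values given by x_star.
cp_profile <- function(f, x_star, j, z_grid) {
  sapply(z_grid, function(z) {
    x <- x_star
    x[[j]] <- z   # construct x^{j|=z}
    f(x)          # prediction at the modified point
  })
}
```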

6.4 Example: Titanic

For continuous explanatory variables, a natural way to represent the CP function is to use a profile plot similar to the ones presented in Figure 6.2. In the figure, the dot on the curves marks an instance prediction, i.e., prediction \(f(x_*)\) for a single observation \(x_*\). The curve itself shows how the prediction would change if the value of a particular explanatory variable changed.

Figure 6.2 presents CP profiles for the age variable in the logistic regression and random forest models for the Titanic dataset (see Sections 4.1.2 and 4.1.3, respectively). It is worth observing that the profile for the logistic regression model is smooth, while the one for the random forest model shows more variability. For this instance (observation), the prediction for the logistic regression model would increase substantially if the value of age became lower than 20. For the random forest model, a substantial increase would be obtained if age became lower than 13 or so.


Figure 6.2: Ceteris-paribus profiles for variable age for the logistic regression (titanic_lmr_v6) and random forest (titanic_rf_v6) models that predict the probability of surviving based on the Titanic data.

For a categorical explanatory variable, a natural way to represent the CP function is to use a barplot similar to the ones presented in Figure 6.3. The barplots in Figure 6.3 present CP profiles for the class variable in the logistic regression and random forest models for the Titanic dataset (see Sections 4.1.2 and 4.1.3, respectively). For this instance (observation), the predicted probability for the logistic regression model would decrease substantially if the value of class changed to “2nd”. On the other hand, for the random forest model, the largest change would be observed if class changed to “restaurant staff”.


Figure 6.3: Ceteris-paribus profiles for variable class for the logistic regression (titanic_lmr_v6) and random forest (titanic_rf_v6) models that predict the probability of surviving based on the Titanic data.

Usually, black-box models contain a large number of explanatory variables. However, CP profiles remain legible even in tiny subplots, created with techniques like sparklines or small multiples (Tufte 1986). In this way, we can display a large number of profiles at the same time, keeping profiles for consecutive variables in separate panels, as shown in Figure 6.4 for the random forest model for the Titanic dataset. It helps if the panels are ordered so that the most important profiles are listed first. We discuss a method to assess the importance of CP profiles in the next chapter.


Figure 6.4: Ceteris-paribus profiles for all continuous explanatory variables for the random forest (titanic_rf_v6) model for the titanic dataset.

6.5 Pros and cons

One-dimensional CP profiles, as presented in this chapter, offer a uniform, easy-to-communicate, and extendable approach to model exploration. Their graphical representation is easy to understand and explain. It is possible to show profiles for many variables or models in a single plot. Because CP profiles are easy to compare, we can juxtapose two or more models to better understand the differences between them. We can also compare two or more instances to better understand model stability. CP profiles are also a useful tool for sensitivity analysis.

There are several issues related to the use of CP profiles. If explanatory variables are correlated, then changing one variable implies a change in the others. In such a case, the application of the Ceteris paribus principle may lead to unrealistic settings, as it is not possible to keep one variable fixed while varying the other one. For example, in the prediction of an apartment’s price, features like surface area and number of rooms are correlated; thus, it is unrealistic to consider very small apartments with an extreme number of rooms. Special cases are interactions, which require the use of two-dimensional CP profiles that are more complex than one-dimensional ones. Also, in the case of a model with hundreds or thousands of variables, the number of plots to inspect may be daunting. Finally, while barplots allow visualization of CP profiles for factors (categorical explanatory variables), their use becomes less trivial in the case of factors with many nominal (unordered) categories (like, for example, a ZIP code).

6.6 Code snippets for R

In this section, we present key features of the R package ingredients (Biecek 2019a), which is a part of the DrWhy.AI universe and covers all methods presented in this chapter. More details and examples can be found at https://modeloriented.github.io/ingredients/.

Note that there are also other R packages that offer similar functionality, like condvis (O’Connell, Hurley, and Domijan 2017), pdp (Greenwell 2017a), ICEbox (Goldstein et al. 2015b), ALEPlot (Apley 2018a), iml (Molnar, Bischl, and Casalicchio 2018a).

For illustration, we use two classification models developed in Section 4.1, namely the logistic regression model titanic_lmr_v6 (Section 4.1.2) and the random forest model titanic_rf_v6 (Section 4.1.3). They were developed to predict the probability of survival after the sinking of the Titanic. Instance-level explanations are calculated for a single observation, henry, a 47-year-old male passenger who travelled in the first class.

DALEX explainers for both models and the henry data frame are retrieved via the archivist hooks as listed in Section 4.1.7.
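
For reference, henry could also be constructed directly, as in the sketch below (the factor levels are assumptions based on the titanic dataset):

```r
henry <- data.frame(
  class    = factor("1st", levels = c("1st", "2nd", "3rd", "deck crew",
                    "engineering crew", "restaurant staff", "victualling crew")),
  gender   = factor("male", levels = c("female", "male")),
  age      = 47,
  sibsp    = 0,
  parch    = 0,
  fare     = 25,
  embarked = factor("Cherbourg", levels = c("Belfast", "Cherbourg",
                    "Queenstown", "Southampton"))
)
henry
```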

##   class gender age sibsp parch fare  embarked
## 1   1st   male  47     0     0   25 Cherbourg

6.6.1 Basic use of the ceteris_paribus function

The easiest way to create and plot CP profiles is to call the ceteris_paribus() function and then the generic plot() function. By default, profiles are calculated for all variables, and all numeric variables are plotted. One can limit the set of variables that should be considered with the variables argument.

To obtain CP profiles, the ceteris_paribus() function requires the explainer object and the instance data frame as arguments. As a result, the function yields an object of the class ceteris_paribus_explainer. It is a data frame with model predictions.
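
A minimal sketch of such a call, assuming the random forest explainer retrieved above is named explain_rf_v6:

```r
library("ingredients")

# CP profiles for henry, computed for all explanatory variables by default
cp_titanic_rf <- ceteris_paribus(explain_rf_v6, henry)
cp_titanic_rf
```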

## Top profiles    : 
##                class gender age sibsp parch fare  embarked _yhat_ _vname_
## 1                3rd   male  47     0     0   25 Cherbourg  0.100   class
## 1.1              2nd   male  47     0     0   25 Cherbourg  0.054   class
## 1.2              1st   male  47     0     0   25 Cherbourg  0.246   class
## 1.3 engineering crew   male  47     0     0   25 Cherbourg  0.096   class
## 1.4 victualling crew   male  47     0     0   25 Cherbourg  0.098   class
## 1.5 restaurant staff   male  47     0     0   25 Cherbourg  0.092   class
##     _ids_          _label_
## 1       1 Random Forest v6
## 1.1     1 Random Forest v6
## 1.2     1 Random Forest v6
## 1.3     1 Random Forest v6
## 1.4     1 Random Forest v6
## 1.5     1 Random Forest v6
## 
## 
## Top observations:
##   class gender age sibsp parch fare  embarked _yhat_          _label_
## 1   1st   male  47     0     0   25 Cherbourg  0.246 Random Forest v6
##   _ids_
## 1     1

To obtain a graphical representation of CP profiles, the generic plot() function can be applied to the data frame returned by the ceteris_paribus() function. It returns a ggplot2 object that can be processed further if needed. In the examples below, we use ggplot2 functions, like ggtitle() or ylim(), to modify the plot’s title or the range of the y-axis.

The resulting plot can be enriched with additional data by applying the functions ingredients::show_rugs() (adds rugs for the selected points), ingredients::show_observations() (adds dots that mark observations), or ingredients::show_aggregated_profiles(). All these functions can take additional arguments to modify the size, color, or line type.

Below we show an R snippet that can be used to replicate plots presented in the upper part of Figure 6.4.
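
A minimal sketch, assuming the cp_titanic_rf object created above:

```r
library("ggplot2")

# Plot CP profiles for two continuous variables; the result is a ggplot2
# object, so the title and the y-axis range can be adjusted as usual.
plot(cp_titanic_rf, variables = c("age", "fare")) +
  ggtitle("Ceteris-paribus profiles for henry") +
  ylim(0, 1)
```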


Figure 6.5: Ceteris-paribus profiles for age and fare variables and the titanic_rf_v6 model.

By default, all numerical variables are plotted. To plot CP profiles for categorical variables, we have got to add the only_numerical = FALSE argument to the plot() function. The code below can be used to recreate the right-hand-side plot from Figure 6.3.
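
A sketch under the same assumptions:

```r
# Barplots of CP profiles for two categorical variables
plot(cp_titanic_rf, variables = c("class", "embarked"),
     only_numerical = FALSE)
```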


Figure 6.6: Ceteris-paribus profiles for class and embarked variables and the titanic_rf_v6 model.

6.6.2 Advanced use of the ceteris_paribus function

ceteris_paribus() is a very flexible function. To better understand how it can be used, we briefly review its arguments.

  • x, data, predict_function, label - information about a model. If x is created with the DALEX::explain() function, then the other arguments are extracted from x; this is how we use the function in this chapter. Otherwise, we have got to specify directly the model, the validation data, the predict function, and the model label.
  • new_observation - an instance (one or more), for which we want to calculate CP profiles. It should be a data frame with the same variables as in the validation data.
  • y - the observed value of the dependent variable for new_observation. The use of this argument is illustrated in Section 8.1.
  • variables - names of explanatory variables, for which CP profiles are to be calculated. By default, the profiles will be constructed for all variables, which may be time-consuming.
  • variable_splits - a list of values for which CP profiles are to be calculated. By default, these are all values for categorical variables. For continuous variables, uniformly-placed values are selected; one can specify the number of values with the grid_points argument (the default is 101).

The code below can be used to obtain the plots in the upper part of Figure 6.4. The argument variable_splits specifies the variables (age and fare) for which CP profiles are to be calculated, together with the lists of values at which the profiles are to be evaluated.
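
A sketch; the grid ranges below are illustrative assumptions:

```r
# Explicit grids of values for the two variables of interest
variable_splits <- list(age  = seq(0, 70, 0.1),
                        fare = seq(0, 100, 0.1))

cp_titanic_rf <- ceteris_paribus(explain_rf_v6, henry,
                                 variable_splits = variable_splits)

# Profiles plus a dot marking henry's actual values and prediction
plot(cp_titanic_rf, variables = c("age", "fare")) +
  show_observations(cp_titanic_rf, variables = c("age", "fare"))
```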


Figure 6.7: Ceteris-paribus profiles for age and fare variables and the titanic_rf_v6 model. The blue dot stands for henry.

To enhance the plot, additional functions can be used. The generic plot() function creates a ggplot2 object with a single geom_line layer. Function show_observations() adds a geom_point layer, show_rugs() adds a geom_rug layer, while show_profiles() adds another geom_line layer. All these functions take, as their first argument, an object created with the ceteris_paribus() function. They can be combined freely to superimpose profiles for different models or observations.

In the example below, we present the code to create CP profiles for two passengers, henry and johny_d. Their profiles are included in the plot presented in Figure 6.8. We use the scale_color_manual() function to add the names of the passengers to the plot, and to control colors and positions.
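
A sketch, assuming that johny_d has been retrieved alongside henry and that the _ids_ column can be used for color-coding (analogously to _label_ for models):

```r
# CP profiles for two passengers computed in a single call; the rows
# corresponding to each passenger are distinguished by the _ids_ column.
cp_titanic_rf2 <- ceteris_paribus(explain_rf_v6, rbind(henry, johny_d))

plot(cp_titanic_rf2, color = "_ids_", variables = c("age", "fare")) +
  scale_color_manual(name = "Passenger:",
                     values = c("#4378bf", "#8bdcbe"),
                     labels = c("henry", "johny_d"))
```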


Figure 6.8: Ceteris-paribus profiles for the titanic_rf_v6 model. Profiles for different passengers are color-coded.

6.6.3 Champion-challenger analysis

One of the most interesting applications of the explainers is the comparison of CP profiles for two or more models.

To illustrate this possibility, we first have got to construct profiles for the models. In our illustration, for the sake of clarity, we limit ourselves to just two models: the logistic regression and random forest models for the Titanic data. Moreover, we only consider the age and fare variables. We use henry as the instance for which predictions are of interest.
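
A sketch, assuming the explainer objects explain_lmr_v6 and explain_rf_v6 retrieved earlier:

```r
# CP profiles for henry, restricted to the two variables of interest
cp_titanic_rf  <- ceteris_paribus(explain_rf_v6, henry,
                                  variables = c("age", "fare"))
cp_titanic_lmr <- ceteris_paribus(explain_lmr_v6, henry,
                                  variables = c("age", "fare"))
```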

Subsequently, we construct the plot. The result is shown in Figure 6.9. The predictions for henry differ slightly: in this case, the logistic regression model returns higher predictions than the random forest model. For the age variable, the profiles of both models are similar, showing a decreasing trend. For fare, however, the logistic regression profile is slightly increasing, while the random forest profile is decreasing; the larger the fare, the larger the difference between the models. Such an analysis helps us to assess to what degree different models agree on what-if scenarios.

Note that every plot() and show_*() function can take a collection of explainers as arguments. Profiles for different models are then included in a single plot. In the presented R snippet, models are color-coded with the help of the argument color = "_label_", where _label_ refers to the name of the column in the CP explainer that contains the model label.
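
A sketch of such a call:

```r
# Superimpose the profiles of both models; curves are colored by model label
plot(cp_titanic_rf, cp_titanic_lmr, color = "_label_",
     variables = c("age", "fare")) +
  ggtitle("Ceteris-paribus profiles for henry")
```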


Figure 6.9: Champion-challenger comparison of the titanic_lmr_v6 and titanic_rf_v6 models. Profiles for different models are color-coded.

References

Goldstein, Alex, Adam Kapelner, Justin Bleich, and Emil Pitkin. 2015a. “Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation.” Journal of Computational and Graphical Statistics 24 (1): 44–65. https://doi.org/10.1080/10618600.2014.907095.

Tufte, Edward R. 1986. The Visual Display of Quantitative Information. Cheshire, CT, USA: Graphics Press.

Biecek, Przemyslaw. 2019a. Ingredients: Effects and Importances of Model Ingredients. https://ModelOriented.github.io/ingredients/.

O’Connell, Mark, Catherine Hurley, and Katarina Domijan. 2017. “Conditional Visualization for Statistical Models: An Introduction to the Condvis Package in R.” Journal of Statistical Software, Articles 81 (5): 1–20. https://doi.org/10.18637/jss.v081.i05.

Greenwell, Brandon M. 2017a. “Pdp: An R Package for Constructing Partial Dependence Plots.” The R Journal 9 (1): 421–36. https://journal.r-project.org/archive/2017/RJ-2017-016/index.html.

Goldstein, Alex, Adam Kapelner, Justin Bleich, and Emil Pitkin. 2015b. “Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation.” Journal of Computational and Graphical Statistics 24 (1): 44–65. https://doi.org/10.1080/10618600.2014.907095.

Apley, Dan. 2018a. ALEPlot: Accumulated Local Effects (ALE) Plots and Partial Dependence (PD) Plots. https://CRAN.R-project.org/package=ALEPlot.

Molnar, Christoph, Bernd Bischl, and Giuseppe Casalicchio. 2018a. “iml: An R Package for Interpretable Machine Learning.” Journal of Open Source Software 3 (26): 786. https://doi.org/10.21105/joss.00786.