11 Ceteris-paribus Profiles

11.1 Introduction

Chapters 710 are related to the decomposition of a prediction \(f(x)\) into parts linked with particular variables. In this chapter we focus on methods that analyse an effect of selected variables on model response. These techniques may be used for sensitivity analysis, how stable is the model response, or to what-if analysis, how model response would change if input is changes. It is important to remember that the what-if analysis is performed not in the sense of causal modeling, but in the sense of model exploration. We need causal model to do causal inference for the real-world phenomena. Here we focus on explanatory analysis of the model behaviour. To show the difference between these two things, think about a model for survival for lung-cancer patients based on some treatment parameters. We need causal model to say how the survival would change if the treatment is changed. Techniques presented in this chapter will explore how the model result will change if the treatment is changed.

Ceteris paribus is a Latin phrase meaning “other things held constant” or “all else unchanged.” In this chapter, we introduce a technique for model exploration based on the Ceteris paribus principle. In particular, we examine the influence of each explanatory variable, assuming that effects of all other variables are unchanged. The main goal is to understand how changes in a single explanatory variable affects model predictions.

Explanation tools (explainers) presented in this chapter are linked to the second law introduced in Section 1.3, i.e. the law of “Prediction’s speculation.” This is why the tools are also known as What-If model analysis or Individual Conditional Expectations (Goldstein et al. 2015). It appears that it is easier to understand how a black-box model is working if we can explore the model by investigating the influence of explanatory variables separately, changing one at a time.

11.2 Intuition

Ceteris-paribus profiles show how the model response would change if a single variable is changed. For example, panel A of Figure 11.1 presents response (prediction) surface for the titanic_lmr_v6 model for two explanatory variables, age and class, from the titanic dataset (see Section 5.1). We are interested in the change of the model prediction induced by each of the variables. Toward this end, we may want to explore the curvature of the response surface around a single point with age equal to 47 and class equal to “1st,” indicated in the plot. Ceteris-paribus (CP) profiles are one-dimensional profiles that examine the curvature across each dimension, i.e., for each variable. Panel B of Figure 11.1 presents the CP profiles corresponding to age and class. Note that, in the CP profile for age, the point of interest is indicated by the black dot. In essence, a CP profile shows a conditional expectation of the dependent variable (response) for the particular explanatory variable.

Panel A) Model response (prediction) surface. Ceteris-paribus (CP) profiles marked with black curves help to understand the curvature of the surface while changing only a single explanatory variable. Panel B) CP profiles for individual variables, age (continuous) and class (categorical).

Figure 11.1: Panel A) Model response (prediction) surface. Ceteris-paribus (CP) profiles marked with black curves help to understand the curvature of the surface while changing only a single explanatory variable. Panel B) CP profiles for individual variables, age (continuous) and class (categorical).

Animated model response for 2D surface as in 11.1.

Figure 11.2: Animated model response for 2D surface as in 11.1.

CP belongs to the class of techniques that examine local curvature of the model response surface. Other very popular technique from this class called LIME is presented in Chapter 10. The difference between these two methods lies in the fact that LIME approximates the model of interest locally with a simpler glass-box model. Usually, the LIME model is sparse, i.e., contains fewer explanatory variables. Thus, one needs to investigate a plot across a smaller number of dimensions. On the other hand, the CP profiles present conditional predictions for a single variable and, in most cases, are easier to interpret. More detailed comparison of these techniques is presented in the Chapter 14.

11.3 Method

In this section, we introduce more formally one-dimensional CP profiles.

Recall (see Section 2.3) that we use \(x_i\) to refer to the vector corresponding to the \(i\)-th observation in a dataset. Let \(x^{j}_{*}\) denote the \(j\)-th element of \(x_{*}\), i.e., the \(j\)-th explanatory variable. We use \(x^{-j}_{*}\) to refer to a vector resulting from removing the \(j\)-th element from \(x_{*}\). By \(x^{j|=z}_{*}\), we denote a vector resulting from changing the value of the \(j\)-th element of \(x_{*}\) to (a scalar) \(z\).

We define a one-dimensional CP profile \(h()\) for model \(f()\), the \(j\)-th explanatory variable, and point \(x_*\) as follows:

\[\begin{equation} h^{f,j}_{x_*}(z) := f(x_*^{j|=z}). \tag{11.1} \end{equation}\]

CP profile is a function that provides the dependence of the approximated expected value (prediction) of \(Y\) on the value \(z\) of the \(j\)-th explanatory variable. Note that, in practice, \(z\) is taken to go through the entire range of values typical for the variable, while values of all other explanatory variables are kept fixed at the values specified by \(x_*\).

Note that in the situation when only a single model is considered, we will skip the model index and we will denote the CP profile for the \(j\)-th explanatory variable and the point of interest \(x_*\) by \(h^{j}_{x_*}(z)\).

11.4 Example: Titanic

For continuous explanatory variables, a natural way to represent the CP function is to use a profile plot similar to the ones presented in Figure 11.3. In the figure, the dot on the curves marks an instance prediction, i.e., prediction \(f(x_*)\) for a single observation \(x_*\). The curve itself shows how the prediction would change if the value of a particular explanatory variable changed.

Figure 11.3 presents CP profiles for the age variable in the logistic regression titanic_lmr_v6 and random forest model titanic_rf_v6 for the Titanic dataset (see Sections 5.1.2 and 5.1.3, respectively). It is worth observing that the profile for the logistic regression model is smooth, while the one for the random forest model shows more variability. For this instance (observation), the prediction for the logistic regression model would increase substantially if the value of age became lower than 20. For the random forest model, a substantial increase would be obtained if age became lower than 13 or so.

Ceteris-paribus profiles for variable age for the logistic regression (titanic_lmr_v6) and random forest (titanic_rf_v6 ) models that predict the probability of surviving based on the Titanic data. Black dot corresponds to the passenger johny_d.

Figure 11.3: Ceteris-paribus profiles for variable age for the logistic regression (titanic_lmr_v6) and random forest (titanic_rf_v6 ) models that predict the probability of surviving based on the Titanic data. Black dot corresponds to the passenger johny_d.

For a categorical explanatory variable, a natural way to represent the CP function is to use a barplot similar to the ones presented in Figure 11.4. The barplots in Figure 11.3 present CP profiles for the class variable in the logistic regression and random forest models for the Titanic dataset (see Sections 5.1.2 and 5.1.3, respectively). For this instance (observation), the predicted probability for the logistic regression model would decrease substantially if the value of class changed to “2nd”. On the other hand, for the random forest model, the largest change would be marked if class changed to “restaurant staff”.

Ceteris-paribus profiles for variable class for the logistic regression (titanic_lmr_v6) and random forest (titanic_rf_v6 ) models that predict the probability of surviving based on the Titanic data.

Figure 11.4: Ceteris-paribus profiles for variable class for the logistic regression (titanic_lmr_v6) and random forest (titanic_rf_v6 ) models that predict the probability of surviving based on the Titanic data.

Usually, black-box models contain a large number of explanatory variables. However, CP profiles are legible even for tiny subplots, created with techniques like sparklines or small multiples (Tufte 1986). In this way we can display a large number of profiles at the same time keeping profiles for consecutive variables in separate panels, as shown in Figure 11.5 for the random forest model for the Titanic dataset. It helps if these panels are ordered so that the most important profiles are listed first. We discuss a method to assess the importance of CP profiles in the next chapter.

Ceteris-paribus profiles for all continuous explanatory variables for the random forest (titanic_rf_v6) model for the titanic dataset.

Figure 11.5: Ceteris-paribus profiles for all continuous explanatory variables for the random forest (titanic_rf_v6) model for the titanic dataset.

11.5 Pros and cons

One-dimensional CP profiles, as presented in this chapter, offer a uniform, easy to communicate and extendable approach to model exploration. Their graphical representation is easy to understand and explain. It is possible to show profiles for many variables or models in a single plot. CP profiles are easy to compare, thus we can juxtapose two or more models to better understand differences between models. We can also compare two or more instances to better understand model stability. CP profiles are also a useful tool for sensitivity analysis.

But. There are several issues related to the use of the CP profiles. If explanatory variables are correlated, then changing one variable implies a change in the other. In such case, the application of the Ceteris paribus principle may lead to unrealistic settings, as it is not possible to keep one variable fixed while varying the other one. For example, apartment’s price prediction features like surface and number of rooms are correlated thus it is unrealistic to consider very small apartments with extremely large number of rooms. Special cases are interactions, which require the use of two-dimensional CP profiles that are more complex than one-dimensional ones. Also, in case of a model with hundreds or thousands of variables, the number of plots to inspect may be daunting. Finally, while barplots allow visualization of CP profiles for factors (categorical explanatory variables), their use becomes less trivial in case of factors with many nominal (unordered) categories (like, for example, a ZIP-code).

11.6 Code snippets for R

In this section, we present key features of the R package DALEX which is a part of DrWhy.AI universe and covers all methods presented in this chapter. Note that presented functions in fact are wrappers to package ingredients (Biecek et al. 2019).

Note that there are also other R packages that offer similar functionality, like condvis (O’Connell, Hurley, and Domijan 2017), pdp (Greenwell 2017), ICEbox (Goldstein et al. 2015), ALEPlot (Apley 2018), iml (Molnar, Bischl, and Casalicchio 2018).

For illustration, we use two classification models developed in Chapter 5.1, namely the logistic regression model titanic_lmr_v6 (Section 5.1.2) and the random forest model titanic_rf_v6 (Section 5.1.3). They are developed to predict the probability of survival after sinking of Titanic. Instance-level explanations are calculated for a single observation henry - a 47 years old male passenger that travelled in the 1st class.

DALEX explainers for both models and the henry data frame are retrieved via the archivist hooks as listed in Section 5.1.8.

##   class gender age sibsp parch fare  embarked
## 1   1st   male  47     0     0   25 Cherbourg

11.6.1 Basic use of the individual_profile function

The easiest way to create and plot CP profiles is to call individual_profile() function and then the generic plot() function. By default, profiles for all variables are being calculated and all numeric features are being plotted. One can limit the number of variables that should be considered with the variables argument.

To obtain CP profiles, the individual_profile() function requires the explainer-object and the instance data frame as arguments. As a result, the function yields an object of the class ceteris_paribus_explainer. It is a data frame with model predictions.

## Top profiles    : 
##                class gender age sibsp parch fare  embarked _yhat_ _vname_
## 1                1st   male  47     0     0   25 Cherbourg  0.246   class
## 1.1              2nd   male  47     0     0   25 Cherbourg  0.054   class
## 1.2              3rd   male  47     0     0   25 Cherbourg  0.100   class
## 1.3        deck crew   male  47     0     0   25 Cherbourg  0.454   class
## 1.4 engineering crew   male  47     0     0   25 Cherbourg  0.096   class
## 1.5 restaurant staff   male  47     0     0   25 Cherbourg  0.092   class
##     _ids_       _label_
## 1       1 Random Forest
## 1.1     1 Random Forest
## 1.2     1 Random Forest
## 1.3     1 Random Forest
## 1.4     1 Random Forest
## 1.5     1 Random Forest
## 
## 
## Top observations:
##   class gender age sibsp parch fare  embarked _yhat_       _label_ _ids_
## 1   1st   male  47     0     0   25 Cherbourg  0.246 Random Forest     1

To obtain a graphical representation of CP profiles, the generic plot() function can be applied to the data frame returned by the individual_profile() function. It returns a ggplot2 object that can be processed further if needed. In the examples below, we use the ggplot2 functions, like ggtitle() or ylim(), to modify plot’s title or the range of the Y-axis.

The resulting plot can be enriched with additional data by applying functions ingredients::show_rugs (adds rugs for the selected points), ingredients::show_observations (adds dots that shows observations), or ingredients::show_aggreagated_profiles. All these functions can take additional arguments to modify size, color, or linetype.

Below we show an R snippet that can be used to replicate plots presented in the upper part of Figure 11.5.

Ceteris-paribus profiles for age and fare variables and the titanic_rf_v6 model.

Figure 11.6: Ceteris-paribus profiles for age and fare variables and the titanic_rf_v6 model.

By default, all numerical variables are plotted. To plot CP profiles for categorical variables, we have got to add the variable_type = "categorical" argument to the plot() function. The code below an be used to recreate the right-hand-side plot from Figure 11.4.

Ceteris-paribus profiles for class and embarked variables and the titanic_rf_v6 model.

Figure 11.7: Ceteris-paribus profiles for class and embarked variables and the titanic_rf_v6 model.

11.6.2 Advanced use of the individual_profile function

The individual_profile() is a very flexible function. To better understand how it can be used, we briefly review its arguments.

  • x, data, predict_function, label - information about a model. If x is created with the DALEX::explain function, then other arguments are extracted from x; this is how we use the function in this chapter. Otherwise, we have got to specify directly the model, the validation data, the predict function, and the model label.
  • new_observation - instance (one or more), for which we want to calculate CP profiles. It should be a data frame with same variables as in the validation data.
  • y - observed value of the dependent variable for new_observation. The use of this argument is illustrated in Section 13.1.
  • variables - names of explanatory variables, for which CP profiles are to be calculated. By default, the profiles will be constructed for all variables, which may be time consuming.
  • variable_splits - a list of values for which CP profiles are to be calculated. By default, these are all values for categorical variables. For continuous variables, uniformly-placed values are selected; one can specify the number of the values with the grid_points argument (the default is 101).

The code below allows to obtain the plots in the upper part of Figure 11.5. The argument variable_splits specifies the variables (age and fare) for which CP profiles are to be calculated, together with the list of values at which the profiles are to be evaluated.

Ceteris-paribus profiles for class and embarked variables and the titanic_rf_v6 model. Blue dot stands for henry.

Figure 11.8: Ceteris-paribus profiles for class and embarked variables and the titanic_rf_v6 model. Blue dot stands for henry.

To enhance the plot, additional functions can be used. The generic plot() function creates a ggplot2 object with a single geom_line layer. Function show_observations adds geom_point layer, show_rugs adds geom_rugs, while show_profiles adds another geom_line. All these functions take, as the first argument, an object created with the ceteris_paribus function. They can be combined freely to superimpose profiles for different models or observations.

In the example below, we present the code to create CP profiles for two passengers, henry and johny_d. Their profiles are included in a plot presented in Figure 11.9. We use the scale_color_manual function to add names of passengers to the plot, and to control colors and positions.

Ceteris-paribus profiles for the titanic_rf_v6 model. Profiles for different passengers are color-coded.

Figure 11.9: Ceteris-paribus profiles for the titanic_rf_v6 model. Profiles for different passengers are color-coded.

11.6.3 Champion-challenger analysis

One of the most interesting uses of the explainers is comparison of CP profiles for two or more of models.

To illustrate this possibility, first, we have go to construct profiles for the models. In our illustration, for the sake of clarity, we limit ourselves just to two models: the logistic regression and random forest models for the Titanic data. Moreover, we only consider the age and fare variables. We use henry as the instance, for which predictions are of interest.

Subsequently, we construct the plot. The result is shown in Figure 11.10. Predictions for henry are slightly different, logistic regression returns in this case higher predictions then random forest. For age variable profiles of both models are similar, in both models we see decreasing dependency. While for fare the logistic regression model is slightly positive while random forest is negative. The larger the fare the larger is difference between these models. Such analysis helps us to which degree different models agree on what if scenarios.

Note that every plot and show_* function can take a collection of explainers as arguments. Profiles for different models are included in a single plot. In the presented R snippet, models are color-coded with the help of the argument color = "_label_", where _label_ refers to the name of the column in the CP explainer that contains the model label.

Champion-challenger comparison of the titanic_lmr_v6 and titanic_rf_v6 models. Profiles for different models are color-coded.

Figure 11.10: Champion-challenger comparison of the titanic_lmr_v6 and titanic_rf_v6 models. Profiles for different models are color-coded.

References

Apley, Dan. 2018. ALEPlot: Accumulated Local Effects (Ale) Plots and Partial Dependence (Pd) Plots. https://CRAN.R-project.org/package=ALEPlot.

Biecek, Przemyslaw, Hubert Baniecki, Adam Izdebski, and Katarzyna Pekala. 2019. ingredients: Effects and Importances of Model Ingredients.

Goldstein, Alex, Adam Kapelner, Justin Bleich, and Emil Pitkin. 2015. “Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation.” Journal of Computational and Graphical Statistics 24 (1): 44–65. https://doi.org/10.1080/10618600.2014.907095.

Greenwell, Brandon M. 2017. “pdp: An R Package for Constructing Partial Dependence Plots.” The R Journal 9 (1): 421–36. https://journal.r-project.org/archive/2017/RJ-2017-016/index.html.

Molnar, Christoph, Bernd Bischl, and Giuseppe Casalicchio. 2018. “iml: An R package for Interpretable Machine Learning.” Joss 3 (26). Journal of Open Source Software: 786. https://doi.org/10.21105/joss.00786.

O’Connell, Mark, Catherine Hurley, and Katarina Domijan. 2017. “Conditional Visualization for Statistical Models: An Introduction to the Condvis Package in R.” Journal of Statistical Software, Articles 81 (5): 1–20. https://doi.org/10.18637/jss.v081.i05.

Tufte, Edward R. 1986. The Visual Display of Quantitative Information. Cheshire, CT, USA: Graphics Press.