# Chapter 4 Ceteris-paribus Profiles - a Tool for What-If Analysis

## 4.1 Introduction

Ceteris paribus is a Latin phrase meaning “other things held constant” or “all else unchanged.” In this chapter, we introduce a technique for model exploration based on the Ceteris paribus principle. In particular, we examine the influence of each explanatory variable, assuming that the values of all other variables remain unchanged. The main goal is to understand how changes in a single explanatory variable affect model predictions.

Explanation tools (explainers) presented in this chapter are linked to the second law introduced in Section 1.2, i.e., the law of “Prediction’s speculation.” For this reason, the tools are also known as What-If model analysis or Individual Conditional Expectations (Goldstein et al. 2015a). It turns out that it is easier to understand how a black-box model works if we explore it by investigating the influence of explanatory variables separately, changing one at a time.

## 4.2 Intuition

Panel A of Figure 4.1 presents the response (prediction) surface of the titanic_lmr_v6 model for two explanatory variables, age and class, from the titanic dataset (see Section 2.1). We are interested in the change of the model prediction induced by each of the variables. Toward this end, we may want to explore the curvature of the response surface around a single point, indicated in the plot, with age equal to 47 and class equal to “1st.” Ceteris-paribus (CP) profiles are one-dimensional profiles that examine the curvature across each dimension, i.e., for each variable. Panel B of Figure 4.1 presents the profiles corresponding to age and class. Note that, in the CP profile for age, the point of interest is indicated by the black dot. In essence, a CP profile shows the model’s approximation of the conditional expectation of the dependent variable (response) as a function of a particular explanatory variable, with all other variables held fixed.

The CP technique is similar to the LIME method (see Chapter 12). Both LIME and CP profiles examine the curvature of a model’s response surface. The difference between the two methods is that LIME approximates the black-box model of interest locally with a simpler white-box model. Usually, the LIME model is sparse, i.e., it contains fewer variables, and thus we need to investigate graphically a smaller number of dimensions. On the other hand, CP profiles present conditional predictions for every variable and, in most cases, are easier to interpret.

## 4.3 Method

In this section, we introduce one-dimensional CP profiles more formally.

In predictive modeling, we are interested in the distribution of a dependent variable $$Y$$ given a vector $$x^*$$ that contains the values of explanatory variables. In an ideal world, we would like to know the conditional distribution of $$Y$$ given $$x^*$$, $$Y | x^*$$. In practical applications, however, we usually do not predict the entire distribution, but just some of its characteristics, like the expected (mean) value, a quantile, or the variance. Without loss of generality, we will assume that we model the expected value $$E_Y(Y | x^*)$$.

Assume that we have a model $$f()$$, for which $$f(x^*)$$ is an approximation of $$E_Y(Y | x^*)$$, i.e., $$E_Y(Y | x^*) \approx f(x^*)$$. Note that we do not assume that it is a “good” model, nor that the approximation is precise. We simply assume that we have a model that is used to estimate the expected value and to form predictions of the dependent variable. Our interest lies in the evaluation of the quality of the predictions. If the model offers a “good” approximation of the expected value, it should be reflected in its satisfactory predictive performance.

We will use the subscript in $$x^*_i$$ to refer to the vector corresponding to the $$i$$-th observation in a dataset. We will use the superscript in $$x^{*j}$$ to refer to the $$j$$-th element of $$x^*$$, i.e., the $$j$$-th variable. Additionally, let $$x^{*-j}$$ denote the vector resulting from removing the $$j$$-th element from vector $$x^{*}$$. Moreover, let $$x^{*|j}=z$$ denote the vector in which the $$j$$-th element is equal to $$z$$ (a scalar), while all other elements are equal to the corresponding elements of $$x^*$$.
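As an illustration only, the notation can be mirrored with a named numeric vector in R. The observation x_star and its values (age 47, fare 72, sibsp 0) are hypothetical, and variables are indexed by name rather than by position.

```r
# A hypothetical observation x^* with three explanatory variables
x_star <- c(age = 47, fare = 72, sibsp = 0)

j <- "age"  # the j-th variable, referred to by name

# x^{*j}: the j-th element of x^*
x_star[[j]]

# x^{*-j}: x^* with the j-th element removed
x_minus_j <- x_star[names(x_star) != j]

# x^{*|j}=z with z = 20: the j-th element set to z, all other elements unchanged
x_j_z <- replace(x_star, j, 20)
x_j_z
```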

We define a one-dimensional CP profile for the model $$f()$$, the $$j$$-th explanatory variable, and the point $$x^*$$ as follows:

$$CP^{f, j, x^*}(z) \equiv f(x^{*|j} = z).$$

That is, the CP profile is a function that describes how the model’s prediction, i.e., the approximated expected value of $$Y$$, depends on the value $$z$$ of the $$j$$-th explanatory variable. Note that $$z$$ is taken to go through the range of values typical for the variable, while the values of all other explanatory variables are kept fixed at the values given by $$x^*$$.
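The definition translates almost directly into code. The sketch below assumes only that the model has a predict() method accepting a newdata data frame; cp_profile() is a hypothetical helper written for this illustration (it is not a function of the ingredients package), and the logistic regression on the built-in mtcars data merely stands in for the Titanic models.

```r
# Hypothetical helper implementing CP^{f, j, x^*}(z) = f(x^{*|j} = z)
# for any model object whose predict() method accepts `newdata`.
cp_profile <- function(model, x_star, variable, grid, ...) {
  # one copy of the single-row data frame x_star per grid value z
  newdata <- x_star[rep(1, length(grid)), , drop = FALSE]
  # set the j-th variable to z; all other columns keep their x^* values
  newdata[[variable]] <- grid
  data.frame(z = grid, yhat = unname(predict(model, newdata = newdata, ...)))
}

# Illustration on mtcars (an assumption, not the Titanic models):
# a logistic regression for a manual gearbox (am = 1)
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
x_star <- mtcars["Mazda RX4", c("wt", "hp")]

# CP profile for weight (wt); hp stays at its value for x^*
profile_wt <- cp_profile(fit, x_star, "wt",
                         grid = seq(1.5, 5.5, by = 0.5), type = "response")
profile_wt
```

Because only the $$j$$-th column of newdata changes, each row is $$CP^{f, j, x^*}(z)$$ evaluated at one value of the grid; the choice of the grid itself is left to the user.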

## 4.4 Example: Titanic data

For continuous explanatory variables, a natural way to represent the CP function is to use a profile plot similar to the ones presented in Figure 4.2. In the figure, the dot on each curve marks the instance prediction, i.e., the prediction $$f(x^*)$$ for a single observation $$x^*$$. The curve itself shows how the prediction would change if the value of a particular explanatory variable changed.
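A profile plot of this kind can be sketched with base R graphics. The sketch assumes a logistic regression on the built-in mtcars data (not the models of Figure 4.2) and a grid spanning the observed range of the variable; the line is the CP profile and the filled dot is the instance prediction.

```r
# Illustration on mtcars (an assumption, not the Titanic models)
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
x_star <- mtcars["Mazda RX4", c("wt", "hp")]

# CP profile for wt over its observed range; hp stays fixed at its x^* value
grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 50)
newdata <- x_star[rep(1, length(grid)), , drop = FALSE]
newdata$wt <- grid
yhat <- predict(fit, newdata = newdata, type = "response")

# instance prediction f(x^*)
yhat_star <- predict(fit, newdata = x_star, type = "response")

plot(grid, yhat, type = "l", xlab = "wt", ylab = "predicted P(am = 1)",
     main = "CP profile for wt")
points(x_star$wt, yhat_star, pch = 19)  # the dot marks f(x^*)
```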

Figure 4.2 presents CP profiles for the age variable in the logistic regression and random forest models for the Titanic dataset (see Section 2.5). It is worth observing that the profile for the logistic regression model is smooth, while the profile for the random forest model shows more variability. For this instance (observation), the predictions of both models would increase substantially if the value of age dropped below 20.

For a categorical explanatory variable, a natural way to represent the CP function is to use a barplot similar to the ones presented in Figure 4.3. The barplots in Figure 4.3 present CP profiles for the class variable in the logistic regression and random forest models for the Titanic dataset (see Section 2.5). For this instance (observation), the predicted probability for the logistic regression model would decrease substantially if the value of class changed to “2nd”. On the other hand, for the random forest model, the largest change would be observed if class changed to “restaurant staff”.
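For a factor, the grid consists of its levels, so the CP profile reduces to one prediction per level. The base-R sketch below assumes a linear model on the built-in iris data (not the Titanic models of Figure 4.3); the dashed line marks the instance prediction, so each bar can be compared with it.

```r
# Illustration on iris (an assumption): a linear model with one factor, Species
fit <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)
x_star <- iris[1, c("Sepal.Width", "Species")]  # a setosa flower

# CP profile for the factor: one prediction per level, Sepal.Width fixed
species_levels <- levels(iris$Species)
newdata <- x_star[rep(1, length(species_levels)), , drop = FALSE]
newdata$Species <- factor(species_levels, levels = species_levels)
cp_species <- setNames(unname(predict(fit, newdata = newdata)), species_levels)

# instance prediction f(x^*), drawn as a dashed reference line
yhat_star <- unname(predict(fit, newdata = x_star))

barplot(cp_species, ylab = "predicted Sepal.Length",
        main = "CP profile for Species")
abline(h = yhat_star, lty = 2)

# change in prediction relative to the instance, per level
cp_species - yhat_star
```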

Usually, black-box models contain a large number of explanatory variables. However, CP profiles are legible even in tiny subplots, created with techniques like sparklines or small multiples (Tufte 1986). In this way, we can display a large number of profiles at the same time, keeping the profiles for consecutive variables in separate panels, as shown in Figure 4.4 for the random forest model for the Titanic dataset. It helps if these panels are ordered so that the most important profiles are listed first. We discuss a method to assess the importance of CP profiles in the next chapter.
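Small multiples can be sketched in base R with par(mfrow = ...). The sketch assumes a linear model on mtcars and orders the panels by the range of predictions along each profile; this ordering is a crude heuristic chosen for illustration, not the importance measure discussed in the next chapter.

```r
# Illustration on mtcars (an assumption): a linear model for mpg
vars <- c("wt", "hp", "qsec", "drat")
fit <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
x_star <- mtcars["Mazda RX4", vars]

# one CP profile per variable, each over that variable's observed range
profiles <- lapply(vars, function(v) {
  grid <- seq(min(mtcars[[v]]), max(mtcars[[v]]), length.out = 30)
  newdata <- x_star[rep(1, length(grid)), , drop = FALSE]
  newdata[[v]] <- grid
  data.frame(z = grid, yhat = unname(predict(fit, newdata = newdata)))
})
names(profiles) <- vars

# crude ordering heuristic (an assumption): range of predictions per profile
amplitude <- sapply(profiles, function(p) diff(range(p$yhat)))
ordered_vars <- names(sort(amplitude, decreasing = TRUE))

# small multiples: one small panel per variable, largest range first
op <- par(mfrow = c(2, 2), mar = c(4, 4, 2, 1))
for (v in ordered_vars) {
  plot(profiles[[v]]$z, profiles[[v]]$yhat, type = "l",
       xlab = v, ylab = "predicted mpg", main = v)
}
par(op)
```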

## 4.5 Pros and cons

One-dimensional CP profiles, as presented in this chapter, offer a uniform, easy-to-communicate, and extendable approach to model exploration. Their graphical representation is easy to understand and explain. It is possible to show profiles for many variables or models in a single plot.

There are several issues related to the use of CP profiles. If explanatory variables are correlated, then changing one variable implies a change in the other. In such a case, the application of the Ceteris paribus principle may lead to unrealistic settings, as it is not possible to keep one variable fixed while varying the other one. Interactions are a special case: they require the use of two-dimensional CP profiles, which are more complex than one-dimensional ones. Also, in the case of a model with hundreds or thousands of variables, the number of plots to inspect may be daunting. Finally, while barplots allow visualization of CP profiles for factors (categorical explanatory variables), their use becomes less trivial for factors with many nominal (unordered) categories, for example, ZIP codes.

## 4.6 Code snippets for R

In this section, we present the key features of the R package ingredients (Biecek 2019), which is a part of the DALEXverse and covers all methods presented in this chapter. More details and examples can be found at https://modeloriented.github.io/ingredients/.

Note that there are also other R packages that offer similar functionality, such as condvis (O’Connell, Hurley, and Domijan 2017), pdp (Greenwell 2017a), ICEbox (Goldstein et al. 2015b), ALEPlot (Apley 2018a), and iml (Molnar, Bischl, and Casalicchio 2018a).

In this section, we use the random forest (Breiman et al. 2018) model titanic_rf_v6 developed for the Titanic dataset (see Section 2.1). In particular, we deal with a binary classification problem: we want to predict the probability of survival for a selected passenger.

```r
library("DALEX")
library("randomForest")

# retrieve the pre-trained random forest model from the archivist repository
titanic_rf_v6 <- archivist::aread("pbiecek/models/31570")
```

CP profiles are calculated in four steps with the ingredients package.

1. Create an explainer, i.e., a wrapper around the model and validation data.

Model-objects created with different libraries may have different internal structures. Thus, first, we need to create a wrapper around the model. Toward this end, we use the explain() function from the DALEX package (Biecek 2018). The function takes the following five arguments:

• model, a model-object
• data, a validation data frame
• y, observed values of the dependent variable for the validation data
• predict_function, a function that returns prediction scores; if not specified, then a default predict() function is used
• label, a character string with the name of the model; if not specified, then it is extracted from class(model).

In the example below, we use the training data as the validation dataset.
```r
explain_titanic_rf <- explain(model = titanic_rf_v6,
                              data = titanic[, -9],           # all columns except the dependent variable
                              y = titanic$survived == "yes",  # TRUE for passengers who survived
                              label = "Random Forest v6")
```
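The need for a wrapper is easy to see even within base R: for a glm object, predict() returns log-odds unless type = "response" is requested, and other model classes follow other conventions. The sketch below illustrates the idea with two hypothetical helpers, make_explainer() and explainer_predict(); it is a simplification for illustration, not how DALEX::explain() is implemented.

```r
# Hypothetical minimal wrapper; DALEX::explain() does considerably more.
make_explainer <- function(model, data, y,
                           predict_function = function(m, x) predict(m, newdata = x),
                           label = class(model)[1]) {
  list(model = model, data = data, y = y,
       predict_function = predict_function, label = label)
}

# Every wrapped model is then queried in the same way, whatever its class
explainer_predict <- function(explainer, newdata) {
  explainer$predict_function(explainer$model, newdata)
}

# Illustration on mtcars (an assumption): a glm needs type = "response"
# to return probabilities instead of log-odds
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
expl <- make_explainer(fit, data = mtcars[, c("wt", "hp")], y = mtcars$am,
                       predict_function = function(m, x)
                         predict(m, newdata = x, type = "response"))
expl$label
head(explainer_predict(expl, expl$data))
```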

2. Define the instance (observation) of interest.

CP profiles explore the model around a single observation. In the example below, we use the data frame henry created in Section @ref(predictions_titanic). It contains data for an 8-year-old boy who embarked in Belfast and travelled in the 2nd class, with no parents or siblings aboard, on a ticket costing 72 pounds. Then, we obtain the model prediction for this instance with the help of the predict() function. In particular, we compute the predicted probability of survival.


```r
# the instance of interest: an 8-year-old boy travelling in 2nd class
henry <- data.frame(
  class = factor("2nd", levels = c("1st", "2nd", "3rd", "deck crew", "engineering crew",
                                   "restaurant staff", "victualling crew")),
  gender = factor("male", levels = c("female", "male")),
  age = 8,
  sibsp = 0,
  parch = 0,
  fare = 72,
  embarked = factor("Belfast", levels = c("Belfast", "Cherbourg", "Queenstown", "Southampton"))
)

# predicted probability of survival for henry
predict(explain_titanic_rf, henry)
## [1] 0.34
```

3. Calculate CP profiles.

To obtain CP profiles, we use the ceteris_paribus() function. It requires the explainer object and the instance data frame as arguments. By default, CP profiles are calculated for all numerical variables. To select a subset of variables, the variables argument can be used.

As a result, the function yields an object of the class ceteris_paribus_explainer. It is a data frame with model predictions: each row contains the instance with one variable changed, the corresponding prediction (_yhat_), the name of the changed variable (_vname_), and the identifier of the instance (_ids_).

```r
library("ingredients")
cp_titanic_rf <- ceteris_paribus(explain_titanic_rf, henry,
                                 variables = c("age", "fare", "class", "gender"))
cp_titanic_rf
## Top profiles    :
##     class gender        age sibsp parch fare embarked _yhat_ _vname_ _ids_
## 1     2nd   male  0.1666667     0     0   72  Belfast  0.396     age     1
## 1.1   2nd   male  2.0000000     0     0   72  Belfast  0.420     age     1
## 1.2   2nd   male  4.0000000     0     0   72  Belfast  0.404     age     1
## 1.3   2nd   male  7.0000000     0     0   72  Belfast  0.350     age     1
## 1.4   2nd   male  9.0000000     0     0   72  Belfast  0.324     age     1
## 1.5   2nd   male 13.0000000     0     0   72  Belfast  0.182     age     1
##              _label_
## 1   Random Forest v6
## 1.1 Random Forest v6
## 1.2 Random Forest v6
## 1.3 Random Forest v6
## 1.4 Random Forest v6
## 1.5 Random Forest v6
##
##
## Top observations:
##   class gender age sibsp parch fare embarked _yhat_          _label_ _ids_
## 1   2nd   male   8     0     0   72  Belfast   0.34 Random Forest v6     1
```
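The structure of the printed object can be reproduced by hand to show what the function computes: for each selected variable, a block of rows in which only that variable varies. The base-R sketch below assumes a logistic regression on mtcars and a five-point grid over the observed range of each variable; the column names only mimic the output above, and ingredients chooses its own grid.

```r
# Illustration on mtcars (an assumption); column names mimic the printed output
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
x_star <- mtcars["Mazda RX4", c("wt", "hp")]

cp_long <- do.call(rbind, lapply(c("wt", "hp"), function(v) {
  grid <- seq(min(mtcars[[v]]), max(mtcars[[v]]), length.out = 5)
  newdata <- x_star[rep(1, length(grid)), , drop = FALSE]
  newdata[[v]] <- grid  # only variable v changes within this block
  newdata$`_yhat_` <- unname(predict(fit, newdata = newdata, type = "response"))
  newdata$`_vname_` <- v  # which variable is varied in these rows
  newdata$`_ids_` <- 1    # which observation the profile belongs to
  newdata
}))
rownames(cp_long) <- NULL
cp_long
```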

4. Plot CP profiles.

To obtain a graphical representation of CP profiles, the generic plot() function can be applied to the object returned by the ceteris_paribus() function. It returns a ggplot2 object that can be processed further if needed.

The resulting plot can be enriched with additional data by applying the functions show_rugs (adds rugs for the selected points), show_observations (adds observations), or show_aggregated_profiles (see Chapter 16). All these functions can take additional arguments to modify size, color, or linetype.
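Outside ingredients, similar enrichments can be approximated with base R graphics, which may help to see what they add: points() plays a role similar to show_observations(), and rug() a role similar to show_rugs(), here marking the observed values of the variable and thus the region where the profile is supported by data. The sketch assumes a logistic regression on mtcars and the hp variable; it is an analogy, not the package’s implementation.

```r
# Illustration on mtcars (an assumption, not the Titanic example)
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
x_star <- mtcars["Mazda RX4", c("wt", "hp")]

# CP profile for hp over its observed range; wt stays at its value for x^*
grid <- seq(min(mtcars$hp), max(mtcars$hp), length.out = 50)
newdata <- x_star[rep(1, length(grid)), , drop = FALSE]
newdata$hp <- grid
yhat <- predict(fit, newdata = newdata, type = "response")

plot(grid, yhat, type = "l", xlab = "hp", ylab = "predicted P(am = 1)",
     main = "CP profile for hp")
# analogue of show_observations(): the instance x^* and its prediction
points(x_star$hp, predict(fit, newdata = x_star, type = "response"), pch = 19)
# analogue of show_rugs(): tick marks at the observed values of hp
rug(mtcars$hp)
```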


### References

Goldstein, Alex, Adam Kapelner, Justin Bleich, and Emil Pitkin. 2015a. “Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation.” Journal of Computational and Graphical Statistics 24 (1): 44–65. doi:10.1080/10618600.2014.907095.

Tufte, Edward R. 1986. The Visual Display of Quantitative Information. Cheshire, CT, USA: Graphics Press.

Biecek, Przemyslaw. 2019. ingredients: Effects and Importances of Model Ingredients. https://ModelOriented.github.io/ingredients/.

O’Connell, Mark, Catherine Hurley, and Katarina Domijan. 2017. “Conditional Visualization for Statistical Models: An Introduction to the condvis Package in R.” Journal of Statistical Software 81 (5): 1–20. doi:10.18637/jss.v081.i05.

Greenwell, Brandon M. 2017a. “pdp: An R Package for Constructing Partial Dependence Plots.” The R Journal 9 (1): 421–36. https://journal.r-project.org/archive/2017/RJ-2017-016/index.html.

Goldstein, Alex, Adam Kapelner, Justin Bleich, and Emil Pitkin. 2015b. “Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation.” Journal of Computational and Graphical Statistics 24 (1): 44–65. doi:10.1080/10618600.2014.907095.

Apley, Dan. 2018a. ALEPlot: Accumulated Local Effects (ALE) Plots and Partial Dependence (PD) Plots. https://CRAN.R-project.org/package=ALEPlot.

Molnar, Christoph, Bernd Bischl, and Giuseppe Casalicchio. 2018a. “iml: An R Package for Interpretable Machine Learning.” Journal of Open Source Software 3 (26): 786. doi:10.21105/joss.00786.

Breiman, Leo, Adele Cutler, Andy Liaw, and Matthew Wiener. 2018. randomForest: Breiman and Cutler’s Random Forests for Classification and Regression. https://CRAN.R-project.org/package=randomForest.

Biecek, Przemyslaw. 2018. DALEX: Descriptive mAchine Learning Explanations. https://pbiecek.github.io/DALEX/.