Chapter 25 Ceteris-paribus Two-dimensional Profiles - a Tool for Pairwise Interactions

25.1 Introduction

The definition of Ceteris-paribus (CP) profiles, given in Section 7, may be easily extended to two or more explanatory variables. Also, the definition of the variable importance measure \(vip^{CP}_j(x^*)\) has a straightforward extension to a larger number of variables. These extensions are useful to identify or visualize pairwise interactions between explanatory variables.

25.2 Intuition

Figure 25.1 presents the response (prediction) surface for the titanic_lmr_v6 model for two explanatory variables, age and sibsp, from the titanic dataset (see Section 5.1). We are interested in how the model's prediction changes when the two variables are varied jointly.

[TOMASZ: THIS IS A BIT WEAK. WHAT INTUITIVE IS ABOUT THE PLOT? WHAT CAN BE SEEN DIFFERENTLY THAN IN AN 1D CP PROFILE? WHICH STRUCTURE WOULD WE LOOK FOR?]

Figure 25.1: Ceteris-paribus profile for the `age` and `sibsp` explanatory variables for the `titanic_lmr_v6` model.

25.3 Method

The definition of one-dimensional CP profiles (see Section 7.3) may be easily extended to two or more explanatory variables. A two-dimensional CP profile for model \(f()\), explanatory variables \(j\) and \(k\), and point \(x^*\) is defined as follows:

\[ CP^{f, (j,k), x^*}(z_1, z_2) \equiv f(x^*|^{(j,k)} = (z_1,z_2)). \]

Thus, a two-dimensional (2D) CP profile is a function that describes the dependence of the model's prediction for instance \(x^*\) on \(z_1\) and \(z_2\), the values of the \(j\)-th and \(k\)-th explanatory variables, respectively. The values \(z_1\) and \(z_2\) are taken to span the range of values typical for the two variables, while all other explanatory variables are kept fixed at the values indicated by \(x^*\).
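The definition translates directly into a brute-force computation: copy the instance \(x^*\), overwrite the two selected variables with values from a grid, and query the model. The sketch below is a minimal, model-agnostic illustration; the function name `cp_profile_2d()` and its arguments are our own and do not belong to any package.

```r
# Minimal sketch of a 2D CP profile computed on a grid.
# `model`, `x_star` (a one-row data frame) and `predict_function`
# are placeholders for the user's own objects.
cp_profile_2d <- function(model, x_star, var_j, var_k, grid_j, grid_k,
                          predict_function = predict) {
  grid <- expand.grid(z1 = grid_j, z2 = grid_k)
  grid$profile <- vapply(seq_len(nrow(grid)), function(i) {
    x_new <- x_star
    x_new[[var_j]] <- grid$z1[i]    # replace the j-th variable
    x_new[[var_k]] <- grid$z2[i]    # replace the k-th variable
    predict_function(model, x_new)  # all other variables stay fixed at x*
  }, numeric(1))
  grid
}

# Illustrative usage with a simple linear model:
m <- lm(mpg ~ wt * hp, data = mtcars)
prof <- cp_profile_2d(m, mtcars[1, ], "wt", "hp",
                      grid_j = seq(2, 5, length.out = 20),
                      grid_k = seq(60, 300, length.out = 20))
head(prof)
```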

The corresponding variable importance measure is defined as follows: \[ vip^{CP}_{j,k}(x^*) = \int_{\mathcal R}\int_{\mathcal R} |CP^{f,(j,k),x^*}(z_1,z_2) - f(x^*)|\, g^{j,k}(z_1,z_2)\,dz_1\,dz_2 = E_{X_j,X_k}\left[|CP^{f,(j,k),x^*}(X_j,X_k) - f(x^*)|\right], \] where \(g^{j,k}(z_1,z_2)\) denotes the joint density of the \(j\)-th and \(k\)-th explanatory variables, so that the expected value is taken over their joint distribution.

Such multi-dimensional extensions are useful to check whether, for instance, the model involves interactions. In particular, the presence of pairwise interactions may be detected with 2D CP profiles.
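One way to see this is to note that, if the model contains no interaction between the \(j\)-th and \(k\)-th explanatory variables, i.e., their effects enter the model additively, then the 2D profile decomposes into the corresponding one-dimensional profiles:

\[ CP^{f,(j,k),x^*}(z_1,z_2) = CP^{f,j,x^*}(z_1) + CP^{f,k,x^*}(z_2) - f(x^*). \]

Systematic deviations from this additive relationship indicate a pairwise interaction between the two variables.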

25.4 Example: Titanic data

A natural way to visualize 2D CP profiles is to use a heat map for all pairs of explanatory variables, as in Figure 25.2.

Figure 25.2: Two-dimensional ceteris-paribus profiles for all pairs of explanatory variables for the `titanic_lmr_v6` model. The black cross marks the instance of interest.

If the number of pairs of explanatory variables is small or moderate, then it is possible to present 2D CP profiles for all pairs of variables.

If the number of pairs is large, we can use the variable importance measure to order the pairs based on their importance and select the most important pairs for purposes of illustration.
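A brute-force way to obtain such an ordering, building on the `cp_profile_2d()` sketch from Section 25.3, is to approximate the integral defining \(vip^{CP}_{j,k}(x^*)\) by an average over the grid. This is only a rough approximation, as it implicitly assumes that the grid points are representative of the joint distribution of the two variables.

```r
# Approximate vip^CP_{j,k}(x*) by averaging |profile - f(x*)| over the grid.
# Uses the cp_profile_2d() helper sketched earlier in this chapter.
vip_cp_2d <- function(model, x_star, var_j, var_k, grid_j, grid_k,
                      predict_function = predict) {
  prof <- cp_profile_2d(model, x_star, var_j, var_k, grid_j, grid_k,
                        predict_function)
  mean(abs(prof$profile - predict_function(model, x_star)))
}
```

Pairs with the largest values of this measure are natural candidates for plotting.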

[TOMASZ: WE SHOULD INCLUDE HERE A MORE SUBSTANTIVE DISCUSSION REFERRING TO “HENRY”.]

25.5 Pros and cons

Two-dimensional CP profiles can be used to identify the presence and the influence of pairwise interactions in a model. However, for models with a large number of explanatory variables, the number of pairs will be large. Consequently, inspection of all possible 2D CP profiles may be challenging. Moreover, the profiles are more difficult to read and interpret than the 1D CP profiles.

[TOMASZ: 2D CP PROFILES FOR FACTORS?]

25.6 Code snippets for R

In this section, we present key features of the R package ingredients (Biecek 2019a), which is a part of the DALEXverse and covers all methods presented in this chapter. More details and examples can be found at https://modeloriented.github.io/ingredients/.

There are also other R packages that offer similar functionality, like condvis (O’Connell, Hurley, and Domijan 2017) or ICEbox (Goldstein et al. 2015).

We use the random forest model titanic_rf_v6 developed for the Titanic dataset (see Section @ref(model_titanic_rf)) as the example. Recall that we deal with a binary classification problem - we want to predict the probability of survival for a selected passenger.

First, we have to create a wrapper (explainer) around the model (see Section 7.6).
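A minimal sketch of this step is shown below; the `explain()` function comes from the DALEX package, while the object name `explain_rf_v6` and the exact form of the `data` and `y` arguments are illustrative assumptions.

```r
library("randomForest")
library("DALEX")

# Wrap the random forest model in an explainer object (see Section 7.6).
# The data/y specification below is an illustrative assumption.
explain_rf_v6 <- explain(model = titanic_rf_v6,
                         data  = titanic,
                         y     = titanic$survived == "yes",
                         label = "Random Forest v6")
```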

To calculate 2D CP profiles, we first need to define the instance of interest. Let us use henry.
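A sketch of the definition of `henry` is given below; the exact values (a 47-year-old male travelling in the first class) are illustrative and should be consistent with the instance used in the earlier chapters.

```r
# Illustrative definition of the instance of interest.
# In practice, factor columns should use the same levels as the training data.
henry <- data.frame(
  class    = "1st",
  gender   = "male",
  age      = 47,
  sibsp    = 0,
  parch    = 0,
  fare     = 25,
  embarked = "Cherbourg"
)
```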

[TOMASZ: WHY NOT USING THE PRE-DEFINED DATA FRAME?]

2D profiles are calculated by applying the ceteris_paribus_2d() function to the wrapper object. By default, all pairs of continuous explanatory variables are used, but one can limit the number of variables considered through the variables argument. [TOMASZ: FACTORS?]
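The call below is a sketch; the object names carry over from the previous snippets, and the choice of variables passed to the `variables` argument is illustrative.

```r
library("ingredients")

# Compute 2D CP profiles for all pairs of the selected continuous variables.
cp_2d_rf <- ceteris_paribus_2d(explain_rf_v6, henry,
                               variables = c("age", "fare", "sibsp"))
```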

As a result, we obtain an object of class ceteris_paribus_2d_explainer with overloaded print() and plot() functions. We can use the latter function to obtain plots of the constructed 2D CP profiles.
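A minimal sketch of both calls follows, using the object created above.

```r
# Print a summary of the computed profiles and plot them as heat maps,
# one panel per pair of variables.
print(cp_2d_rf)
plot(cp_2d_rf)
```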

[TOMASZ: LABELLING OF THE AXES COULD BE IMPROVED. IT IS UNCLEAR WHICH VARIABLES DEFINE THE Y- AND X AXES. ]

The plot suggests that age and sibsp jointly have an important influence on the model's response. [TOMASZ: WHY? WHICH FEATURE OF THE PLOTS DISTINGUISHES THIS PAIR FROM THE THREE OTHERS?]

[TOMASZ: WE SHOULD DISCUSS “HENRY” IN THE EXAMPLE SECTION. IN THE SNIPPETS, WE SHOULD SIMPLY SHOW THE UNDERLYING CODE.]

25.7 Merging Path Plots and Others

(Demšar and Bosnić 2018)

(Puri et al. 2017)

(Sitko, Grudziąż, and Biecek 2018)

(Strobl et al. 2007) (Strobl et al. 2008) - variable importance

(Fisher, Rudin, and Dominici 2018)

Parr, Terence, Kerem Turgutlu, Christopher Csiszar, and Jeremy Howard. 2018. “Beware Default Random Forest Importances.” March 26, 2018. http://explained.ai/rf-importance/index.html.

25.8 Other topics

Enslaving the Algorithm: From a ‘Right to an Explanation’ to a ‘Right to Better Decisions’? (Edwards and Veale 2018)

(Paluszynska and Biecek 2017b) (Goldstein, Kapelner, and Bleich 2017) (Apley 2018b)

(Tatarynowicz, Romaszko, and Urbański 2018)

References

Apley, Dan. 2018b. ALEPlot: Accumulated Local Effects (ALE) Plots and Partial Dependence (PD) Plots. https://CRAN.R-project.org/package=ALEPlot.

Biecek, Przemyslaw. 2019a. Ingredients: Effects and Importances of Model Ingredients. https://ModelOriented.github.io/ingredients/.

Demšar, Jaka, and Zoran Bosnić. 2018. “Detecting Concept Drift in Data Streams Using Model Explanation.” Expert Systems with Applications 92 (February): 546–59. https://doi.org/10.1016/j.eswa.2017.10.003.

Edwards, Lilian, and Michael Veale. 2018. “Enslaving the Algorithm: From a ’Right to an Explanation’ to a ’Right to Better Decisions’?” IEEE Security and Privacy 16 (3): 46–54. https://doi.org/10.1109/MSP.2018.2701152.

Fisher, A., C. Rudin, and F. Dominici. 2018. “Model Class Reliance: Variable Importance Measures for any Machine Learning Model Class, from the ‘Rashomon’ Perspective.” ArXiv E-Prints, January.

Goldstein, Alex, Adam Kapelner, and Justin Bleich. 2017. ICEbox: Individual Conditional Expectation Plot Toolbox. https://CRAN.R-project.org/package=ICEbox.

Goldstein, Alex, Adam Kapelner, Justin Bleich, and Emil Pitkin. 2015. “Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation.” Journal of Computational and Graphical Statistics 24 (1): 44–65. https://doi.org/10.1080/10618600.2014.907095.

O’Connell, Mark, Catherine Hurley, and Katarina Domijan. 2017. “Conditional Visualization for Statistical Models: An Introduction to the Condvis Package in R.” Journal of Statistical Software, Articles 81 (5): 1–20. https://doi.org/10.18637/jss.v081.i05.

Paluszynska, Aleksandra, and Przemyslaw Biecek. 2017b. RandomForestExplainer: Explaining and Visualizing Random Forests in Terms of Variable Importance. https://CRAN.R-project.org/package=randomForestExplainer.

Puri, Nikaash, Piyush Gupta, Pratiksha Agarwal, Sukriti Verma, and Balaji Krishnamurthy. 2017. “MAGIX: Model Agnostic Globally Interpretable Explanations.” CoRR abs/1706.07160. http://arxiv.org/abs/1706.07160.

Sitko, Agnieszka, Aleksandra Grudziąż, and Przemyslaw Biecek. 2018. FactorMerger: The Merging Path Plot. https://CRAN.R-project.org/package=factorMerger.

Strobl, Carolin, Anne-Laure Boulesteix, Thomas Kneib, Thomas Augustin, and Achim Zeileis. 2008. “Conditional Variable Importance for Random Forests.” BMC Bioinformatics 9 (1): 307. https://doi.org/10.1186/1471-2105-9-307.

Strobl, Carolin, Anne-Laure Boulesteix, Achim Zeileis, and Torsten Hothorn. 2007. “Bias in Random Forest Variable Importance Measures: Illustrations, Sources and a Solution.” BMC Bioinformatics 8 (1): 25. https://doi.org/10.1186/1471-2105-8-25.

Tatarynowicz, Magda, Kamil Romaszko, and Mateusz Urbański. 2018. ModelDown: Make Static Html Website for Predictive Models. https://github.com/MI2DataLab/modelDown.