4 Do-it-yourself with Python

Most of the methods presented in this book are available in both R and Python and can be used in a uniform way. But each of these languages also has many other tools for exploratory model analysis.

In this book, we introduce various methods for instance-level and dataset-level exploration and explanation of predictive models. In each chapter, there is a section with code snippets for R and Python that shows how to use a particular method. In this chapter, we provide a short description of the steps needed to set up the Python environment with the required libraries.

4.1 What to install?

The Python interpreter (Van Rossum and Drake 2009) is required. It is generally a good idea to use the newest version; at least Python 3.6 is recommended. It can be downloaded from the Python website https://python.org/. A popular distribution that simplifies Python installation and configuration is Anaconda, which can be downloaded from https://www.anaconda.com/.

There are many editors available for writing Python code conveniently. In the data science community, a very popular solution is the Jupyter Notebook. It is a web application that allows you to create and share documents that contain live code, visualizations, and descriptions. The Jupyter Notebook can be installed from the website https://jupyter.org/.

Once Python and the editor are available, the required packages should be installed. The most important one is the dalex package, currently in version 0.1.9. The package can be installed with pip by executing the following instruction from the command line:

pip install dalex
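
To confirm that the installation succeeded, the installed version can be queried from within Python. The following is a minimal sketch that uses only the standard library; it assumes Python 3.8 or later (for importlib.metadata), and the helper name installed_version is a hypothetical one introduced here for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Report whether dalex is available in the current environment
dalex_version = installed_version("dalex")
if dalex_version is None:
    print("dalex is not installed; run: pip install dalex")
else:
    print("dalex version:", dalex_version)
```

If the reported version is older than expected, running pip install --upgrade dalex from the command line should update it.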

Installing dalex will automatically take care of the other required libraries.

4.2 How to work with dalex?

There are many libraries in Python that can be used to construct a predictive model. Among the most popular are algorithm-specific libraries, such as catboost (Dorogush, Ershov, and Gulin 2018), xgboost (Chen and Guestrin 2016), and keras (Chollet and others 2015), and algorithm-agnostic libraries, such as scikit-learn (Pedregosa et al. 2011).

While it is great to have such a large choice of tools for constructing models, the disadvantage is that different packages have different interfaces and different arguments. Moreover, model-objects created with different packages may have different internal structures. The main goal of the dalex package is to create a level of abstraction around a model that makes it easier to explore and explain the model.
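
The idea of such an abstraction layer can be illustrated with a small, self-contained sketch. It is not the dalex implementation: the class UniformModel, its predict_function argument, and the two toy models are hypothetical names invented for this illustration. It only shows how models with different interfaces can be accessed in one uniform way once they are wrapped.

```python
class UniformModel:
    """Hypothetical wrapper giving differently shaped models one interface.

    This is NOT the dalex implementation, only an illustration of the idea.
    """

    def __init__(self, model, predict_function, label=None):
        self.model = model
        self.predict_function = predict_function
        self.label = label or type(model).__name__

    def predict(self, data):
        # Delegate to the model-specific function; always return a list of floats
        return [float(p) for p in self.predict_function(self.model, data)]


# Two toy "models" with deliberately different interfaces
class ThresholdRule:
    def predict_proba(self, rows):
        # Returns a pair of class probabilities for every row
        return [[0.2, 0.8] if row["age"] < 18 else [0.7, 0.3] for row in rows]


class LinearScore:
    def score(self, row):
        # Returns a single score for a single row
        return 0.5 - 0.005 * row["age"]


wrapped = [
    UniformModel(ThresholdRule(), lambda m, rows: [p[1] for p in m.predict_proba(rows)]),
    UniformModel(LinearScore(), lambda m, rows: [m.score(r) for r in rows]),
]

# After wrapping, both models are queried in exactly the same way
passengers = [{"age": 8}, {"age": 40}]
for w in wrapped:
    print(w.label, w.predict(passengers))
```

The design choice in this sketch is to keep all model-specific details in a single function supplied at wrapping time, so that any code written against predict() does not need to know which library produced the model.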

The Explainer() constructor is THE method for model wrapping. It requires only one argument, model, which specifies the model-object with the fitted form of the model. However, the constructor accepts additional arguments that extend its functionality. They will be discussed in Section 5.3.6.

Once the model is wrapped in an Explainer object, all further operations are performed on this object. They will be presented in the subsections titled Code snippets for Python.

4.3 Code snippets for Python

A detailed description of model exploration will be presented in the next chapters. In general, however, working with the dalex library can be described in three steps: (1) construction of the wrapper around the model, (2) calculation of the explanation, and (3) plotting of the explanation.

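# import the dalex package under its conventional alias
import dalex as dx

# 1. create an explainer; model is a fitted model-object,
# X contains the explanatory variables, and y the target values
exp = dx.Explainer(model, X, y)

# calculate predictions for obs, the observation of interest
exp.predict(obs)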

# 2. calculate explanation
obs_bd = exp.predict_parts(obs, type='break_down')

# print explanation
obs_bd

# 3. plot explanation
obs_bd.plot()

References

Chen, Tianqi, and Carlos Guestrin. 2016. “XGBoost: A Scalable Tree Boosting System.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–94. KDD ’16. New York, NY, USA: ACM. https://doi.org/10.1145/2939672.2939785.

Chollet, François, and others. 2015. “Keras.” GitHub. https://github.com/fchollet/keras.

Dorogush, Anna Veronika, Vasily Ershov, and Andrey Gulin. 2018. “CatBoost: Gradient Boosting with Categorical Features Support.” CoRR abs/1810.11363. http://arxiv.org/abs/1810.11363.

Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, et al. 2011. “Scikit-Learn: Machine Learning in Python.” Journal of Machine Learning Research 12: 2825–30.

Van Rossum, Guido, and Fred L. Drake. 2009. Python 3 Reference Manual. Scotts Valley, CA: CreateSpace.