## 4.1 Outlier detection

The model_performance() function can also be used to identify outliers. It was introduced in section 3.1; here we present another of its uses.

As you may remember, residuals for the random forest model were generally smaller, except for a small fraction of very large residuals.

Let’s use the model_performance() function to extract and plot residuals against the observed true values.

mp_rf <- model_performance(explainer_rf)

library("ggplot2")
ggplot(mp_rf, aes(observed, diff)) + geom_point() +
  xlab("Observed") + ylab("Predicted - Observed") +
  ggtitle("Diagnostic plot for the random forest model") + theme_mi2()

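Let’s see which variables stand behind the model prediction for the apartment with the largest residual. Since diff is the predicted value minus the observed one, which.min() selects the most negative residual, that is, the apartment whose price the model underestimates the most.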

which.min(mp_rf$diff)
## 1161
new_apartment <- apartmentsTest[which.min(mp_rf$diff), ]
new_apartment

Table 4.1: Observation with the largest residual in the random forest model

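|      | m2.price | construction.year | surface | floor | no.rooms | district    |
|------|----------|-------------------|---------|-------|----------|-------------|
| 1161 | 6679     | 2005              | 22      | 1     | 2        | Srodmiescie |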
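
The lookup above finds a single observation, the one with the most negative residual. If several candidate outliers are of interest, one option is to rank observations by the absolute value of the residual. The following is a minimal base R sketch of that idea, not part of the original analysis. The vectors observed and predicted hold made-up toy values, not rows from apartmentsTest, and the names residual, worst and top2 are introduced here only for illustration.

```r
# Toy values invented for illustration only; NOT taken from apartmentsTest.
observed  <- c(3000, 4200, 6700, 2500, 5100)
predicted <- c(3100, 4000, 4300, 2550, 5250)

# Same sign convention as the diagnostic plot: predicted minus observed.
residual <- predicted - observed

# Index of the most negative residual (the largest underprediction).
worst <- which.min(residual)

# Indices of the two largest residuals in absolute value, largest first.
top2 <- head(order(abs(residual), decreasing = TRUE), 2)

print(worst)
print(top2)
```

Under the assumption that mp_rf has a diff column, as used above, the same ordering could be applied to mp_rf$diff to pull several rows out of apartmentsTest at once.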