4.1 Outlier detection

The model_performance() function can also be used to identify outliers. It was introduced in Section 3.1; here we present another of its uses.

As you may remember, the residuals of the random forest model were generally smaller, except for a small fraction of very large residuals.

Let’s use the model_performance() function to extract and plot residuals against the observed true values.
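A minimal sketch of this step is given here. It assumes that the random forest model and its explainer are built as in Section 3.1 from the apartments data shipped with DALEX; the seed, the helper names predictors and res, and the use of base graphics are illustrative choices, not taken from the original text. The structure of the object returned by model_performance() differs between DALEX versions, so the sketch handles both layouts and computes the prediction error explicitly.

```r
library("DALEX")
library("randomForest")

# Assumption: model and explainer mirror those of Section 3.1; the seed is arbitrary.
set.seed(1)
predictors <- c("construction.year", "surface", "floor", "no.rooms", "district")
apartments_rf_model <- randomForest(
  m2.price ~ construction.year + surface + floor + no.rooms + district,
  data = apartments
)
explainer_rf <- explain(apartments_rf_model,
                        data = apartmentsTest[, predictors],
                        y = apartmentsTest$m2.price)

mp_rf <- model_performance(explainer_rf)

# Older DALEX versions return a data frame directly;
# newer ones store it in the `residuals` element of a list.
res <- if (is.data.frame(mp_rf)) mp_rf else mp_rf$residuals

# Compute the error explicitly so its sign does not depend on the DALEX version.
res$error <- res$predicted - res$observed

# Residuals against the observed true values.
plot(res$observed, res$error,
     xlab = "Observed", ylab = "Predicted - Observed",
     main = "Diagnostic plot for the random forest model")
abline(h = 0, lty = 2)
```

Points far below the dashed zero line correspond to apartments whose price the model underestimates.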

Figure 4.1: Diagnostic plot for the random forest model. The more expensive the apartment, the more the model underestimates its price.

Let's see which variables stand behind the model's prediction for the apartment with the largest residual.
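One way to locate that apartment is sketched here. As before, it assumes the Section 3.1 setup with the apartments data from DALEX; the seed and the names predictors, res and worst are hypothetical, and "largest residual" is interpreted as the largest absolute difference between prediction and observation.

```r
library("DALEX")
library("randomForest")

# Assumption: same model and explainer as in Section 3.1; the seed is arbitrary.
set.seed(1)
predictors <- c("construction.year", "surface", "floor", "no.rooms", "district")
apartments_rf_model <- randomForest(
  m2.price ~ construction.year + surface + floor + no.rooms + district,
  data = apartments
)
explainer_rf <- explain(apartments_rf_model,
                        data = apartmentsTest[, predictors],
                        y = apartmentsTest$m2.price)

mp_rf <- model_performance(explainer_rf)

# Handle both the older (data frame) and newer (list) return structures.
res <- if (is.data.frame(mp_rf)) mp_rf else mp_rf$residuals

# Position of the observation with the largest absolute prediction error.
worst <- which.max(abs(res$predicted - res$observed))

# The corresponding row of the test data, with all its variables.
new_apartment <- apartmentsTest[worst, ]
new_apartment
```

Because a random forest is fitted with random sampling, the selected row may vary with the seed and the package versions.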

Table 4.1: Observation with the largest residual in the random forest model

      m2.price  construction.year  surface  floor  no.rooms  district
1161      6679               2005       22      1         2  Srodmiescie