## 3.1 Model performance

As you may remember from the previous chapter, the root mean square of residuals is almost identical for the two models considered (about 283 for the linear model and 287 for the random forest). Does this mean that the models are equally good?

predicted_mi2_lm <- predict(apartments_lm_model, apartmentsTest)
sqrt(mean((predicted_mi2_lm - apartmentsTest$m2.price)^2))
## [1] 283.0865
predicted_mi2_rf <- predict(apartments_rf_model, apartmentsTest)
sqrt(mean((predicted_mi2_rf - apartmentsTest$m2.price)^2))
## [1] 286.5357
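To see why equal root mean squares do not imply equally good models, here is a minimal base R sketch. The helper `rmse()` and the two residual vectors are made up for illustration and are not taken from the apartments data.

```r
# Root mean square of a vector of residuals (hypothetical helper)
rmse <- function(residuals) sqrt(mean(residuals^2))

# Toy residuals, invented for this example
res_a <- c(-3, 3, -3, 3, -3, 3, -3, 3, -3)  # every residual has size 3
res_b <- c(0, 0, 0, 0, 0, 0, 0, 0, 9)       # mostly exact, one large miss

rmse(res_a)         # 3
rmse(res_b)         # 3
median(abs(res_a))  # 3
median(abs(res_b))  # 0
```

Both vectors have the same root mean square, yet the typical error is very different. A single summary number hides the shape of the residual distribution.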

The model_performance() function calculates predictions and residuals for the validation dataset apartmentsTest.

The generic print() function returns quantiles of the residuals.

mp_lm <- model_performance(explainer_lm)
mp_rf <- model_performance(explainer_rf)
mp_lm
##        0%       10%       20%       30%       40%       50%       60%
## -472.3560 -423.9131 -398.2811 -370.8841  161.2473  174.0677  184.1412
##       70%       80%       90%      100%
##  195.8834  209.2460  221.4659  257.2555
mp_rf
##           0%          10%          20%          30%          40%
## -1262.554308  -408.920183  -197.591180   -89.661883    -7.454146
##          50%          60%          70%          80%          90%
##    55.441061   108.398858   157.924244   218.241574   294.264602
##         100%
##   727.445065
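A summary of this kind can be approximated in base R with `quantile()`. The sketch below uses made-up observed and predicted values, and it assumes residuals are defined as predicted minus observed. Neither the data nor the sign convention is confirmed by the text above.

```r
# Toy data, invented for this example
observed  <- c(10, 12, 15, 11, 14, 13, 18, 16, 17, 19, 20)
predicted <- c(11, 11, 16, 12, 13, 15, 17, 18, 16, 20, 24)

# Assumed sign convention: predicted minus observed
residuals <- predicted - observed

# Deciles of residuals, from the minimum (0%) to the maximum (100%)
deciles <- quantile(residuals, probs = seq(0, 1, by = 0.1))
deciles
```

Reading the deciles side by side for two models shows where each model errs: a wide gap between the 0% and 10% values, for instance, points to a few large negative residuals.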

The generic plot() function shows the reversed empirical cumulative distribution function of the absolute values of residuals, that is, for each x the fraction of residuals whose absolute value is larger than x. The figure below shows that the majority of residuals for the random forest are smaller than the residuals for the linear model, yet a small fraction of very large residuals inflates the root mean square.

plot(mp_lm, mp_rf)
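The quantity on the vertical axis of such a plot can be computed by hand. The following is a minimal base R sketch; `reversed_ecdf()` is a hypothetical helper and the residual vectors are toy values, not the residuals of the models above.

```r
# Fraction of absolute residuals strictly larger than each threshold in x
reversed_ecdf <- function(residuals, x) {
  abs_res <- abs(residuals)
  vapply(x, function(t) mean(abs_res > t), numeric(1))
}

# Toy residuals, invented for this example
res_a <- c(-3, 3, -3, 3, -3, 3, -3, 3, -3, 3)           # all of size 3
res_b <- c(0.5, -0.5, 1, -1, 0.5, -0.5, 1, -1, 0.5, 8)  # small, one large

thresholds <- c(0, 1, 2, 5)
reversed_ecdf(res_a, thresholds)  # 1.0 1.0 1.0 0.0
reversed_ecdf(res_b, thresholds)  # 1.0 0.1 0.1 0.1
```

The same values can be obtained as `1 - ecdf(abs(residuals))(x)`. A curve that drops quickly means most residuals are small; a long flat tail means a few residuals are very large.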

Use the geom = "boxplot" argument of the generic plot() function for an alternative comparison of residuals. The red dot marks the root mean square of residuals.

plot(mp_lm, mp_rf, geom = "boxplot")
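A comparable picture can be sketched with base R graphics. This is only an illustration on toy residuals: it assumes the boxes summarize absolute residuals, and the names `model_a` and `model_b` are made up.

```r
# Toy residuals, invented for this example
res <- list(
  model_a = c(-3, 3, -3, 3, -3, 3, -3, 3, -3),
  model_b = c(0, 0, 0, 0, 0, 0, 0, 0, 9)
)

# Assumption: boxes are drawn for absolute residuals
abs_res <- lapply(res, abs)

# Root mean square of residuals for each model
rms <- vapply(res, function(r) sqrt(mean(r^2)), numeric(1))

# Boxplots side by side, with the root mean square added as a red dot
bp <- boxplot(abs_res, ylab = "|residual|")
points(seq_along(rms), rms, col = "red", pch = 19)
```

Here both red dots sit at the same height while the boxes differ, which is the situation the comparison is meant to expose.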