This function calculates candidate splits for each selected variable. For numerical variables, splits are calculated as percentiles (in general, as uniform quantiles on a grid of length grid_points). For all other variables, splits are calculated as unique values.
calculate_variable_splits(data, variables = colnames(data), grid_points = 101)
Argument | Description |
---|---|
data | validation dataset, used to determine the distribution of observations |
variables | names of variables for which splits shall be calculated |
grid_points | number of points used for the response path |
A named list with splits for selected variables
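The rule described above (quantiles for numerical columns, unique values for everything else) can be sketched in base R. This is a minimal sketch under stated assumptions, not the package's implementation: the helper name `sketch_variable_splits` and the toy data frame are hypothetical, and the quantile grid `seq(0, 1, length.out = grid_points)` is assumed from the 0% to 100% labels seen in the example output.

```r
# Hypothetical helper that mimics the documented behaviour; not the package code.
sketch_variable_splits <- function(data, variables = colnames(data), grid_points = 101) {
  # Assumed grid: evenly spaced probabilities from 0 to 1.
  probs <- seq(0, 1, length.out = grid_points)
  splits <- lapply(variables, function(v) {
    column <- data[[v]]
    if (is.numeric(column)) {
      # Numerical variable: uniform quantiles.
      quantile(column, probs = probs, na.rm = TRUE)
    } else {
      # Any other variable: unique observed values.
      unique(column)
    }
  })
  names(splits) <- variables
  splits
}

# Toy data, invented for illustration only.
toy <- data.frame(
  surface = c(20, 35, 50, 80, 120),
  district = factor(c("Bemowo", "Wola", "Bemowo", "Ochota", "Wola"))
)
toy_splits <- sketch_variable_splits(toy, grid_points = 5)
toy_splits$surface   # five quantiles, named 0%, 25%, 50%, 75%, 100%
toy_splits$district  # the three distinct districts
```

The result has one element per requested variable, so it can be indexed by variable name in the same way as the list returned by `calculate_variable_splits`.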
Note that calculate_variable_splits is an S3 generic. If you want to work with non-standard data sources (such as H2O ddf or external databases), you should overload it by providing a method for the class of your data source.
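Overloading an S3 generic means defining a function named `generic.class`. The sketch below shows what such a method might look like for a made-up data source. Everything in it is an assumption for illustration: the class `column_store`, the constructor `new_column_store`, and the method body are hypothetical, and the stand-in generic is defined only when no `calculate_variable_splits` is already available, so that the sketch runs on its own.

```r
# Stand-in generic with the documented signature; skipped when the package
# that provides calculate_variable_splits is already loaded.
if (!exists("calculate_variable_splits")) {
  calculate_variable_splits <- function(data, variables = colnames(data), grid_points = 101) {
    UseMethod("calculate_variable_splits")
  }
}

# Hypothetical data source: a list of columns tagged with class "column_store".
new_column_store <- function(columns) {
  structure(list(columns = columns), class = "column_store")
}

# Hypothetical method: same contract as the generic (a named list of splits),
# but the columns are read from the custom object instead of a data frame.
calculate_variable_splits.column_store <- function(data,
                                                   variables = names(data$columns),
                                                   grid_points = 101) {
  probs <- seq(0, 1, length.out = grid_points)
  splits <- lapply(variables, function(v) {
    column <- data$columns[[v]]
    if (is.numeric(column)) {
      quantile(column, probs = probs, na.rm = TRUE)
    } else {
      unique(column)
    }
  })
  names(splits) <- variables
  splits
}

# Toy source, invented for illustration only.
store <- new_column_store(list(
  floor = c(1, 2, 3, 4, 5),
  district = c("Wola", "Ursus", "Wola")
))
store_splits <- calculate_variable_splits(store, grid_points = 3)
```

For a real external source, the body of the method would typically push the quantile and distinct-value computations to the source itself rather than pulling whole columns into memory.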
library("DALEX")
library("randomForest")
set.seed(59)
apartments_rf_model <- randomForest(m2.price ~ construction.year + surface + floor +
                                      no.rooms + district, data = apartments)
vars <- c("construction.year", "surface", "floor", "no.rooms", "district")
calculate_variable_splits(apartments, vars)
#> $construction.year
#>      0%      1%      2%      3%      4%      5%      6%      7%      8%      9%
#> 1920.00 1921.00 1922.00 1923.00 1924.00 1925.00 1926.00 1927.00 1927.00 1928.00
#>     10%     11%     12%     13%     14%     15%     16%     17%     18%     19%
#> 1929.00 1929.89 1931.00 1932.00 1933.00 1934.00 1934.84 1935.00 1936.00 1937.00
#>     20%     21%     22%     23%     24%     25%     26%     27%     28%     29%
#> 1938.00 1939.00 1940.00 1941.00 1941.76 1943.00 1944.00 1945.00 1946.00 1946.00
#>     30%     31%     32%     33%     34%     35%     36%     37%     38%     39%
#> 1947.00 1948.00 1949.00 1949.00 1950.00 1951.00 1952.00 1953.00 1954.00 1956.00
#>     40%     41%     42%     43%     44%     45%     46%     47%     48%     49%
#> 1956.00 1957.00 1957.00 1958.00 1959.00 1959.00 1960.00 1961.53 1962.52 1963.51
#>     50%     51%     52%     53%     54%     55%     56%     57%     58%     59%
#> 1965.00 1965.00 1967.00 1967.47 1968.00 1969.00 1970.00 1971.00 1971.00 1972.00
#>     60%     61%     62%     63%     64%     65%     66%     67%     68%     69%
#> 1973.00 1974.39 1975.00 1977.00 1977.00 1978.00 1980.00 1981.00 1982.00 1983.00
#>     70%     71%     72%     73%     74%     75%     76%     77%     78%     79%
#> 1984.00 1985.00 1985.00 1986.00 1987.00 1988.00 1989.00 1990.00 1990.00 1991.00
#>     80%     81%     82%     83%     84%     85%     86%     87%     88%     89%
#> 1992.00 1993.00 1993.18 1994.00 1995.00 1996.00 1997.00 1997.00 1998.00 1999.00
#>     90%     91%     92%     93%     94%     95%     96%     97%     98%     99%
#> 2000.00 2001.00 2002.00 2003.00 2004.00 2005.00 2005.00 2007.00 2008.00 2009.00
#>    100%
#> 2010.00
#>
#> $surface
#>     0%     1%     2%     3%     4%     5%     6%     7%     8%     9%    10%
#>  20.00  22.00  23.98  24.00  25.00  26.00  27.00  30.00  31.00  32.00  33.00
#>    11%    12%    13%    14%    15%    16%    17%    18%    19%    20%    21%
#>  35.00  37.00  38.00  39.00  40.00  41.00  42.00  44.00  46.00  47.00  48.00
#>    22%    23%    24%    25%    26%    27%    28%    29%    30%    31%    32%
#>  49.78  51.00  52.00  53.00  54.00  55.73  56.72  57.00  59.00  60.00  61.00
#>    33%    34%    35%    36%    37%    38%    39%    40%    41%    42%    43%
#>  62.00  63.00  64.00  65.64  67.00  68.62  70.00  72.00  73.59  75.00  76.00
#>    44%    45%    46%    47%    48%    49%    50%    51%    52%    53%    54%
#>  78.00  79.55  81.00  82.00  83.00  84.00  85.50  87.00  88.48  90.00  91.00
#>    55%    56%    57%    58%    59%    60%    61%    62%    63%    64%    65%
#>  93.00  93.44  95.00  96.00  98.00 100.00 101.39 103.00 105.00 105.00 106.00
#>    66%    67%    68%    69%    70%    71%    72%    73%    74%    75%    76%
#> 107.00 108.00 109.00 110.00 111.00 112.00 113.00 114.27 116.00 118.00 120.00
#>    77%    78%    79%    80%    81%    82%    83%    84%    85%    86%    87%
#> 121.00 123.00 124.00 125.00 127.00 127.00 128.00 130.00 131.00 132.00 133.00
#>    88%    89%    90%    91%    92%    93%    94%    95%    96%    97%    98%
#> 136.00 137.00 138.00 141.00 142.00 143.00 144.00 145.00 146.04 147.03 148.00
#>    99%   100%
#> 149.00 150.00
#>
#> $floor
#>    0%    1%    2%    3%    4%    5%    6%    7%    8%    9%   10%   11%   12%
#>  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.91  2.00  2.00  2.00
#>   13%   14%   15%   16%   17%   18%   19%   20%   21%   22%   23%   24%   25%
#>  2.00  2.00  2.00  2.00  2.00  2.00  2.00  2.00  3.00  3.00  3.00  3.00  3.00
#>   26%   27%   28%   29%   30%   31%   32%   33%   34%   35%   36%   37%   38%
#>  3.00  3.00  3.00  3.00  4.00  4.00  4.00  4.00  4.00  4.00  4.00  4.00  5.00
#>   39%   40%   41%   42%   43%   44%   45%   46%   47%   48%   49%   50%   51%
#>  5.00  5.00  5.00  5.00  5.00  5.00  5.00  5.00  5.00  6.00  6.00  6.00  6.00
#>   52%   53%   54%   55%   56%   57%   58%   59%   60%   61%   62%   63%   64%
#>  6.00  6.00  6.00  6.00  6.00  6.00  7.00  7.00  7.00  7.00  7.00  7.00  7.00
#>   65%   66%   67%   68%   69%   70%   71%   72%   73%   74%   75%   76%   77%
#>  7.00  7.00  7.00  7.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00
#>   78%   79%   80%   81%   82%   83%   84%   85%   86%   87%   88%   89%   90%
#>  8.00  9.00  9.00  9.00  9.00  9.00  9.00  9.00  9.00  9.00  9.00  9.00 10.00
#>   91%   92%   93%   94%   95%   96%   97%   98%   99%  100%
#> 10.00 10.00 10.00 10.00 10.00 10.00 10.00 10.00 10.00 10.00
#>
#> $no.rooms
#>   0%   1%   2%   3%   4%   5%   6%   7%   8%   9%  10%  11%  12%  13%  14%  15%
#>    1    1    1    1    1    1    1    1    1    1    2    2    2    2    2    2
#>  16%  17%  18%  19%  20%  21%  22%  23%  24%  25%  26%  27%  28%  29%  30%  31%
#>    2    2    2    2    2    2    2    2    2    2    2    2    2    2    2    3
#>  32%  33%  34%  35%  36%  37%  38%  39%  40%  41%  42%  43%  44%  45%  46%  47%
#>    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3
#>  48%  49%  50%  51%  52%  53%  54%  55%  56%  57%  58%  59%  60%  61%  62%  63%
#>    3    3    3    3    3    3    4    4    4    4    4    4    4    4    4    4
#>  64%  65%  66%  67%  68%  69%  70%  71%  72%  73%  74%  75%  76%  77%  78%  79%
#>    4    4    4    4    4    4    4    4    4    4    4    4    5    5    5    5
#>  80%  81%  82%  83%  84%  85%  86%  87%  88%  89%  90%  91%  92%  93%  94%  95%
#>    5    5    5    5    5    5    5    5    5    5    5    5    5    5    5    5
#>  96%  97%  98%  99% 100%
#>    6    6    6    6    6
#>
#> $district
#>  [1] Srodmiescie Bielany     Praga       Ochota      Mokotow     Ursus
#>  [7] Zoliborz    Wola        Bemowo      Ursynow
#> 10 Levels: Bemowo Bielany Mokotow Ochota Praga Srodmiescie Ursus ... Zoliborz
#>