This function selects a subset of rows from a data set. This is useful when the data is large and only a sample is needed to calculate profiles.

select_neighbours(data, observation, variables = NULL,
  distance = gower::gower_dist, n = 20, frac = NULL)

Arguments

data

set of observations

observation

single observation

variables

variables used to calculate the distance. By default, these are all variables present in both `data` and `observation`.

distance

distance function, by default the `gower_dist` function.

n

number of neighbours to select

frac

if `n` is not specified (NULL), the number of neighbours is calculated as `frac` * number of rows in `data`. Either `n` or `frac` needs to be specified.
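
The interplay between `n` and `frac` can be sketched as below. This is only an illustration, not the package's code: `resolve_n` is a hypothetical helper, and rounding up with `ceiling()` is an assumption, since the rounding rule is not stated here. Because `n` defaults to 20 in the usage above, `frac` presumably takes effect only when `n = NULL` is passed explicitly.

```r
# Hypothetical helper (not part of the package): decide how many
# neighbours to keep from `n` and `frac`.
resolve_n <- function(data, n = 20, frac = NULL) {
  if (is.null(n)) {
    if (is.null(frac)) {
      stop("Either `n` or `frac` needs to be specified.")
    }
    # Assumption: a fractional count is rounded up.
    n <- ceiling(frac * nrow(data))
  }
  n
}

d <- data.frame(x = 1:200)
n_default <- resolve_n(d)                        # n keeps its default
n_from_frac <- resolve_n(d, n = NULL, frac = 0.25)  # a quarter of the rows
```

With 200 rows, the first call keeps the default of 20 neighbours and the second resolves to 50.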

Value

a data frame with selected rows

Details

Note that the `select_neighbours` function is an S3 generic. If you want to work with non-standard data sources (such as H2O ddf or external databases), you should overload it.
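
A rough sketch of what such an overload might look like is given below. To keep it runnable without the package, it uses a stand-in generic named `select_neighbours_demo`; the class `chunked_source` and the simple absolute-difference distance are also hypothetical, chosen only for illustration. With the package loaded, and assuming dispatch happens on the class of `data`, the method would instead be named `select_neighbours.<your class>`.

```r
# Stand-in generic (hypothetical name) so the sketch runs on its own.
select_neighbours_demo <- function(data, observation, n = 20, ...) {
  UseMethod("select_neighbours_demo")
}

# Default method for ordinary data frames: keep the n rows closest to
# `observation`, using summed absolute differences over shared numeric
# columns (a simplification, not the Gower distance).
select_neighbours_demo.default <- function(data, observation, n = 20, ...) {
  vars <- intersect(colnames(data), colnames(observation))
  diffs <- sweep(as.matrix(data[, vars, drop = FALSE]), 2,
                 unlist(observation[1, vars]))
  data[head(order(rowSums(abs(diffs))), n), , drop = FALSE]
}

# Method for a hypothetical class "chunked_source": a list of data frames
# standing in for an external source that must be materialised first.
select_neighbours_demo.chunked_source <- function(data, observation,
                                                  n = 20, ...) {
  local_data <- do.call(rbind, unclass(data))
  select_neighbours_demo(local_data, observation, n = n, ...)
}

chunks <- structure(
  list(data.frame(x = c(1, 5), y = c(2, 9)),
       data.frame(x = c(2, 8), y = c(2, 1))),
  class = "chunked_source"
)
obs <- data.frame(x = 1, y = 2)
nearest <- select_neighbours_demo(chunks, obs, n = 2)
```

The method only converts the custom source into a data frame and then delegates to the default method, so the selection logic stays in one place.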

Examples

library("DALEX")
new_apartment <- apartments[1, 2:6]
small_apartments <- select_neighbours(apartmentsTest, new_apartment, n = 10)
new_apartment
#>   construction.year surface floor no.rooms    district
#> 1              1953      25     3        1 Srodmiescie
small_apartments
#>      m2.price construction.year surface floor no.rooms    district
#> 2285     5875              1970      27     3        1 Srodmiescie
#> 1073     5886              1960      36     2        1 Srodmiescie
#> 8110     5614              1957      44     4        1 Srodmiescie
#> 9527     6080              1947      27     1        1 Srodmiescie
#> 3261     5859              1945      39     2        1 Srodmiescie
#> 4309     5794              1947      31     3        2 Srodmiescie
#> 1198     5821              1947      43     2        1 Srodmiescie
#> 6647     5952              1938      30     2        1 Srodmiescie
#> 4027     6457              1926      29     3        1 Srodmiescie
#> 2655     5596              1950      25     6        1 Srodmiescie