k-Nearest Neighbors model

This is a formula-based implementation of k-nearest neighbors. fm_knn() “fits” the model, which essentially amounts to saving a reformatted copy of the model data. The predict method finds the nearest neighbors of the points in newdata (using nn2() from package RANN) and returns the averages of their response values.
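To make the prediction step concrete, here is a minimal base-R sketch of k-nearest-neighbor regression. It is an illustration only, not the package's code: the helper name knn_predict_sketch() is hypothetical, and it uses a brute-force distance computation, whereas fm_knn() delegates the neighbor search to RANN::nn2().

```r
# Hypothetical helper: brute-force k-NN regression in base R (illustration only;
# fm_knn() uses RANN::nn2() for the neighbor search instead).
knn_predict_sketch <- function(x, y, xnew, k = 5) {
  x <- as.matrix(x)
  xnew <- as.matrix(xnew)
  apply(xnew, 1, function(p) {
    # Squared Euclidean distance from query point p to every training row
    d2 <- colSums((t(x) - p)^2)
    # Indices of the k closest training rows (order() leaves ties in row order)
    nn <- order(d2)[seq_len(k)]
    # Prediction = plain average of the neighbors' responses
    mean(y[nn])
  })
}

# Tiny one-dimensional example
x <- matrix(c(1, 2, 3, 10, 11), ncol = 1)
y <- c(1, 2, 3, 10, 11)
pred <- knn_predict_sketch(x, y, xnew = matrix(c(2.2, 10.4), ncol = 1), k = 2)
pred  # 2.5 10.5
```

With k = 2, the query 2.2 averages the responses at x = 2 and x = 3, and the query 10.4 averages those at x = 10 and x = 11.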

Usage

fm_knn(
  formula,
  data,
  k = 5,
  standardize = TRUE,
  weights = NULL,
  na.action = na.omit,
  ...
)

# S3 method for fm_knn
predict(object, newdata, k = object$k, ...)

Arguments

formula: A formula.
data: A data.frame.
k: Either a scalar integer, the number of neighbors, or alternatively a vector of (typically decreasing) non-negative numeric values. In the latter case, the model predictions will be weighted averages, with k[1] used as weight of the nearest neighbor, k[2] for the second nearest, etc.
standardize: Logical: standardize the columns of x, the design matrix? If TRUE, distances are calculated after standardizing with the column means and standard deviations of x.
weights: Weights used for averaging. Not used in the determination of neighbors.
na.action: A function indicating what should happen when the data contain NAs. na.omit is the default; na.exclude or na.fail are useful alternatives.
...: Not used in fm_knn(). In predict.fm_knn(): passed to nn2().
object: Object of class “fm_knn”.
newdata: data.frame with the data to be predicted. If missing, predictions for the model data will be returned.
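The vector form of k can be read as a set of rank weights applied to the sorted neighbor responses. The following base-R sketch assumes the responses of one query point's neighbors are already sorted by distance; the helper rank_weighted_prediction() is hypothetical and only mirrors the weighted average described for the argument k.

```r
# Hypothetical helper: prediction for one query point when k is a vector of
# non-negative rank weights. y_sorted holds the neighbors' responses, nearest first.
rank_weighted_prediction <- function(y_sorted, k) {
  stopifnot(length(y_sorted) >= length(k), all(k >= 0))
  # k[1] weights the nearest neighbor, k[2] the second nearest, etc.
  weighted.mean(y_sorted[seq_along(k)], w = k)
}

y_sorted <- c(4, 6, 10)                        # responses of the 3 nearest neighbors
rank_weighted_prediction(y_sorted, c(3, 2, 1)) # (3*4 + 2*6 + 1*10) / 6 = 34/6
rank_weighted_prediction(y_sorted, c(1, 1, 1)) # equal weights: plain mean = 20/3
```

Equal weights reproduce the plain average over that many neighbors, which is what a scalar k computes; decreasing weights let closer neighbors count more.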

Value

fm_knn() returns a list of class “fm_knn” with components

formula: the formula;
x: the model matrix (resulting from the formula using model.matrix());
y: the vector of the response values;
k: the parameter k (the number of neighbors, or the vector of neighbor weights);
standardize: the parameter standardize, a logical value;
weights: the fitting weights;
xlevels: list of the levels of the factors included in the model;
na.action: the na.action used during data preparation;
contrasts: the contrasts used during data preparation;
call: the matched call generating the model.

Details

“Fitting” a model with fm_knn() essentially amounts to saving the (possibly standardized) model data.
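Standardization matters because raw distances are dominated by columns with large scales. A minimal base-R sketch, assuming (as the description of standardize states) that new data are transformed with the means and standard deviations of the model matrix; the toy matrices are made up for illustration.

```r
# Two columns on very different scales: b would dominate raw Euclidean distances
x_train <- cbind(a = c(1, 2, 3, 4), b = c(100, 200, 300, 400))
x_new   <- cbind(a = 2.5, b = 250)

# Statistics are taken from the training (model) matrix only ...
centers <- colMeans(x_train)
scales  <- apply(x_train, 2, sd)

# ... and reused for new data, so both live on the same standardized scale
z_train <- scale(x_train, center = centers, scale = scales)
z_new   <- scale(x_new,   center = centers, scale = scales)
```

After scaling, the columns a and b contribute equally to distances, and the new point (which sits at the training means) maps to zero in both coordinates.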

Ties in distances are currently not handled in a clever way, so the predictions may depend on the order of the rows in the data.
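Why row order can matter is easy to see with a stable sort: when two candidates are equally distant, whichever comes first in the data wins. The base-R sketch below uses a hypothetical helper nearest1() with k = 1; it illustrates the general effect and is not the tie-breaking rule of RANN::nn2(), which may differ.

```r
# Hypothetical helper: 1-nearest-neighbor prediction in one dimension.
nearest1 <- function(x, y, query) {
  d <- abs(x - query)
  # order() leaves tied distances in their original row order
  y[order(d)[1]]
}

x <- c(1, 3)
y <- c(10, 20)
nearest1(x, y, query = 2)            # both rows at distance 1: first row wins, 10
nearest1(rev(x), rev(y), query = 2)  # same data, reversed rows: 20
```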

Combining an argument k of length > 1 with non-NULL weights results in a mixture of two types of weights and is therefore not recommended.

Methods (by generic)

predict(fm_knn): predict method for class “fm_knn”.

Examples

d <- simuldat()
nnmodel <- fm_knn(Y ~ ., d)
nnmodel
#> k-nearest neighbors model (class ‘fm_knn’)
#>   formula:      Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + 
#>                     X10 + g - 1
#>   data:         d
#>   k:            5
#>   n:            500
#>   standardize:  TRUE
#>   weighted:     FALSE

# Predictions for new observations 
newd <- simuldat(n = 10)
data.frame(newd["Y"], 
           pred = predict(nnmodel, newdata = newd))
#>            Y     pred
#> 1   1.166769 4.275125
#> 2  -0.382027 3.363760
#> 3   3.995346 3.319309
#> 4   2.147231 4.006339
#> 5  -1.104314 1.872762
#> 6   1.901924 2.373253
#> 7   1.163698 4.679739
#> 8   7.857975 2.994145
#> 9   3.015759 3.390070
#> 10  5.449470 4.267375