This is a formula-based implementation of k-nearest neighbors. fm_knn() “fits” the model, which essentially amounts to saving a reformatted copy of the model data. The predict method finds the nearest neighbors of the points in newdata (using nn2() from package RANN) and returns the averages of their response values.
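As a rough illustration of the fit/predict logic described above, the following base-R sketch stores the (optionally standardized) training matrix and, for each new point, averages the responses of the k nearest training rows by Euclidean distance. It is not the package's implementation: the helpers knn_fit_sketch() and knn_predict_sketch() are hypothetical, they work on a numeric matrix rather than a formula, and they use a brute-force distance computation instead of RANN::nn2().

```r
# Hypothetical base-R sketch of k-NN regression (not the fm_knn() source).
# Uses brute-force Euclidean distances instead of RANN::nn2().
knn_fit_sketch <- function(x, y, k = 5, standardize = TRUE) {
  x <- as.matrix(x)
  # Column means and sds of the training matrix (assumes no constant columns)
  center <- if (standardize) colMeans(x) else rep(0, ncol(x))
  scale  <- if (standardize) apply(x, 2, sd) else rep(1, ncol(x))
  # "Fitting" just stores the (possibly standardized) training data
  list(x = sweep(sweep(x, 2, center), 2, scale, "/"),
       y = y, k = k, center = center, scale = scale)
}

knn_predict_sketch <- function(fit, newx) {
  newx <- as.matrix(newx)  # expects a matrix with the same columns as x
  # New points are standardized with the training means and sds
  newx <- sweep(sweep(newx, 2, fit$center), 2, fit$scale, "/")
  apply(newx, 1, function(q) {
    d  <- sqrt(colSums((t(fit$x) - q)^2))  # distances to all training rows
    nn <- order(d)[seq_len(fit$k)]         # indices of the k nearest rows
    mean(fit$y[nn])                        # plain average of their responses
  })
}

# Usage on simulated data
set.seed(1)
x   <- matrix(rnorm(200), ncol = 2)
y   <- x[, 1] + rnorm(100, sd = 0.1)
fit <- knn_fit_sketch(x, y, k = 5)
knn_predict_sketch(fit, x[1:3, , drop = FALSE])
```

Note that new points are scaled with the training means and standard deviations, not their own, so that distances are measured on the same scale as the stored data.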

Usage

fm_knn(
  formula,
  data,
  k = 5,
  standardize = TRUE,
  weights = NULL,
  na.action = na.omit,
  ...
)

# S3 method for fm_knn
predict(object, newdata, k = object$k, ...)

Arguments

formula

A formula.

data

A data.frame.

k

Either a single integer giving the number of neighbors, or a vector of (typically decreasing) non-negative numeric values. In the latter case, the model predictions are weighted averages, with k[1] used as the weight of the nearest neighbor, k[2] as the weight of the second nearest, etc.

standardize

Logical: Standardize the columns of x, the design matrix? If TRUE, distances are calculated after standardization, using the means and standard deviations of the columns of x.

weights

Weights used for averaging. Not used in the determination of neighbors.

na.action

A function which indicates what should happen when the data contain NAs. The default is na.omit; na.exclude or na.fail may be useful alternatives.

...

Not used in fm_knn(). In predict.fm_knn(): passed to nn2().

object

Object of class “fm_knn”.

newdata

A data.frame with the data to be predicted. If missing, predictions for the model data are returned.
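The vector form of k can be illustrated with a small sketch: given the distances from one query point to the training points, the neighbors are ranked by distance and their responses are averaged with k[1] as the weight of the nearest, k[2] as the weight of the second nearest, and so on. The function knn_weighted_mean() is hypothetical and only mirrors the rule stated for the k argument; a scalar k is treated as k equal weights.

```r
# Hypothetical helper illustrating the rule for a vector-valued k.
# dist: distances from one query point to each training point
# y:    responses of the training points
# k:    a single number of neighbors, or a vector of rank weights
knn_weighted_mean <- function(dist, y, k) {
  if (length(k) == 1) k <- rep(1, k)   # scalar k: k equally weighted neighbors
  nn <- order(dist)[seq_along(k)]      # neighbors ranked by increasing distance
  sum(k * y[nn]) / sum(k)              # k[1] weights the nearest, k[2] the next, ...
}

dist <- c(0.5, 0.1, 0.9, 0.3)
y    <- c(10, 20, 30, 40)
knn_weighted_mean(dist, y, k = 2)           # 30: mean of 20 and 40
knn_weighted_mean(dist, y, k = c(3, 2, 1))  # 25: (3*20 + 2*40 + 1*10) / 6
```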

Value

fm_knn() returns a list of class “fm_knn” with components

  • formula: the formula;

  • x: the model matrix (resulting from the formula using model.matrix());

  • y: the vector of the response values;

  • k: the parameter k, the number of neighbors;

  • standardize: the parameter standardize, a logical value;

  • weights: the fitting weights;

  • xlevels: list of the levels of the factors included in the model;

  • na.action: the na.action used during data preparation;

  • contrasts: the contrasts used during data preparation;

  • call: the matched call generating the model.

Details

“Fitting” a model with fm_knn() essentially amounts to saving the (possibly standardized) model data.

Ties in distances are currently not handled in any special way, so the predictions may depend on the order of the rows in the data.
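To see why ties can make predictions order-dependent, consider a query point at equal distance from two training points. The sketch below uses base R's order(), which leaves tied values in their original order, so whichever row comes first wins. It only illustrates the general issue; the tie-breaking behavior of nn2() itself may differ, and nearest_response() is a hypothetical helper.

```r
# Response of the single nearest training point (k = 1); hypothetical helper
nearest_response <- function(x, y, q) {
  y[order(abs(x - q))[1]]  # order() leaves ties in their original order
}

x <- c(-1, 1)   # both points are at distance 1 from the query
y <- c(0, 10)
nearest_response(x, y, q = 0)            # 0: the first row wins the tie
nearest_response(rev(x), rev(y), q = 0)  # 10: same data, different row order
```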

Combining an argument k of length > 1 with non-NULL weights results in a mixture of two types of weights and is therefore not recommended.
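One way to see the problem: if the rank weights from k and the observation weights were combined by multiplication (an assumption made here purely for illustration; the combination rule used by fm_knn() is not documented on this page), a distant neighbor with a large observation weight could outweigh the nearest neighbors that k is meant to emphasize. The function combined_prediction() is hypothetical.

```r
# Hypothetical: effective weight = rank weight k[i] * observation weight w[i].
# This product rule is an assumption, not necessarily what fm_knn() does.
combined_prediction <- function(y_nn, k, w_nn) {
  eff <- k * w_nn               # mixed weights, one per neighbor
  sum(eff * y_nn) / sum(eff)
}

y_nn <- c(1, 2, 3)   # responses of the 3 nearest neighbors, nearest first
k    <- c(3, 2, 1)   # rank weights favoring the nearest neighbor
combined_prediction(y_nn, k, w_nn = c(1, 1, 1))   # 10/6, about 1.67
combined_prediction(y_nn, k, w_nn = c(1, 1, 10))  # 37/15, about 2.47
```

With the large observation weight on the farthest neighbor, the prediction moves toward that neighbor's response even though k gives it the smallest rank weight.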

Methods (by generic)

  • predict(fm_knn): predict method for class “fm_knn”.

See also

nn2 (package RANN)

Examples

d <- simuldat()
nnmodel <- fm_knn(Y ~ ., d)
nnmodel
#> k-nearest neighbors model (class ‘fm_knn’)
#>   formula:      Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + 
#>                     X10 + g - 1
#>   data:         d
#>   k:            5
#>   n:            500
#>   standardize:  TRUE
#>   weighted:     FALSE

# Predictions for new observations 
newd <- simuldat(n = 10)
data.frame(newd["Y"], 
           pred = predict(nnmodel, newdata = newd))
#>            Y     pred
#> 1   1.166769 4.275125
#> 2  -0.382027 3.363760
#> 3   3.995346 3.319309
#> 4   2.147231 4.006339
#> 5  -1.104314 1.872762
#> 6   1.901924 2.373253
#> 7   1.163698 4.679739
#> 8   7.857975 2.994145
#> 9   3.015759 3.390070
#> 10  5.449470 4.267375