k-Nearest Neighbors model
This is a formula-based implementation of k-Nearest Neighbor. fm_knn() “fits” the model, which essentially amounts to saving a reformatted copy of the model data. The predict method finds the nearest neighbors of the points in newdata (using nn2() from package RANN) and returns their averages.
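The prediction step described above can be sketched in base R. This is only an illustration of the idea, not the package's code: knn_predict_sketch() is a hypothetical helper, and it uses a brute-force distance search where the package is documented to call nn2() from RANN.

```r
# Minimal sketch of k-nearest-neighbor regression in base R.
# knn_predict_sketch() is a hypothetical helper, not part of the package.
knn_predict_sketch <- function(x_train, y_train, x_new, k = 5) {
  x_train <- as.matrix(x_train)
  x_new <- as.matrix(x_new)
  vapply(seq_len(nrow(x_new)), function(i) {
    # Squared Euclidean distances from the i-th new point to all training points
    d2 <- rowSums(sweep(x_train, 2, x_new[i, ])^2)
    # Indices of the k closest training points
    nn <- order(d2)[seq_len(k)]
    # Prediction: plain average of the neighbors' responses
    mean(y_train[nn])
  }, numeric(1))
}

# Toy data (made up for illustration): two well-separated clusters
x_train <- matrix(c(0, 1, 2, 10, 11, 12), ncol = 1)
y_train <- c(1, 2, 3, 10, 20, 30)
x_new <- matrix(c(1, 11), ncol = 1)
pred <- knn_predict_sketch(x_train, y_train, x_new, k = 3)
pred
```

With k = 3, each new point is predicted by the mean response of the three training points in its own cluster.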
Usage
fm_knn(
  formula,
  data,
  k = 5,
  standardize = TRUE,
  weights = NULL,
  na.action = na.omit,
  ...
)
# S3 method for fm_knn
predict(object, newdata, k = object$k, ...)
Arguments
- formula
A formula.
- data
A data.frame.
- k
Either a scalar integer, the number of neighbors, or alternatively a vector of (typically decreasing) non-negative numeric values. In the latter case, the model predictions will be weighted averages, with k[1] used as the weight of the nearest neighbor, k[2] for the second nearest, etc.
- standardize
Logical: Should the columns of x, the design matrix, be standardized? If TRUE, distances will be calculated after standardization using means and standard deviations from the x matrix.
- weights
Weights used for averaging. Not used in the determination of neighbors.
- na.action
A function which indicates what should happen when the data contain NAs. na.omit is the default; na.exclude or na.fail could be useful alternative settings.
- ...
Not used in fm_knn(). In predict.fm_knn(): passed to nn2().
- object
Object of class “fm_knn”.
- newdata
data.frame with the data to be predicted. If missing, predictions for the model data will be returned.
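To make the roles of standardize and a vector-valued k more concrete, here is a hedged base R sketch. knn_weighted_sketch() is a hypothetical helper written for this illustration, and the normalization by sum(w) is an assumption about what "weighted average" means here; the package's internals may differ.

```r
# Hypothetical helper illustrating `standardize = TRUE` and a vector-valued `k`.
knn_weighted_sketch <- function(x_train, y_train, x_new, k, standardize = TRUE) {
  x_train <- as.matrix(x_train)
  x_new <- as.matrix(x_new)
  if (standardize) {
    # Center and scale with statistics from the training matrix only,
    # then apply the same transformation to the new points
    ctr <- colMeans(x_train)
    scl <- apply(x_train, 2, sd)
    x_train <- scale(x_train, center = ctr, scale = scl)
    x_new <- scale(x_new, center = ctr, scale = scl)
  }
  # A scalar k is read as "k neighbors with equal weight"
  w <- if (length(k) == 1) rep(1, k) else k
  vapply(seq_len(nrow(x_new)), function(i) {
    d2 <- rowSums(sweep(x_train, 2, x_new[i, ])^2)
    nn <- order(d2)[seq_along(w)]
    # w[1] weights the nearest neighbor, w[2] the second nearest, and so on
    # (ASSUMPTION: weights are normalized by their sum)
    sum(w * y_train[nn]) / sum(w)
  }, numeric(1))
}

# Toy data (made up): column b is on a much larger scale than column a
x_train <- cbind(a = c(0, 1, 3, 10), b = c(0, 100, 300, 1000))
y_train <- c(0, 10, 30, 100)
x_new <- cbind(a = 0.9, b = 90)

p_equal <- knn_weighted_sketch(x_train, y_train, x_new, k = 3)
p_decreasing <- knn_weighted_sketch(x_train, y_train, x_new, k = c(3, 2, 1))
```

With decreasing weights the prediction moves toward the response of the nearest neighbor, compared with the equal-weight average over the same three neighbors.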
Value
fm_knn() returns a list of class “fm_knn” with components
formula: the formula;
x: the model matrix (resulting from the formula using model.matrix());
y: the vector of the response values;
k: the parameter k, the number of neighbors;
standardize: the parameter standardize, a logical value;
weights: the fitting weights;
xlevels: list of the levels of the factors included in the model;
na.action: the na.action used during data preparation;
contrasts: the contrasts used during data preparation;
call: the matched call generating the model.
Details
“Fitting” a model with fm_knn() essentially amounts to saving the (possibly standardized) model data.
Ties in distances are currently not handled in a clever way, so the predictions may depend on the order of the data.
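A small base R sketch shows how tied distances can make a nearest-neighbor prediction depend on row order. nn1_sketch() is a hypothetical one-neighbor helper with a naive tie rule; it is not claimed to reproduce how nn2() resolves ties.

```r
# One-nearest-neighbor prediction with a naive tie rule:
# order() keeps tied points in their original row order.
nn1_sketch <- function(x_train, y_train, x0) {
  y_train[order(abs(x_train - x0))[1]]
}

x <- c(0, 2)    # both training points are at distance 1 from x0 = 1
y <- c(10, 20)

pred_original <- nn1_sketch(x, y, x0 = 1)
pred_reversed <- nn1_sketch(rev(x), rev(y), x0 = 1)
c(original = pred_original, reversed = pred_reversed)
```

The same two observations yield different predictions once their order is reversed, because the tie is broken by position alone.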
Combining an argument k of length > 1 with non-NULL weights results in a mixture of two types of weights and is thus not recommended.
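The following sketch illustrates why mixing the two kinds of weights is hard to interpret. The source does not state how rank weights and observation weights are combined; multiplying them is purely an assumption made for this example, and all values are made up.

```r
# Hypothetical values for the three nearest neighbors of one query point,
# ordered from nearest to farthest.
y_nn   <- c(10, 0, 30)   # responses of the neighbors
rank_w <- c(3, 2, 1)     # weights by neighbor rank (a vector-valued k)
obs_w  <- c(1, 4, 1)     # observation weights of those rows (weights argument)

# ASSUMPTION: the two kinds of weights are multiplied
w <- rank_w * obs_w

pred_rank_only <- sum(rank_w * y_nn) / sum(rank_w)
pred_mixed <- sum(w * y_nn) / sum(w)
c(rank_only = pred_rank_only, mixed = pred_mixed)
```

Under this assumption, the heavily weighted second neighbor dominates the average even though it is not the closest point, so neither weighting scheme keeps its intended meaning.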
Examples
d <- simuldat()
nnmodel <- fm_knn(Y ~ ., d)
nnmodel
#> k-nearest neighbors model (class ‘fm_knn’)
#> formula: Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 +
#> X10 + g - 1
#> data: d
#> k: 5
#> n: 500
#> standardize: TRUE
#> weighted: FALSE
# Predictions for new observations
newd <- simuldat(n = 10)
data.frame(newd["Y"],
           pred = predict(nnmodel, newdata = newd))
#>            Y     pred
#> 1   1.166769 4.275125
#> 2  -0.382027 3.363760
#> 3   3.995346 3.319309
#> 4   2.147231 4.006339
#> 5  -1.104314 1.872762
#> 6   1.901924 2.373253
#> 7   1.163698 4.679739
#> 8   7.857975 2.994145
#> 9   3.015759 3.390070
#> 10  5.449470 4.267375