Selection of the best-performing model in a “cv” object
tune.Rd
tune()
picks the “best” model from a set of models, that is, the model with
the smallest test error.
Usage
tune(x, ...)
# S3 method for cv
tune(x, metric = x$metric[1], label = NULL, ...)
# S3 method for multimodel
tune(
  x,
  nfold = getOption("cv_nfold"),
  folds = NULL,
  metric = NULL,
  label = NULL,
  ...
)

# S3 method for model
tune(
  x,
  ...,
  nfold = getOption("cv_nfold"),
  folds = NULL,
  metric = NULL,
  expand = TRUE,
  max_n_model = getOption("expand_max_model"),
  label = NULL,
  verbose = getOption("cv_verbose")
)

# S3 method for default
tune(
  x,
  ...,
  nfold = getOption("cv_nfold"),
  folds = NULL,
  metric = NULL,
  expand = TRUE,
  max_n_model = getOption("expand_max_model"),
  verbose = getOption("cv_verbose"),
  use_original_args = FALSE,
  force = FALSE
)
Arguments
- x: An object of class “cv”, or any other object that can be transformed to a cv.
- ...: See “Details” and “Methods”.
- metric: A metric (see metrics), specified either as a character string (the name of the metric function) or as a named list of length 1, as in list(rmse = rmse). metric = NULL selects the default metric, see default_metric.
- label: A character string: the label of the output model. If NULL, the selected model's current label is kept.
- nfold, folds: Passed to make_folds().
- expand: Logical: Expand the “...” arguments (default) or join them element-wise? If expand = TRUE, the vectors in “...” are expanded, and the number of models equals the product of the lengths of the “...” arguments; otherwise, all “...” arguments must have equal lengths, and the number of models equals their common length.
- max_n_model: Maximal number of models that will be included. If the number of specified models exceeds max_n_model, a subset is selected at random.
- verbose: Passed to cv().
- use_original_args, force: Passed to fit().
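As a sketch of the effect of expand (hypothetical parameter values; the model object mod stands for any model created with model(), and the model counts follow the rule stated above):

```r
# expand = TRUE (default): all combinations are formed,
# 3 values of max_depth x 2 values of learning_rate = 6 candidate models
tune(mod, max_depth = c(2, 4, 6), learning_rate = c(0.1, 0.3))

# expand = FALSE: arguments are joined element-wise -> 2 candidate models,
# (max_depth = 2, learning_rate = 0.1) and (max_depth = 4, learning_rate = 0.3)
tune(mod, max_depth = c(2, 4), learning_rate = c(0.1, 0.3), expand = FALSE)
```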
Value
All methods return an object of class “model”, except tune.default(), which returns a fitted model of the same class as its input x.
Details
The core method is tune.cv(x), which selects, from the models included in x, the one with the lowest test error (in terms of the metric).
All other methods run tune.cv() after some preprocessing steps -- see section “Methods”.
For iteratively fitted models (ifm, classes “fm_xgb” and “fm_glmnet”), tune(x) without additional arguments returns the model corresponding to the preferred iteration, thus tuning the parameter nrounds or lambda, respectively.
For other models, tune(x) simply returns x.
Note the different role of the “...” arguments in the different methods:

- In tune.cv() and tune.multimodel(), they are passed to cv_performance(), allowing specification of eval_weights.
- In tune.model() and tune.default(), they are passed to multimodel(). This allows expanding a selection of alternative parameterizations, of which the one with the smallest test error is finally returned. Selecting the best-performing parameterization thus reduces to one simple line of code: tune(mymodel, hyperparms = candidate_values)
Methods
- tune.cv: Executes cv_performance(x, ...), finds the model with minimal test error and extracts that model (using extract_model).
- tune.multimodel(x, ...): Essentially runs x %>% cv %>% tune(...).
- tune.model(x, ...): Essentially runs x %>% multimodel(...) %>% cv %>% tune.
- tune.default(x, ...): Applied to a fitted model x; essentially runs x %>% model %>% multimodel(...) %>% cv %>% tune %>% fit.
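The tune.default() chain can be sketched on a directly fitted model (a hedged illustration; fm_xgb() is the package's xgboost wrapper, as seen in the example output below, and the parameter values are arbitrary):

```r
# tune.default(): start from a fitted model, get a fitted model back
fitted <- fm_xgb(Sepal.Length ~ ., iris)
best <- tune(fitted, max_depth = 1:4)  # wraps as model, expands 4 candidates,
                                       # cross-validates, refits the best one
```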
Examples
options(cv_nfold = 1/3) # accelerates cross-validations, see ?modeltuner_options
# Tune xgboost parameter:
model_xgb <- model("fm_xgb", Sepal.Length~., iris, class = "fm_xgb")
mm_xgb <- multimodel(model_xgb, max_depth = 1:5)
cv_xgb <- cv(mm_xgb)
plot(cv_performance(cv_xgb),
xvar = "max_depth", zeroline = FALSE)
# tune() automatically selects the best-performing model:
tuned1 <- tune(model_xgb, max_depth = 1:5, label = "tuned_model_1")
tuned1
#> --- A “model” object ---
#> label: tuned_model_1
#> model class: fm_xgb
#> formula: Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width +
#> Species
#> data: data.frame [150 x 5],
#> input as: ‘data = iris’
#> response_type: continuous
#> call: fm_xgb(formula = Sepal.Length ~ ., data = data,
#> max_depth = 4L)
#> Preferred iteration from cv: iter=17
# Several tuning steps in a pipe:
tuned2 <- tuned1 |>
tune(learning_rate = c(0.1, 0.3, 1)) |>
tune(min_child_weight = c(5, 10, 20), label = "tuned_model_2")
fit(tuned2, eval = FALSE, use_original_args = TRUE) # extract selected model
#> set_pref_iter(), model ‘tuned_model_2’, modifications made in call:
#> pref_iter=49, nrounds=49, early_stopping_rounds=NULL
#> fm_xgb(formula = Sepal.Length ~ ., data = iris, nrounds = 49L,
#> early_stopping_rounds = NULL, pref_iter = 49L, max_depth = 4L,
#> learning_rate = 0.1, min_child_weight = 10)
# Alternatively, expand and test all parameter combinations in one call:
tuned3 <- tune(tuned1, learning_rate = c(0.1, 0.3, 1),
min_child_weight = c(5, 10, 20),
label = "tuned_model_3")
fit(tuned3, eval = FALSE, use_original_args = TRUE) # extract selected model
#> set_pref_iter(), model ‘tuned_model_3’, modifications made in call:
#> pref_iter=56, nrounds=56, early_stopping_rounds=NULL
#> fm_xgb(formula = Sepal.Length ~ ., data = iris, nrounds = 56L,
#> early_stopping_rounds = NULL, pref_iter = 56L, max_depth = 4L,
#> learning_rate = 0.1, min_child_weight = 5)