Selection of the best-performing model in a “cv” object
tune.Rd
tune()
picks the “best” model from a set of models, that is, the model with
the smallest test error.
Usage
tune(x, ...)
# S3 method for cv
tune(x, metric = x$metric[1], label = NULL, ...)
# S3 method for multimodel
tune(
  x,
  nfold = getOption("cv_nfold"),
  folds = NULL,
  metric = NULL,
  label = NULL,
  ...
)

# S3 method for model
tune(
  x,
  ...,
  nfold = getOption("cv_nfold"),
  folds = NULL,
  metric = NULL,
  expand = TRUE,
  max_n_model = getOption("expand_max_model"),
  label = NULL,
  verbose = getOption("cv_verbose")
)

# S3 method for default
tune(
  x,
  ...,
  nfold = getOption("cv_nfold"),
  folds = NULL,
  metric = NULL,
  expand = TRUE,
  max_n_model = getOption("expand_max_model"),
  verbose = getOption("cv_verbose"),
  use_original_args = FALSE,
  force = FALSE
)
Arguments
- x: An object of class “cv”, or any other object that can be transformed to a cv.
- ...: See “Details” and “Methods”.
- metric: A metric (see metrics), specified either as a character string (the name of the metric function) or as a named list of length 1, as in list(rmse = rmse). metric = NULL selects the default metric, see default_metric.
- label: A character string: the label of the output model. If NULL, the selected model's current label is kept.
- nfold, folds: Passed to make_folds().
- expand: Logical: Expand the “...” arguments (default) or join them element-wise? If expand = TRUE, the vectors in “...” are expanded, and the number of models equals the product of the lengths of the “...” arguments; otherwise, all “...” arguments must have equal lengths, and the number of models equals their common length.
- max_n_model: Maximal number of models that will be included. If the number of specified models exceeds max_n_model, a subset is selected at random.
- verbose: Passed to cv().
- use_original_args, force: Passed to fit().
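As a sketch of the effect of expand (hypothetical parameter values; the model object mod stands for any model created with model(), and the model counts follow the rule stated above):

```r
# expand = TRUE (default): all combinations are formed,
# 3 values of max_depth x 2 values of learning_rate = 6 candidate models
tune(mod, max_depth = c(2, 4, 6), learning_rate = c(0.1, 0.3))

# expand = FALSE: arguments are joined element-wise -> 2 candidate models,
# (max_depth = 2, learning_rate = 0.1) and (max_depth = 4, learning_rate = 0.3)
tune(mod, max_depth = c(2, 4), learning_rate = c(0.1, 0.3), expand = FALSE)
```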
Value
All methods return an object of class “model”, except tune.default(), which returns a fitted model of the same class as its input x.
Details
The core method is tune.cv(x), which selects, from the models included in x, the one with the lowest test error (in terms of the metric).
All other methods run tune.cv() after some preprocessing steps -- see section “Methods”.
For iteratively fitted models (ifm, classes “fm_xgb” and “fm_glmnet”), tune(x) without additional arguments returns the model corresponding to the preferred iteration, thus tuning the parameter nrounds or lambda, respectively.
For other models, tune(x) simply returns x.
Note the different role of the “...” arguments in the different methods:

- In tune.cv() and tune.multimodel(), they are passed to cv_performance(), allowing specification of eval_weights.
- In tune.model() and tune.default(), they are passed to multimodel(). This allows expanding a selection of alternative parameterizations, of which the one with the smallest test error is finally returned. Selecting the best-performing parameterization thus reduces to one simple line of code: tune(mymodel, hyperparms = candidate_values)
Methods
- tune.cv: Executes cv_performance(x, ...), finds the model with minimal test error and extracts that model (using extract_model).
- tune.multimodel(x, ...): Essentially runs x %>% cv %>% tune(...).
- tune.model(x, ...): Essentially runs x %>% multimodel(...) %>% cv %>% tune.
- tune.default(x, ...): Applied to a fitted model x; essentially runs x %>% model %>% multimodel(...) %>% cv %>% tune %>% fit.
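The tune.default() chain can be sketched on a directly fitted model (a hedged illustration; fm_xgb() is the package's xgboost wrapper, as seen in the example output below, and the parameter values are arbitrary):

```r
# tune.default(): start from a fitted model, get a fitted model back
fitted <- fm_xgb(Sepal.Length ~ ., iris)
best <- tune(fitted, max_depth = 1:4)  # wraps as model, expands 4 candidates,
                                       # cross-validates, refits the best one
```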
Examples
options(cv_nfold = 1/3) # accelerates cross-validations, see ?modeltuner_options
# Tune xgboost parameter:
model_xgb <- model("fm_xgb", Sepal.Length~., iris, class = "fm_xgb")
mm_xgb <- multimodel(model_xgb, max_depth = 1:5)
cv_xgb <- cv(mm_xgb)
plot(cv_performance(cv_xgb),
xvar = "max_depth", zeroline = FALSE)
# tune() automatically selects the best-performing model:
tuned1 <- tune(model_xgb, max_depth = 1:5, label = "tuned_model_1")
tuned1
#> --- A “model” object ---
#> label: tuned_model_1
#> model class: fm_xgb
#> formula: Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width +
#> Species
#> data: data.frame [150 x 5],
#> input as: ‘data = iris’
#> response_type: continuous
#> call: fm_xgb(formula = Sepal.Length ~ ., data = data,
#> max_depth = 4L)
#> Preferred iteration from cv: iter=17
# Several tuning steps in a pipe:
tuned2 <- tuned1 |>
tune(learning_rate = c(0.1, 0.3, 1)) |>
tune(min_child_weight = c(5, 10, 20), label = "tuned_model_2")
fit(tuned2, eval = FALSE, use_original_args = TRUE) # extract selected model
#> set_pref_iter(), model ‘tuned_model_2’, modifications made in call:
#> pref_iter=49, nrounds=49, early_stopping_rounds=NULL
#> fm_xgb(formula = Sepal.Length ~ ., data = iris, nrounds = 49L,
#> early_stopping_rounds = NULL, pref_iter = 49L, max_depth = 4L,
#> learning_rate = 0.1, min_child_weight = 10)
# Alternatively, expand and test all parameter combinations in one call:
tuned3 <- tune(tuned1, learning_rate = c(0.1, 0.3, 1),
min_child_weight = c(5, 10, 20),
label = "tuned_model_3")
fit(tuned3, eval = FALSE, use_original_args = TRUE) # extract selected model
#> set_pref_iter(), model ‘tuned_model_3’, modifications made in call:
#> pref_iter=56, nrounds=56, early_stopping_rounds=NULL
#> fm_xgb(formula = Sepal.Length ~ ., data = iris, nrounds = 56L,
#> early_stopping_rounds = NULL, pref_iter = 56L, max_depth = 4L,
#> learning_rate = 0.1, min_child_weight = 5)