Generate and cross-validate the models that result from adding or removing variables, and run stepwise selection procedures

step_extend() combines all models resulting from adding one variable to a base model into a multimodel and subjects it to cv(). step_forward() applies step_extend() repeatedly, selecting the best model with respect to test error at each step, thus performing a forward selection of variables.

step_reduce() combines all models resulting from removing one variable from a full model into a multimodel and subjects it to cv(). step_backward() applies step_reduce() repeatedly, selecting the best model with respect to test error at each step, thus performing a backward elimination of variables.

best_subset() combines submodels of the full model in a multimodel and subjects it to cv(). The desired range of the model sizes (number of effects) to include is specified in the parameter nvars.

Usage

step_extend(x, ...)

# S3 method for model
step_extend(
  x,
  formula1 = null_formula(x),
  formula2 = formula(x),
  steps = 1L,
  include_full = FALSE,
  include_base = FALSE,
  cv = TRUE,
  ...
)

# S3 method for default
step_extend(x, ...)

step_forward(x, ...)

# S3 method for model
step_forward(
  x,
  formula1 = null_formula(x),
  formula2 = formula(x),
  max_step = 10,
  include_base = TRUE,
  include_full = FALSE,
  nfold = getOption("cv_nfold"),
  folds = NULL,
  verbose = getOption("cv_verbose"),
  ...
)

# S3 method for default
step_forward(x, ...)

step_reduce(x, ...)

# S3 method for model
step_reduce(
  x,
  formula1 = null_formula(x),
  formula2 = formula(x),
  steps = 1L,
  include_full = FALSE,
  include_base = FALSE,
  cv = TRUE,
  ...
)

# S3 method for default
step_reduce(x, ...)

step_backward(x, ...)

# S3 method for model
step_backward(
  x,
  formula1 = null_formula(x),
  formula2 = formula(x),
  max_step = 10,
  include_full = TRUE,
  include_base = FALSE,
  nfold = getOption("cv_nfold"),
  folds = NULL,
  verbose = getOption("cv_verbose"),
  ...
)

# S3 method for default
step_backward(x, ...)

best_subset(x, ...)

# S3 method for model
best_subset(
  x,
  formula1 = null_formula(x),
  formula2 = formula(x),
  nvars = 1:5,
  include_base = any(nvars == 0),
  include_full = FALSE,
  cv = TRUE,
  ...
)

# S3 method for default
best_subset(x, ...)

Arguments

x: Object of class “model” or a fitted model.
...: Dots go to cv() in step_extend() and step_reduce() (provided cv=TRUE), and to tune() in step_forward() and step_backward().
formula1, formula2: Two nested model formulas defining the range of models to be considered. The larger of the two is taken as the full model, the simpler as the base model. See the “Details” section.
steps: (step_extend, step_reduce) Integer: Number of variables to add/remove. Default: 1.
include_full: Logical: Whether to include the full model in the output.
include_base: Logical: Whether to include the base model in the output.
cv: (step_extend, step_reduce, best_subset) Logical: Whether to run cv() on the multimodel (TRUE) or to return the multimodel without cross-validating it (FALSE).
max_step: (step_forward, step_backward) Integer: Maximal number of steps.
nfold, folds: Passed to make_folds.
verbose: Logical: Whether to print information on execution progress to the console.
nvars: (best_subset) Integer vector: The model sizes (numbers of variables) to include.

Value

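All of these functions return an object of class “cv”. The exception is a call to step_extend(), step_reduce() or best_subset() with cv=FALSE, which returns the multimodel without cross-validating it.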

Details

formula1 and formula2 must be nested model formulas, i.e. one of the two formulas must include all terms present in the other. They define the range of models to be considered: The larger of the two defines the full model, the other is taken as the base model.
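
The nesting requirement can be illustrated in base R. The following is a minimal sketch, not the check the package performs: is_nested() is a hypothetical helper, and it assumes formulas without a dot.

```r
# Hypothetical helper (not part of the package): TRUE if every term of
# `small` also appears in `large`. Assumes formulas without a dot.
is_nested <- function(small, large) {
  terms_small <- attr(terms(small), "term.labels")
  terms_large <- attr(terms(large), "term.labels")
  all(terms_small %in% terms_large)
}

base_formula <- Sepal.Length ~ 1
full_formula <- Sepal.Length ~ Sepal.Width + Petal.Length + Species

# The intercept-only model has no terms, so it is nested in any model.
is_nested(base_formula, full_formula)

# Petal.Width is not a term of full_formula, so this pair is not nested.
is_nested(Sepal.Length ~ Petal.Width, full_formula)
```

In this pair, full_formula would play the role of the full model and base_formula that of the base model.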

By default, formula1 and formula2 are used to update the original model formula. Enclose a formula in I() to replace the model's formula. This distinction is relevant whenever you specify a formula including a dot. See the “Details” section and examples in ?update.model.
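
Why a dot makes the difference can be seen with base R alone. The sketch below does not call update.model() or I(); it only contrasts the two readings of a dot, using update() and terms() from the stats package on the iris data.

```r
original <- Sepal.Length ~ Sepal.Width + Species

# Updating: each dot stands for the corresponding side of `original`.
updated <- update(original, . ~ . + Petal.Length)
updated

# Replacing: there is no original formula to refer to, so the dot on the
# right-hand side expands to all remaining columns of the data.
replaced <- formula(terms(Sepal.Length ~ ., data = iris))
replaced
```

How the package resolves the two cases is documented in ?update.model; the sketch is only meant to show that the same dot can denote two different sets of terms.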

Examples

mod <- model(lm(Sepal.Length ~ ., iris), 
             label = "sepLen")
             
# Add variables to base model
oneVarModels <- step_extend(mod)
cv_performance(oneVarModels)
#> --- Performance table ---
#> Metric: rmse
#>                                   formula train_rmse test_rmse time_cv
#> +Sepal.Width  Sepal.Length ~ Sepal.Width     0.81906   0.82587   0.016
#> +Petal.Length Sepal.Length ~ Petal.Length    0.40397   0.40608   0.009
#> +Petal.Width  Sepal.Length ~ Petal.Width     0.47445   0.47636   0.009
#> +Species      Sepal.Length ~ Species         0.50881   0.51965   0.013

# Forward selection with step_forward()
cv_fwd <- step_forward(mod)
cv_performance(cv_fwd)
#> --- Performance table ---
#> Metric: rmse
#>                                                                         formula train_rmse test_rmse time_cv
#> base          Sepal.Length ~ 1                                                     0.82493   0.82184   0.008
#> +Petal.Length Sepal.Length ~ Petal.Length                                          0.40397   0.39934   0.009
#> +Sepal.Width  Sepal.Length ~ Petal.Length + Sepal.Width                            0.32950   0.32767   0.010
#> +Species      Sepal.Length ~ Petal.Length + Sepal.Width + Species                  0.30461   0.30610   0.016
#> +Petal.Width  Sepal.Length ~ Petal.Length + Sepal.Width + Species + Petal.Width    0.30003   0.30266   0.019
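
The greedy logic behind forward selection can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: cv_rmse() is a hypothetical helper, the ten random folds are an arbitrary choice, and the error is pooled over all test observations, which may differ from how cv() aggregates it.

```r
# Hypothetical helper: k-fold cross-validated RMSE of lm() for one formula.
cv_rmse <- function(formula, data, fold_id) {
  response <- all.vars(formula)[1]
  sq_err <- numeric(nrow(data))
  for (k in unique(fold_id)) {
    test <- fold_id == k
    fit <- lm(formula, data = data[!test, , drop = FALSE])
    pred <- predict(fit, newdata = data[test, , drop = FALSE])
    sq_err[test] <- (data[[response]][test] - pred)^2
  }
  sqrt(mean(sq_err))
}

set.seed(1)
fold_id <- sample(rep(1:10, length.out = nrow(iris)))  # random fold labels
candidates <- setdiff(names(iris), "Sepal.Length")
selected <- character(0)

while (length(candidates) > 0) {
  # Test error of every model that adds one candidate to the current selection
  errs <- sapply(candidates, function(v) {
    cv_rmse(reformulate(c(selected, v), response = "Sepal.Length"), iris, fold_id)
  })
  best <- names(which.min(errs))
  selected <- c(selected, best)
  candidates <- setdiff(candidates, best)
}

selected  # variables in the order in which they were added
```

Each pass through the loop corresponds to one extension step: all one-variable extensions of the current model are evaluated, and the one with the lowest test error becomes the new current model.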

# Remove variables from full model
mod |> step_reduce() |> cv_performance()
#> --- Performance table ---
#> Metric: rmse
#>                                                               formula train_rmse test_rmse time_cv
#> -Sepal.Width  Sepal.Length ~ Petal.Length + Petal.Width + Species        0.33294   0.34157   0.015
#> -Petal.Length Sepal.Length ~ Sepal.Width + Petal.Width + Species         0.42566   0.43186   0.015
#> -Petal.Width  Sepal.Length ~ Sepal.Width + Petal.Length + Species        0.30451   0.31367   0.015
#> -Species      Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width    0.30977   0.31694   0.011
mod |> step_backward() |> cv_performance()
#> --- Performance table ---
#> Metric: rmse
#>                                                                         formula train_rmse test_rmse time_cv
#> full          Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species    0.29985   0.30990   0.016
#> -Petal.Width  Sepal.Length ~ Sepal.Width + Petal.Length + Species                  0.30441   0.31375   0.015
#> -Species      Sepal.Length ~ Sepal.Width + Petal.Length                            0.32955   0.33093   0.009
#> -Sepal.Width  Sepal.Length ~ Petal.Length                                          0.40403   0.39967   0.009
#> -Petal.Length Sepal.Length ~ 1                                                     0.82512   0.82060   0.007
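
The set of candidate models evaluated in a single reduction step can be written out in base R. A minimal sketch, assuming the full model has the four predictors shown above; the "-" labels only imitate the row names of the performance table.

```r
full <- Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species
vars <- attr(terms(full), "term.labels")

# One candidate formula per variable, each with that variable removed
reduced <- lapply(vars, function(v) {
  reformulate(setdiff(vars, v), response = "Sepal.Length")
})
names(reduced) <- paste0("-", vars)
reduced
```

Backward elimination repeats this step, each time continuing from the candidate with the lowest test error.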

# best subset
mod |> best_subset(nvars = 2:3) |> cv_performance()
#> --- Performance table ---
#> Metric: rmse
#>                                                                                       formula train_rmse test_rmse time_cv
#> +Sepal.Width+Petal.Length             Sepal.Length ~ Sepal.Width + Petal.Length                  0.32947   0.33340   0.010
#> +Sepal.Width+Petal.Width              Sepal.Length ~ Sepal.Width + Petal.Width                   0.44595   0.44860   0.010
#> +Sepal.Width+Species                  Sepal.Length ~ Sepal.Width + Species                       0.43110   0.43481   0.015
#> +Petal.Length+Petal.Width             Sepal.Length ~ Petal.Length + Petal.Width                  0.39830   0.40975   0.010
#> +Petal.Length+Species                 Sepal.Length ~ Petal.Length + Species                      0.33272   0.34273   0.014
#> +Petal.Width+Species                  Sepal.Length ~ Petal.Width + Species                       0.47344   0.48899   0.014
#> +Sepal.Width+Petal.Length+Petal.Width Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width    0.30957   0.31965   0.011
#> +Sepal.Width+Petal.Length+Species     Sepal.Length ~ Sepal.Width + Petal.Length + Species        0.30441   0.31458   0.015
#> +Sepal.Width+Petal.Width+Species      Sepal.Length ~ Sepal.Width + Petal.Width + Species         0.42568   0.43661   0.015
#> +Petal.Length+Petal.Width+Species     Sepal.Length ~ Petal.Length + Petal.Width + Species        0.33257   0.34553   0.015
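
The candidate set for a given range of model sizes can be enumerated with combn() from base R. This sketch only builds the formulas and does not cross-validate them; the variable names are taken from the iris example above.

```r
vars <- c("Sepal.Width", "Petal.Length", "Petal.Width", "Species")
nvars <- 2:3

# All subsets of size 2 and of size 3, flattened into a single list
subsets <- unlist(
  lapply(nvars, function(k) combn(vars, k, simplify = FALSE)),
  recursive = FALSE
)
formulas <- lapply(subsets, reformulate, response = "Sepal.Length")

length(formulas)  # choose(4, 2) + choose(4, 3) = 10 candidate models
```

The number of candidates grows quickly with the number of variables, which is why restricting the model sizes is useful.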