Calculate train and test errors based on cross-validation.
cv_performance() returns a performance table, a summary table of training and test errors for the models included in the main argument x, of class “performance”.
cv_performance.cv() is the core method. All other methods run this method after some preprocessing steps – see section “Methods”.
The method plot.performance() generates a graphical display of the performance values in the “performance” object x: a bar chart by default, or alternatively a line plot (depending on the parameter xvar).
Usage
cv_performance(x, ...)

# S3 method for cv
cv_performance(
  x,
  metric = x$metric[1],
  eval_weights = "default",
  na.rm = FALSE,
  param = TRUE,
  ...
)

# S3 method for model
cv_performance(x, metric = NULL, eval_weights = "default", na.rm = FALSE, ...)

# S3 method for multimodel
cv_performance(x, metric = NULL, eval_weights = "default", na.rm = FALSE, ...)

# S3 method for default
cv_performance(x, metric = NULL, eval_weights = "default", na.rm = FALSE, ...)

# S3 method for performance
print(
  x,
  se = getOption("cv_show_se"),
  n = getOption("print_max_row"),
  digits = 5,
  param = TRUE,
  ...
)

# S3 method for performance
plot(
  x,
  xvar = "model",
  errorbars = getOption("cv_show_se"),
  plot = TRUE,
  size = 2,
  lwd = 1,
  lwd_errorbars = 0.5,
  zeroline = TRUE,
  alpha = 0.3,
  ...
)
Arguments
- x
“cv” object, or an object of another class.
- ...
Passed to the metric function.
- metric
A metric (see metrics), specified either as a character string (the name of the metric function) or as a named list of length 1, as in list(rmse = rmse). metric=NULL selects the default metric, see default_metric.
- eval_weights
Evaluation weights; see “Evaluation weights” in the “Details” section of ?modeltuner. eval_weights="default" means “use the fitting weights”, while eval_weights=NULL means unweighted evaluation.
- na.rm
Logical: Whether NA values should be excluded from computations.
- param
Logical: Keep parameters from the parameter table in the output?
- se
Logical: Show standard errors?
- n
Integer: Maximal number of rows to print.
- digits
Integer: Number of digits to print.
- xvar
If xvar is not specified (the default), a bar plot is drawn. Alternatively, a line plot is generated with xvar as the variable on the x axis. xvar should be a character string, the name of a numeric variable in the performance table x. Typically, xvar is some hyperparameter varying across the models.
- errorbars
Logical: Whether to add error bars to plots.
- plot
Logical: If TRUE, a ggplot is returned; if FALSE, a data.frame. plot() first prepares a data.frame and then draws a ggplot from this data, with limited options for customization. If you want to design your own plot, set plot=FALSE and use the data.frame returned by plot() to create it.
- size
Graphic detail: Size of points.
- lwd
Graphic detail: Line width of the interpolating line.
- lwd_errorbars
Graphic detail: Line width of error bars.
- zeroline
Logical: Whether to include a horizontal reference line at level 0.
- alpha
Graphic detail: Opacity of bars.
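For illustration, a call exercising these arguments might look as follows (a sketch, assuming the “cv” object cvobj created in the Examples below; that rmse is exported as a metric function is an assumption, see metrics):
# explicit metric as a named list, unweighted evaluation, NAs removed
cv_performance(cvobj, metric = list(rmse = rmse), eval_weights = NULL, na.rm = TRUE)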
Value
cv_performance() returns a performance table. This is a param_table with the additional class “performance”, which stores some additional information in its attributes.
Each row corresponds to a model. The table has columns train_metric and test_metric, where metric stands for the metric's name (e.g., train_rmse and test_rmse), se_train_metric and se_test_metric, time_cv (the execution time of the cross-validation), and possibly more columns belonging to the parameter table of the multimodel (see the “Details” section in multimodel).
Details
While different models in a “cv” object can have different metrics, cv_performance() always reports the same metric for all models. If metric is not specified in the call to cv_performance(), the metric of the first model is chosen (see default_metric).
If cv_performance() is applied to a “cv” object that includes models having different default weights (and weights are not given explicitly), cv_performance() will use eval_weights=NULL.
Details on evaluation: For each fold, the evaluation metric is calculated separately for the sets of training and test observations, yielding \(k\) pairs \((train\_err_i, test\_err_i)\), \(i=1, \ldots, k\), where \(k\) is the number of folds. cv_performance() reports the average of the \(train\_err_i\), \(i=1, \ldots, k\), as the training error and the average of the \(test\_err_i\) as the test error. In the case of non-NULL eval_weights, weighted averages are calculated, with group weights computed as the group-wise sums of the observation weights.
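A minimal base-R sketch of this aggregation (illustration only, not the package's internal code; the errors, weights and fold membership below are made up):
test_err <- c(0.31, 0.33, 0.30, 0.32, 0.31)  # hypothetical test error per fold, k = 5
w_obs <- runif(150)                          # hypothetical observation weights
fold <- rep(1:5, each = 30)                  # hypothetical fold membership
w_fold <- tapply(w_obs, fold, sum)           # group weights: fold-wise sums of weights
mean(test_err)                               # unweighted average (eval_weights = NULL)
weighted.mean(test_err, w_fold)              # weighted average (non-NULL eval_weights)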
Standard errors of the reported errors are printed only if you set se=TRUE when printing the performance table, which is not the default (see the option cv_show_se, cf. modeltuner_options). These standard errors are reported only if the number of folds is >1. Their computation is based on the assumption of perfect independence of all residuals and may thus be very unreliable. As a rough guide, the standard errors are reasonable for a model with just a few free parameters and many observations, while they will severely underestimate the actual uncertainty for models of high structural complexity.
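To display standard errors by default rather than passing se=TRUE in each call, set the corresponding option (print.performance reads getOption("cv_show_se"), see the Usage section above):
options(cv_show_se = TRUE)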
Methods
- cv_performance.cv() is the core method described above. It uses the first cross-validated model's metric as the default metric.
- cv_performance.model(x, ...) executes x %>% cv %>% cv_performance(...).
- cv_performance.multimodel(x, ...) executes x %>% cv %>% cv_performance(...). Its (implicit) default metric is default_metric(x).
- cv_performance.default(x, ...) executes x %>% model %>% cv %>% cv_performance(...), where x is a fitted model.
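To illustrate the dispatch, the following calls yield the same performance table (a sketch; the nested form is equivalent to the piped notation above):
fit <- lm(Sepal.Length ~ ., iris)
cv_performance(fit)              # default method, applied to a fitted model
cv_performance(cv(model(fit)))   # the explicit pipeline it executes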
See also
The sections on “Metrics” and “Evaluation weights” in ?modeltuner; metrics, subset, sort_models.
Examples
# iris data: compare several model approaches
mm <- c(
  lm = model(lm(Sepal.Length ~., iris)),
  lm2 = model(lm(Sepal.Length ~.^2, iris)),
  glmnet = model(fm_glmnet(Sepal.Length ~., iris)),
  glmnet2 = model(fm_glmnet(Sepal.Length ~.^2, iris)),
  fm_xgb = model(fm_xgb(Sepal.Length ~., iris)))
cvobj <- cv(mm, nfold = 5)
# performance
cvperm <- cv_performance(cvobj)
cvperm
#> --- Performance table ---
#> Metric: rmse
#> train_rmse test_rmse iteration time_cv
#> lm 0.29880 0.31593 NA 0.008
#> lm2 0.27958 0.31852 NA 0.008
#> glmnet 0.29887 0.31597 78 0.078
#> glmnet2 0.29197 0.31143 65 0.098
#> fm_xgb 0.15151 0.34083 16 0.066
# Sort by test error
sort_models(cvperm, by = "test")
#> --- Performance table ---
#> Metric: rmse
#> train_rmse test_rmse iteration time_cv
#> glmnet2 0.29197 0.31143 65 0.098
#> lm 0.29880 0.31593 NA 0.008
#> glmnet 0.29887 0.31597 78 0.078
#> lm2 0.27958 0.31852 NA 0.008
#> fm_xgb 0.15151 0.34083 16 0.066
# Print the performance table with estimated standard errors (unreliable!)
print(cvperm, se = TRUE)
#> --- Performance table ---
#> Metric: rmse
#> train_rmse test_rmse se_train_rmse se_test_rmse iteration time_cv
#> lm 0.29880 0.31593 0.0089663 0.0108579 NA 0.008
#> lm2 0.27958 0.31852 0.0118965 0.0122936 NA 0.008
#> glmnet 0.29887 0.31597 0.0089655 0.0109235 78 0.078
#> glmnet2 0.29197 0.31143 0.0079191 0.0106191 65 0.098
#> fm_xgb 0.15151 0.34083 0.0091666 0.0095654 16 0.066
#> The reported standard errors may be inaccurate.
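Continuing the example, the performance table can also be plotted (a sketch, output not shown; the calls use only the plot.performance() arguments documented above):
# bar chart of train and test errors (the default display)
plot(cvperm)
# line plot with the numeric column "iteration" on the x axis;
# models with NA in that column (the lm fits here) may be dropped
plot(cvperm, xvar = "iteration")
# return the prepared data.frame instead of a ggplot, e.g. for a custom plot
plot(cvperm, plot = FALSE)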