Calculate train and test errors based on cross-validation.
cv_performance() returns a performance table, a summary table of training and test errors for the models included in the
main argument x, of class “performance”.
cv_performance.cv() is the core method. All other methods run this method after some preprocessing steps –
see section “Methods”.
The method plot.performance() generates a graphical display of the performance values in the “performance” object x:
a bar chart by default, or alternatively a line plot, depending on the argument xvar.
Usage
cv_performance(x, ...)
# S3 method for cv
cv_performance(
  x,
  metric = x$metric[1],
  eval_weights = "default",
  na.rm = FALSE,
  param = TRUE,
  ...
)
# S3 method for model
cv_performance(x, metric = NULL, eval_weights = "default", na.rm = FALSE, ...)
# S3 method for multimodel
cv_performance(x, metric = NULL, eval_weights = "default", na.rm = FALSE, ...)
# S3 method for default
cv_performance(x, metric = NULL, eval_weights = "default", na.rm = FALSE, ...)
# S3 method for performance
print(
  x,
  se = getOption("cv_show_se"),
  n = getOption("print_max_row"),
  digits = 5,
  param = TRUE,
  ...
)
# S3 method for performance
plot(
  x,
  xvar = "model",
  errorbars = getOption("cv_show_se"),
  plot = TRUE,
  size = 2,
  lwd = 1,
  lwd_errorbars = 0.5,
  zeroline = TRUE,
  alpha = 0.3,
  ...
)
Arguments
- x
- A “cv” object, or an object of another class; see section “Methods”. 
- ...
- Passed to the metric function.
- metric
- A metric (see metrics), specified either as a character string (the name of the metric function) or as a named list of length 1, as in list(rmse = rmse). metric=NULL selects the default metric; see default_metric. A usage sketch follows this argument list.
- eval_weights
- Evaluation weights; see the “Evaluation weights” paragraph in the “Details” section of ?modeltuner. eval_weights="default" means “use fitting weights”, while eval_weights=NULL means unweighted evaluation.
- na.rm
- Logical: Whether NA values should be excluded from computations. 
- param
- Logical: Keep parameters from the parameter table in the output? 
- se
- Logical: Show standard errors? 
- n
- Integer: Maximal number of rows to print. 
- digits
- Integer: Number of digits to print. 
- xvar
- If xvar is not specified (the default), a bar plot is drawn. Alternatively, a line plot is generated, with xvar as the variable on the x axis. xvar should be a character string, the name of a numeric variable in the performance table x. Typically, xvar is some hyperparameter varying across the models.
- errorbars
- Logical: Whether to add error bars in plots. 
- plot
- Logical: If TRUE, a ggplot is returned; if FALSE, a data.frame. plot() first prepares a data.frame and then draws a ggplot using this data, with limited options for customization. If you want to design your own plot, set plot=FALSE and use the data.frame returned by plot() to create it.
- size
- Graphic detail: Size of point. 
- lwd
- Graphic detail: Line width of interpolating line. 
- lwd_errorbars
- Graphic detail: Line width of errorbars. 
- zeroline
- Logical: Whether to include a horizontal reference line at level 0. 
- alpha
- Graphic detail: Opacity of bars. 
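A minimal usage sketch for the metric and eval_weights arguments (cv_obj is a hypothetical “cv” object assumed for illustration; rmse is one of the package's metrics):
# Sketch: ways of specifying `metric` and `eval_weights` for a "cv" object `cv_obj`
cv_performance(cv_obj)                              # default: metric of the first model
cv_performance(cv_obj, metric = "rmse")             # metric as a character string
cv_performance(cv_obj, metric = list(rmse = rmse))  # metric as a named list of length 1
cv_performance(cv_obj, eval_weights = NULL)         # unweighted evaluation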
Value
cv_performance() returns a performance table.
This is a param_table with the additional class “performance”, which stores some extra information in its attributes.
Each row corresponds to a model. It has columns train_metric and test_metric
(e.g., train_rmse and test_rmse), se_train_metric and se_test_metric,
time_cv (execution time of the cross-validation), and possibly
more columns being part of the parameter table of the multimodel (see “Details” section in multimodel).
Details
While different models in a “cv” object can have different metrics, cv_performance() always reports the same metric
for all models.
If metric is not specified in the call to cv_performance(), the metric from the first model will be chosen
(see default_metric).
If cv_performance() is applied to a “cv” object containing models with different default weights
(and weights are not given explicitly), cv_performance() will use eval_weights=NULL.
Details on evaluation: 
For each fold, the evaluation metric is calculated separately for the sets of training and test observations,
yielding \(k\) pairs \((train\_err_i, test\_err_i)\), \(i=1, \ldots, k\), where \(k\) is the number of folds.
cv_performance() reports the average of the \(train\_err_i\), \(i=1, \ldots, k\), as the training error and
the average of the \(test\_err_i\) as the test error.
In case of non-NULL eval_weights, weighted averages are calculated, with group weights computed as
the group-wise sums of the observation weights.
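As an illustration of this computation, here is a hand-rolled sketch in base R: it mimics the fold-wise averaging with lm and the rmse metric, and does not use modeltuner internals.
# Hand-rolled k-fold evaluation: average fold-wise train and test rmse
set.seed(1)
k <- 5
folds <- sample(rep(1:k, length.out = nrow(iris)))
rmse_fun <- function(obs, pred) sqrt(mean((obs - pred)^2))
train_err <- test_err <- numeric(k)
for (i in 1:k) {
  # fit on the training observations of fold i
  fit <- lm(Sepal.Length ~ ., data = iris[folds != i, ])
  train_err[i] <- rmse_fun(iris$Sepal.Length[folds != i], fitted(fit))
  test_err[i]  <- rmse_fun(iris$Sepal.Length[folds == i],
                           predict(fit, iris[folds == i, ]))
}
# training and test error as reported by cv_performance()
c(train_rmse = mean(train_err), test_rmse = mean(test_err))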
Standard errors of the reported errors are only printed if you set se=TRUE when printing the performance table,
which is not the case by default (see the option cv_show_se, cf. modeltuner_options).
These standard errors are only reported if the number of folds is >1.
Their computation is based on the assumption of perfect independence of all residuals, and may thus be very unreliable.
As a rough guide, the standard errors are reasonable in case of a model with just a few free parameters and many observations,
while they will severely underestimate the actual uncertainty in the case of models of high structural complexity.
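Standard errors can also be switched on globally via the option queried by print.performance() (cf. modeltuner_options); a brief sketch:
# Show standard errors in all subsequent print() calls
options(cv_show_se = TRUE)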
Methods
- cv_performance.cv() is the core method described above. It uses the first cross-validated model's metric as the default metric.
- cv_performance.model(x, ...) executes x %>% cv %>% cv_performance(...).
- cv_performance.multimodel(x, ...) executes x %>% cv %>% cv_performance(...). Its (implicit) default metric is default_metric(x).
- cv_performance.default(x, ...) executes x %>% model %>% cv %>% cv_performance(...), where x is a fitted model.
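For example, the default method lets you pass a fitted model directly; a minimal sketch (the call below is equivalent to model(fit) %>% cv %>% cv_performance):
# Fit a model, then cross-validate and evaluate it in one step
fit <- lm(Sepal.Length ~ ., iris)
cv_performance(fit)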
See also
The sections on “Metrics” and “Evaluation weights”
in ?modeltuner; metrics, subset, sort_models.
Examples
# iris data: compare several model approaches
mm <- c(
  lm = model(lm(Sepal.Length ~., iris)), 
  lm2 = model(lm(Sepal.Length ~.^2, iris)), 
  glmnet = model(fm_glmnet(Sepal.Length ~., iris)), 
  glmnet2 = model(fm_glmnet(Sepal.Length ~.^2, iris)), 
  fm_xgb = model(fm_xgb(Sepal.Length ~., iris)))
cvobj <- cv(mm, nfold = 5)
# performance
cvperm <- cv_performance(cvobj)
cvperm
#> --- Performance table ---
#> Metric: rmse
#>         train_rmse test_rmse iteration time_cv
#> lm         0.29880   0.31593        NA   0.008
#> lm2        0.27958   0.31852        NA   0.008
#> glmnet     0.29887   0.31597        78   0.078
#> glmnet2    0.29197   0.31143        65   0.098
#> fm_xgb     0.15151   0.34083        16   0.066
# Sort by test error
sort_models(cvperm, by = "test")
#> --- Performance table ---
#> Metric: rmse
#>         train_rmse test_rmse iteration time_cv
#> glmnet2    0.29197   0.31143        65   0.098
#> lm         0.29880   0.31593        NA   0.008
#> glmnet     0.29887   0.31597        78   0.078
#> lm2        0.27958   0.31852        NA   0.008
#> fm_xgb     0.15151   0.34083        16   0.066
# Print performance table with estimated standard errors (unreliable!)
print(cvperm, se = TRUE)
#> --- Performance table ---
#> Metric: rmse
#>         train_rmse test_rmse se_train_rmse se_test_rmse iteration time_cv
#> lm         0.29880   0.31593     0.0089663    0.0108579        NA   0.008
#> lm2        0.27958   0.31852     0.0118965    0.0122936        NA   0.008
#> glmnet     0.29887   0.31597     0.0089655    0.0109235        78   0.078
#> glmnet2    0.29197   0.31143     0.0079191    0.0106191        65   0.098
#> fm_xgb     0.15151   0.34083     0.0091666    0.0095654        16   0.066
#> The reported standard errors may be inaccurate.
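The plot() method described above can be applied to the performance table; a brief sketch (output not shown):
# Bar chart of training and test errors (the default display)
plot(cvperm)
# Return the underlying data.frame instead of a ggplot, e.g. for a custom plot
plot(cvperm, plot = FALSE)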