Calculate train and test errors based on cross-validation.
cv_performance() returns a performance table, a summary table of training and test errors for the models included in the main argument x, of class “performance”.
cv_performance.cv() is the core method. All other methods run this method after some preprocessing steps – see section “Methods”.
The method plot.performance() generates a graphical display of the performance values in the “performance” object x: a bar chart by default, or alternatively a line plot (depending on the parameter xvar).
Usage
cv_performance(x, ...)

# S3 method for cv
cv_performance(
  x,
  metric = x$metric[1],
  eval_weights = "default",
  na.rm = FALSE,
  param = TRUE,
  ...
)

# S3 method for model
cv_performance(x, metric = NULL, eval_weights = "default", na.rm = FALSE, ...)

# S3 method for multimodel
cv_performance(x, metric = NULL, eval_weights = "default", na.rm = FALSE, ...)

# S3 method for default
cv_performance(x, metric = NULL, eval_weights = "default", na.rm = FALSE, ...)

# S3 method for performance
print(
  x,
  se = getOption("cv_show_se"),
  n = getOption("print_max_row"),
  digits = 5,
  param = TRUE,
  ...
)

# S3 method for performance
plot(
  x,
  xvar = "model",
  errorbars = getOption("cv_show_se"),
  plot = TRUE,
  size = 2,
  lwd = 1,
  lwd_errorbars = 0.5,
  zeroline = TRUE,
  alpha = 0.3,
  ...
)
Arguments
- x
“cv” object, or an object of another class.
- ...
Passed to the metric function.
- metric
A metric (see metrics), specified either as a character string (the name of the metric function) or as a named list of length 1, as in list(rmse = rmse). metric=NULL selects the default metric, see default_metric.
- eval_weights
Evaluation weights; see “Evaluation weights” in the “Details” section of ?modeltuner. eval_weights="default" means “use the fitting weights”, while eval_weights=NULL means unweighted evaluation.
- na.rm
Logical: Whether NA values should be excluded from computations.
- param
Logical: Keep parameters from the parameter table in the output?
- se
Logical: Show standard errors?
- n
Integer: Maximal number of rows to print.
- digits
Integer: Number of digits to print.
- xvar
If xvar is not specified (the default), a bar plot is drawn. Alternatively, a line plot is generated with xvar as the variable on the x axis. xvar should be a character string, the name of a numeric variable in the performance table x. Typically, xvar is some hyperparameter varying across the models.
- errorbars
Logical: Whether to add error bars to plots.
- plot
Logical: If TRUE, a ggplot is returned; if FALSE, a data.frame. plot() first prepares a data.frame and then draws a ggplot from this data, with limited options for customization. If you want to design your own plot, set plot=FALSE and use the data.frame returned by plot() to create it.
- size
Graphic detail: Size of points.
- lwd
Graphic detail: Line width of the interpolating line.
- lwd_errorbars
Graphic detail: Line width of error bars.
- zeroline
Logical: Whether to include a horizontal reference line at level 0.
- alpha
Graphic detail: Opacity of bars.
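For illustration, a call exercising these arguments might look as follows (a sketch, assuming the “cv” object cvobj created in the Examples below; that rmse is exported as a metric function is an assumption, see metrics):
# explicit metric as a named list, unweighted evaluation, NAs removed
cv_performance(cvobj, metric = list(rmse = rmse), eval_weights = NULL, na.rm = TRUE)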
Value
cv_performance() returns a performance table. This is a param_table with the additional class “performance”, which stores some additional information in its attributes.
Each row corresponds to a model. The table has columns train_metric and test_metric, where metric stands for the metric's name (e.g., train_rmse and test_rmse), se_train_metric and se_test_metric, time_cv (the execution time of the cross-validation), and possibly more columns belonging to the parameter table of the multimodel (see the “Details” section in multimodel).
Details
While different models in a “cv” object can have different metrics, cv_performance() always reports the same metric for all models. If metric is not specified in the call to cv_performance(), the metric of the first model is chosen (see default_metric).
If cv_performance() is applied to a “cv” object that includes models having different default weights (and weights are not given explicitly), cv_performance() will use eval_weights=NULL.
Details on evaluation: For each fold, the evaluation metric is calculated separately for the sets of training and test observations, yielding \(k\) pairs \((train\_err_i, test\_err_i)\), \(i=1, \ldots, k\), where \(k\) is the number of folds. cv_performance() reports the average of the \(train\_err_i\), \(i=1, \ldots, k\), as the training error and the average of the \(test\_err_i\) as the test error. In the case of non-NULL eval_weights, weighted averages are calculated, with group weights computed as the group-wise sums of the observation weights.
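A minimal base-R sketch of this aggregation (illustration only, not the package's internal code; the errors, weights and fold membership below are made up):
test_err <- c(0.31, 0.33, 0.30, 0.32, 0.31)  # hypothetical test error per fold, k = 5
w_obs <- runif(150)                          # hypothetical observation weights
fold <- rep(1:5, each = 30)                  # hypothetical fold membership
w_fold <- tapply(w_obs, fold, sum)           # group weights: fold-wise sums of weights
mean(test_err)                               # unweighted average (eval_weights = NULL)
weighted.mean(test_err, w_fold)              # weighted average (non-NULL eval_weights)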
Standard errors of the reported errors are printed only if you set se=TRUE when printing the performance table, which is not the default (see the option cv_show_se, cf. modeltuner_options). These standard errors are reported only if the number of folds is >1. Their computation is based on the assumption of perfect independence of all residuals and may thus be very unreliable. As a rough guide, the standard errors are reasonable for a model with just a few free parameters and many observations, while they will severely underestimate the actual uncertainty for models of high structural complexity.
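To display standard errors by default rather than passing se=TRUE in each call, set the corresponding option (print.performance reads getOption("cv_show_se"), see the Usage section above):
options(cv_show_se = TRUE)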
Methods
- cv_performance.cv() is the core method described above. It uses the first cross-validated model's metric as the default metric.
- cv_performance.model(x, ...) executes x %>% cv %>% cv_performance(...).
- cv_performance.multimodel(x, ...) executes x %>% cv %>% cv_performance(...). Its (implicit) default metric is default_metric(x).
- cv_performance.default(x, ...) executes x %>% model %>% cv %>% cv_performance(...), where x is a fitted model.
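To illustrate the dispatch, the following calls yield the same performance table (a sketch; the nested form is equivalent to the piped notation above):
fit <- lm(Sepal.Length ~ ., iris)
cv_performance(fit)              # default method, applied to a fitted model
cv_performance(cv(model(fit)))   # the explicit pipeline it executes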
See also
The sections on “Metrics” and “Evaluation weights” in ?modeltuner; metrics, subset, sort_models.
Examples
# iris data: compare several model approaches
mm <- c(
  lm = model(lm(Sepal.Length ~., iris)),
  lm2 = model(lm(Sepal.Length ~.^2, iris)),
  glmnet = model(fm_glmnet(Sepal.Length ~., iris)),
  glmnet2 = model(fm_glmnet(Sepal.Length ~.^2, iris)),
  fm_xgb = model(fm_xgb(Sepal.Length ~., iris)))
cvobj <- cv(mm, nfold = 5)
# performance
cvperm <- cv_performance(cvobj)
cvperm
#> --- Performance table ---
#> Metric: rmse
#> train_rmse test_rmse iteration time_cv
#> lm 0.29880 0.31593 NA 0.008
#> lm2 0.27958 0.31852 NA 0.008
#> glmnet 0.29887 0.31597 78 0.078
#> glmnet2 0.29197 0.31143 65 0.098
#> fm_xgb 0.15151 0.34083 16 0.066
# Sort by test error
sort_models(cvperm, by = "test")
#> --- Performance table ---
#> Metric: rmse
#> train_rmse test_rmse iteration time_cv
#> glmnet2 0.29197 0.31143 65 0.098
#> lm 0.29880 0.31593 NA 0.008
#> glmnet 0.29887 0.31597 78 0.078
#> lm2 0.27958 0.31852 NA 0.008
#> fm_xgb 0.15151 0.34083 16 0.066
# Print the performance table with estimated standard errors (unreliable!)
print(cvperm, se = TRUE)
#> --- Performance table ---
#> Metric: rmse
#> train_rmse test_rmse se_train_rmse se_test_rmse iteration time_cv
#> lm 0.29880 0.31593 0.0089663 0.0108579 NA 0.008
#> lm2 0.27958 0.31852 0.0118965 0.0122936 NA 0.008
#> glmnet 0.29887 0.31597 0.0089655 0.0109235 78 0.078
#> glmnet2 0.29197 0.31143 0.0079191 0.0106191 65 0.098
#> fm_xgb 0.15151 0.34083 0.0091666 0.0095654 16 0.066
#> The reported standard errors may be inaccurate.
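Continuing the example, the performance table can also be plotted (a sketch, output not shown; the calls use only the plot.performance() arguments documented above):
# bar chart of train and test errors (the default display)
plot(cvperm)
# line plot with the numeric column "iteration" on the x axis;
# models with NA in that column (the lm fits here) may be dropped
plot(cvperm, xvar = "iteration")
# return the prepared data.frame instead of a ggplot, e.g. for a custom plot
plot(cvperm, plot = FALSE)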