Create folds (cross-validation groups)
make_folds.Rd
Obtains test sets (“folds”) for cross-validation procedures with cv().
The user inputs specifications of the test sets; the respective complements will be taken as training sets.
Usage
make_folds(n, nfold = getOption("cv_nfold"), folds = NULL, strata = NULL)
Arguments
- n
An integer (corresponding to the number of observations in data) or a data.frame.
- nfold
Either an integer vector of length 1 or 2, or a numeric value between 0 and 1 (see “Details”).
- folds
A list of integer vectors (optional), predetermined group structure. If folds is given, the arguments nfold and strata will be ignored.
- strata
A vector or list of vectors: Strata for cross validation. If specified and in accordance with n, the groups defined in folds are output, ignoring n and nfold.
Value
A list of integer vectors defining the test sets, each of them a subset of 1:n (or 1:nrow(n) if n is a data.frame), of class “folds”.
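The relation between test sets and training sets can be sketched in base R. This is only an illustration under stated assumptions: test_sets is a hand-made list standing in for the returned object, and train_sets is a hypothetical name, not something the package provides.

```r
# Hypothetical stand-in for a list of test sets on n = 10 observations;
# the value of make_folds() is described as a list of the same shape.
n <- 10
test_sets <- list(1:3, 4:6, 7:10)

# Training set of each fold: the complement of its test set in 1:n
train_sets <- lapply(test_sets, function(test) setdiff(seq_len(n), test))

train_sets[[1]]
```

Each training set together with its test set covers 1:n exactly once.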
Details
There are three ways to define the number of groups and group sizes in nfold:
- A positive integer value: Complete nfold-fold cross-validation; 1:n is partitioned into nfold groups of (nearly) equal size at random.
- Two positive integer values, the second smaller than the first: Incomplete nfold-fold cross-validation; 1:n is partitioned into nfold[1] groups of (nearly) equal size at random, but only nfold[2] of them are kept.
- A numeric value between 0 and 1: Hold-out validation; there will be only one test group that will contain approximately n*nfold indices from 1:n.
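The three modes can be mimicked with a few lines of base R. The function sketch_folds() below is a hypothetical helper written for illustration; it is not the implementation used by make_folds(), ignores folds and strata, and may differ in rounding and in which groups are kept.

```r
# Hypothetical helper, NOT the package's code: mimics the three nfold modes.
sketch_folds <- function(n, nfold) {
  idx <- sample.int(n)  # random permutation of 1:n
  if (length(nfold) == 1 && nfold > 0 && nfold < 1) {
    # Hold-out: one test group with approximately n * nfold indices
    return(list(sort(idx[seq_len(round(n * nfold))])))
  }
  k <- nfold[1]
  # Distribute the permuted indices over k groups of (nearly) equal size
  groups <- unname(split(idx, rep_len(seq_len(k), n)))
  groups <- lapply(groups, sort)
  if (length(nfold) == 2) {
    # Incomplete CV: keep only nfold[2] of the nfold[1] groups
    groups <- groups[seq_len(nfold[2])]
  }
  groups
}

set.seed(1)
lengths(sketch_folds(100, 10))        # complete 10-fold
lengths(sketch_folds(100, c(10, 4)))  # incomplete: 4 of 10 groups
lengths(sketch_folds(100, 0.3))       # hold-out with about 30% test data
```

In the complete case the groups form a partition of 1:n; in the other two cases some indices never appear in a test set.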
Examples
make_folds(100, 10) # Complete 10-fold CV
#> Validation procedure: Complete k-fold Cross-Validation
#> Number of obs in data: 100
#> Number of test sets: 10
#> Size of test sets: 10
#> Size of training sets: 90
make_folds(100, c(10, 4)) # Incomplete 10-fold CV
#> Validation procedure: Incomplete k-fold Cross-Validation
#> Number of obs in data: 100
#> Number of test sets: 4
#> Size of test sets: 10
#> Size of training sets: 90
make_folds(100, 0.3) # Hold-out Validation with 30% test data
#> Validation procedure: Simple Hold-out Validation
#> Number of obs in data: 100
#> Number of test sets: 1
#> Size of test set: 30
#> Size of training set: 70
make_folds(100, c(3, 1)) # Almost the same as make_folds(100, 1/3)
#> Validation procedure: Incomplete k-fold Cross-Validation
#> Number of obs in data: 100
#> Number of test sets: 1
#> Size of test set: 34
#> Size of training set: 66
make_folds(iris) # data as input
#> Validation procedure: Complete k-fold Cross-Validation
#> Number of obs in data: 150
#> Number of test sets: 10
#> Size of test sets: 15
#> Size of training sets: 135
make_folds(100, folds = list(1:10, 11:40, 41:100)) # Unequal group sizes
#> Validation procedure: Complete k-fold Cross-Validation
#> Number of obs in data: 100
#> Number of test sets: 3
#> Size of test sets: 10-60
#> Size of training sets: 40-90