Create folds (cross-validation groups)
Obtains test sets (“folds”) for cross-validation procedures with cv().
The user inputs specifications of the test sets; the respective complements will be taken as training sets.
Usage
make_folds(n, nfold = getOption("cv_nfold"), folds = NULL, strata = NULL)
Arguments
- n
An integer (corresponding to the number of observations in data) or a data.frame.
- nfold
Either an integer vector of length 1 or 2, or a numeric value between 0 and 1 (see “Details”).
- folds
A list of integer vectors (optional), predetermined group structure. If folds is given, the arguments nfold and strata will be ignored.
- strata
A vector or list of vectors: Strata for cross validation. If specified and in accordance with n, the groups defined in folds are output, ignoring n and nfold.
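How the package uses strata internally is not spelled out here. As an illustration only, the following base R sketch shows one common way to build stratified folds: fold labels are dealt out separately within each stratum, so every test set receives a similar share of each stratum. The helper name sketch_stratified_folds is hypothetical and not part of the package.

```r
# Hypothetical helper, NOT the package's implementation:
# build k test sets so that each stratum is spread evenly across them.
sketch_stratified_folds <- function(strata, k) {
  group <- integer(length(strata))
  for (idx in split(seq_along(strata), strata)) {
    # Within one stratum, deal out the fold labels 1..k in random order
    group[idx] <- sample(rep_len(seq_len(k), length(idx)))
  }
  # One integer vector of test indices per fold
  unname(split(seq_along(strata), group))
}

set.seed(1)
folds <- sketch_stratified_folds(iris$Species, 5)
# Species counts per test set (rows: species, columns: folds)
sapply(folds, function(i) table(iris$Species[i]))
```

With 50 observations per species and 5 folds, each test set in this sketch contains 10 observations of every species.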
Value
An object of class “folds”: a list of integer vectors defining the test sets, each of them a subset of 1:n (or 1:nrow(n) if n is a data.frame).
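Since only the test sets are stored, the training sets follow as their complements. A minimal base R sketch of that relationship, using a hand-written list as a stand-in for the returned object (the variable names are illustrative):

```r
n <- 100
# Hand-written test sets, standing in for the list returned by make_folds()
test_sets <- list(1:10, 11:40, 41:100)

# Each training set is the complement of its test set within 1:n
train_sets <- lapply(test_sets, function(test) setdiff(seq_len(n), test))

lengths(test_sets)   # 10 30 60
lengths(train_sets)  # 90 70 40
```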
Details
There are three ways to define the number of groups and group sizes in nfold:
- A positive integer value: Complete nfold-fold cross-validation; 1:n is partitioned into nfold groups of (nearly) equal size at random.
- Two positive integer values, the second smaller than the first: Incomplete nfold-fold cross-validation; 1:n is partitioned into nfold[1] groups of (nearly) equal size at random, but only nfold[2] of them are kept.
- A numeric value between 0 and 1: Hold-out validation; there will be only one test group that will contain approximately n*nfold indices from 1:n.
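The three cases can be mimicked in a few lines of base R. This is a rough sketch under simple assumptions (random assignment via sample(), rounding of n*nfold for hold-out), not the package's actual code. The helper name sketch_folds is hypothetical.

```r
# Hypothetical helper, NOT part of the package: mimics the three nfold modes.
sketch_folds <- function(n, nfold) {
  if (length(nfold) == 1 && nfold > 0 && nfold < 1) {
    # Hold-out: a single test set with about n * nfold indices
    return(list(sort(sample.int(n, round(n * nfold)))))
  }
  k <- nfold[1]
  # Randomly assign each index in 1:n to one of k groups of (nearly) equal size
  group <- sample(rep_len(seq_len(k), n))
  folds <- unname(split(seq_len(n), group))
  # Incomplete CV: keep only nfold[2] of the k groups
  if (length(nfold) == 2) folds <- folds[seq_len(nfold[2])]
  folds
}

set.seed(1)
lengths(sketch_folds(100, 10))        # ten test sets of size 10
lengths(sketch_folds(100, c(10, 4)))  # four test sets of size 10
lengths(sketch_folds(100, 0.3))       # one test set of size 30
```

In the complete case the groups together cover 1:n exactly once. In the incomplete case the dropped groups never serve as test sets, which reduces the number of model fits.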
Examples
make_folds(100, 10) # Complete 10-fold CV
#> Validation procedure: Complete k-fold Cross-Validation
#> Number of obs in data: 100
#> Number of test sets: 10
#> Size of test sets: 10
#> Size of training sets: 90
make_folds(100, c(10, 4)) # Incomplete 10-fold CV
#> Validation procedure: Incomplete k-fold Cross-Validation
#> Number of obs in data: 100
#> Number of test sets: 4
#> Size of test sets: 10
#> Size of training sets: 90
make_folds(100, 0.3) # Hold-out Validation with 30% test data
#> Validation procedure: Simple Hold-out Validation
#> Number of obs in data: 100
#> Number of test sets: 1
#> Size of test set: 30
#> Size of training set: 70
make_folds(100, c(3, 1)) # Almost the same as make_folds(100, 1/3)
#> Validation procedure: Incomplete k-fold Cross-Validation
#> Number of obs in data: 100
#> Number of test sets: 1
#> Size of test set: 34
#> Size of training set: 66
make_folds(iris) # data as input
#> Validation procedure: Complete k-fold Cross-Validation
#> Number of obs in data: 150
#> Number of test sets: 10
#> Size of test sets: 15
#> Size of training sets: 135
make_folds(100, folds = list(1:10, 11:40, 41:100)) # Unequal group sizes
#> Validation procedure: Complete k-fold Cross-Validation
#> Number of obs in data: 100
#> Number of test sets: 3
#> Size of test sets: 10-60
#> Size of training sets: 40-90