Creates the test sets (“folds”) used in cross-validation procedures with cv(). The user specifies the test sets; the complement of each test set is used as the corresponding training set.

Usage

make_folds(n, nfold = getOption("cv_nfold"), folds = NULL, strata = NULL)

Arguments

n

An integer (corresponding to the number of observations in data) or a data.frame.

nfold

Either an integer vector of length 1 or 2, or a numeric value between 0 and 1 (see “Details”).

folds

A list of integer vectors (optional), predetermined group structure. If folds is given, the arguments nfold and strata will be ignored.

strata

A vector or list of vectors: strata for cross-validation. If specified and in accordance with n, the groups defined in folds are output, ignoring n and nfold.
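As background for the strata argument, the following base-R sketch shows one common way of building stratified folds in general: indices are shuffled within each stratum and then dealt out to the folds in turn, so that every fold receives a similar share of each stratum. The helper name sketch_stratified_folds is hypothetical, and nothing here is taken from the package; how make_folds() itself uses strata may differ.

```r
# Hypothetical helper illustrating stratified fold assignment in general;
# it is an assumption for illustration, not the package's implementation.
sketch_stratified_folds <- function(strata, k) {
  fold_id <- integer(length(strata))
  for (idx in split(seq_along(strata), strata)) {
    # Shuffle the positions within this stratum, then deal them out cyclically
    shuffled <- idx[sample.int(length(idx))]
    fold_id[shuffled] <- rep_len(seq_len(k), length(idx))
  }
  # Return the test sets as an unnamed list of integer vectors
  unname(split(seq_along(strata), fold_id))
}

set.seed(1)
strata <- rep(c("a", "b", "c"), times = c(50, 30, 20))
folds <- sketch_stratified_folds(strata, k = 5)

# Counts per stratum within each fold
sapply(folds, function(test) table(strata[test]))
```

With stratum sizes that are multiples of k, as here, every fold ends up with the same number of observations from each stratum.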

Value

A list of class “folds” whose elements are integer vectors defining the test sets. Each vector is a subset of 1:n (or of 1:nrow(n) if n is a data.frame).
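Since only the test sets are stored, the training sets have to be derived as complements. A minimal base-R sketch of that step, using a hand-written list (test_sets, a hypothetical stand-in for the returned object) rather than an actual call to make_folds():

```r
# Hypothetical stand-in for the value of make_folds(): a list of test sets.
n <- 10
test_sets <- list(1:3, 4:6, 7:10)

# Each training set is the complement of its test set within 1:n.
training_sets <- lapply(test_sets, function(test) setdiff(seq_len(n), test))

lengths(test_sets)      # 3 3 4
lengths(training_sets)  # 7 7 6
```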

Details

There are three ways to define the number of groups and group sizes in nfold:

  • A positive integer value: Complete nfold-fold cross-validation; 1:n is partitioned into nfold groups of (nearly) equal size at random.

  • Two positive integer values, the second smaller than the first: Incomplete nfold-fold cross-validation; 1:n is partitioned into nfold[1] groups of (nearly) equal size at random, but only nfold[2] of them are kept.

  • A numeric value between 0 and 1: Hold-out validation; there will be only one test group that will contain approximately n*nfold indices from 1:n.
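The three modes can be mimicked in a few lines of base R. The sketch below is only an illustration of the descriptions above, under the assumption that groups are formed by dealing a random permutation of 1:n into nearly equal parts; sketch_folds is a hypothetical helper, not the package's code, and it omits the “folds” class and the printed summary.

```r
# Hypothetical helper, not the package's code: mimics the three nfold modes.
sketch_folds <- function(n, nfold) {
  idx <- sample.int(n)  # random permutation of 1:n
  if (length(nfold) == 1 && nfold > 0 && nfold < 1) {
    # Hold-out: a single test set of about n * nfold indices
    return(list(sort(idx[seq_len(round(n * nfold))])))
  }
  # k-fold: deal the shuffled indices into nfold[1] nearly equal groups
  k <- nfold[1]
  groups <- unname(split(idx, rep_len(seq_len(k), n)))
  groups <- lapply(groups, sort)
  # Incomplete k-fold: keep only the first nfold[2] groups
  if (length(nfold) == 2) groups <- groups[seq_len(nfold[2])]
  groups
}

set.seed(1)
lengths(sketch_folds(100, 10))        # ten groups of 10
lengths(sketch_folds(100, c(10, 4)))  # four groups of 10
lengths(sketch_folds(100, 0.3))       # one group of 30
lengths(sketch_folds(100, c(3, 1)))   # one group of 34
```

The last two calls show why c(3, 1) and 1/3 are close but not identical: the first keeps one of three nearly equal groups, while the second rounds n * nfold directly.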

Examples

make_folds(100, 10)       # Complete 10-fold CV
#> Validation procedure: Complete k-fold Cross-Validation
#>   Number of obs in data:  100
#>   Number of test sets:     10
#>   Size of test sets:       10
#>   Size of training sets:   90
make_folds(100, c(10, 4)) # Incomplete 10-fold CV
#> Validation procedure: Incomplete k-fold Cross-Validation
#>   Number of obs in data:  100
#>   Number of test sets:      4
#>   Size of test sets:       10
#>   Size of training sets:   90
make_folds(100, 0.3)      # Hold-out Validation with 30% test data
#> Validation procedure: Simple Hold-out Validation
#>   Number of obs in data:  100
#>   Number of test sets:      1
#>   Size of test set:        30
#>   Size of training set:    70
make_folds(100, c(3, 1))  # Almost the same as make_folds(100, 1/3)
#> Validation procedure: Incomplete k-fold Cross-Validation
#>   Number of obs in data:  100
#>   Number of test sets:      1
#>   Size of test set:        34
#>   Size of training set:    66
make_folds(iris)          # data as input
#> Validation procedure: Complete k-fold Cross-Validation
#>   Number of obs in data:  150
#>   Number of test sets:     10
#>   Size of test sets:       15
#>   Size of training sets:  135
make_folds(100, folds = list(1:10, 11:40, 41:100))  # Unequal group sizes
#> Validation procedure: Complete k-fold Cross-Validation
#>   Number of obs in data:    100
#>   Number of test sets:        3
#>   Size of test sets:      10-60
#>   Size of training sets:  40-90