heri-split - splits the dataset into training and testing sets
Description
heri-split splits the dataset into several training and testing sets, as required for N-fold
cross-validation. The dataset contains one object per line, as in svmlight format. By default, stratified
sampling is used; that is, all folds contain the same number of objects for each label. If option -c is
specified, files testN.txt and trainN.txt (also in svmlight format) are created, where N is the fold
number. If option -R is specified, files test.txt and train.txt are created for the same purpose. In
addition, the file testing_fold.txt is created with one line per object. If option -c is applied, each
line specifies the testing fold number of the object. If option -R is applied, each line contains 1 for
the testing set or 0 for the training set.
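The stratified assignment described above can be illustrated with a minimal Python sketch. It is not the implementation used by heri-split. The function name assign_folds, the round-robin dealing, the sample data and the 1-based fold numbering are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def assign_folds(labels, n_folds, seed=0):
    """Return a testing fold number (1..n_folds) for each object, stratified by label.

    Hypothetical helper: the 1-based numbering is an assumption.
    """
    rng = random.Random(seed)

    # Group object indices by their label.
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    folds = [0] * len(labels)
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Deal the objects of this label to the folds in round-robin order,
        # so every fold receives an (almost) equal share of each label.
        for position, index in enumerate(indices):
            folds[index] = position % n_folds + 1
    return folds


# Made-up dataset: in svmlight format the label is the first field of a line.
lines = [
    "+1 1:0.5 3:1.0",
    "+1 2:0.1",
    "+1 1:0.9 2:0.3",
    "+1 3:0.7",
    "-1 1:0.2",
    "-1 2:0.8 3:0.4",
    "-1 1:0.6",
    "-1 3:0.1",
]
labels = [line.split()[0] for line in lines]
folds = assign_folds(labels, n_folds=2, seed=42)

# One fold number per object, analogous to the lines of testing_fold.txt.
for fold, line in zip(folds, lines):
    print(fold, line)
```

In this sketch, objects whose fold number is N would go to the testing set of fold N, and all other objects to its training set.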
Name
heri-split - splits the dataset into training and testing sets
Options
-h, --help
Display help information.
-c, --folds count
Set the number of folds. This is a mandatory option.
-d, --output-dir dir
Set the output directory. This is a mandatory option.
-r, --random
Use random sampling instead of stratified sampling.
-R, --ratio
Split the input dataset into training and testing sets in the specified ratio (in percent).
-s, --seed seed
Set the seed value for the pseudorandom number generator.
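As an illustration of how a ratio split and a seed interact, here is a minimal Python sketch. It is not the implementation used by heri-split. The function name ratio_split is hypothetical, plain random sampling is used for brevity, and the sketch assumes the percentage is the share of objects placed in the testing set, which the text above does not state.

```python
import random


def ratio_split(n_objects, test_percent, seed):
    """Return one marker per object: 1 for the testing set, 0 for the training set.

    Hypothetical helper: treating the percentage as the testing share is an assumption.
    """
    rng = random.Random(seed)
    n_test = round(n_objects * test_percent / 100)
    # Pick the testing objects at random, without stratification.
    test_indices = set(rng.sample(range(n_objects), n_test))
    return [1 if index in test_indices else 0 for index in range(n_objects)]


markers = ratio_split(10, 30, seed=7)
print(markers)

# The same seed reproduces the same split.
print(ratio_split(10, 30, seed=7) == markers)
```

Fixing the seed makes the split repeatable across runs, which is useful when results from several experiments must be compared on identical training and testing sets.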
See Also
heri-eval(1), heri-stat(1)
2021-01-25 heri-split(1)
Synopsis
heri-split [OPTIONS] dataset1 [dataset2...]
