mlpack_decision_tree - decision tree
Additional Information
For further information, including relevant papers, citations, and theory, consult the documentation
found at http://www.mlpack.org or included with your distribution of mlpack.
mlpack-4.5.1 29 January 2025 mlpack_decision_tree(1)
Description
Train and evaluate using a decision tree. Given a dataset containing numeric or categorical features, and
associated labels for each point in the dataset, this program can train a decision tree on that data.
The training set and associated labels are specified with the '--training_file (-t)' and '--labels_file
(-l)' parameters, respectively. The labels should be in the range `[0, num_classes - 1]`. Optionally, if
'--labels_file (-l)' is not specified, the labels are assumed to be the last dimension of the training
dataset.
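When the labels are already stored as the last column of the training file, '--labels_file (-l)' can simply be omitted. If a separate labels file is preferred, a minimal sketch like the one below splits such a file with standard POSIX tools and checks that every label is a non-negative integer. The file names ('labelled.csv', 'data.csv', 'labels.csv') and the sample rows are hypothetical. The input is assumed to be comma-separated with no header row. Reporting the number of classes as the largest label plus one is an assumption based on the `[0, num_classes - 1]` convention above.

```shell
#!/bin/sh
# Sketch: split a CSV whose last column holds the class label into a
# feature file and a label file, then sanity-check the labels.
# Assumes comma-separated values with no header row; file names are hypothetical.
set -eu

split_last_column() {   # usage: split_last_column in.csv data.csv labels.csv
  sed 's/,[^,]*$//' "$1" > "$2"   # drop the last field (the label)
  sed 's/.*,//' "$1" > "$3"       # keep only the last field
}

check_labels() {        # usage: check_labels labels.csv
  awk '
    $1 !~ /^[0-9]+$/ { printf "line %d: %s is not a non-negative integer\n", NR, $1; bad = 1 }
    $1 + 0 > max     { max = $1 + 0 }
    END {
      if (bad) exit 1
      printf "labels OK; assuming num_classes = %d\n", max + 1
    }' "$1"
}

# Hypothetical example data in a scratch directory (two features, then a label),
# so that no existing data.csv or labels.csv is overwritten.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' '1.0,2.0,0' '3.5,0.5,1' '2.2,4.1,2' '0.3,1.7,1' > "$dir/labelled.csv"

split_last_column "$dir/labelled.csv" "$dir/data.csv" "$dir/labels.csv"
check_labels "$dir/labels.csv"
```

The resulting files could then be passed as '--training_file data.csv --labels_file labels.csv', or the unsplit file could be passed alone as '--training_file labelled.csv'.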
When a model is trained, the '--output_model_file (-M)' output parameter may be used to save the trained
model. A model may be loaded for predictions with the '--input_model_file (-m)' parameter. The
'--input_model_file (-m)' parameter may not be specified when the '--training_file (-t)' parameter is
specified. The '--minimum_leaf_size (-n)' parameter specifies the minimum number of training points that
must fall into each leaf created by a split. The '--minimum_gain_split (-g)' parameter specifies the
minimum gain that is needed for a node to be split. The '--maximum_depth (-D)' parameter specifies the
maximum depth of the tree. If '--print_training_accuracy (-a)' is specified, the training accuracy will
be printed.
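A smaller '--maximum_depth (-D)', or a larger '--minimum_leaf_size (-n)' or '--minimum_gain_split (-g)', generally yields a smaller tree. One way to see the effect is to train several trees and compare the printed training accuracy. The sketch below loops over a few depth limits using only the options documented in this page. The file names and the DRY_RUN switch are hypothetical conveniences of the script. By default it only prints the commands, so it can be inspected without mlpack installed.

```shell
#!/bin/sh
# Sketch: train one tree per depth limit, saving each model separately.
# File names are hypothetical. With DRY_RUN=1 (the default here) the
# commands are printed instead of executed; set DRY_RUN=0 to run them.
set -eu
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

for depth in 0 3 6; do   # 0 means no depth limit
  run mlpack_decision_tree \
    --training_file data.arff \
    --labels_file labels.csv \
    --minimum_leaf_size 20 \
    --minimum_gain_split 1e-07 \
    --maximum_depth "$depth" \
    --print_training_accuracy \
    --output_model_file "tree_depth_$depth.bin"
done
```

Training accuracy alone tends to favor deeper trees, so accuracy on a held-out test set is a better basis for choosing among the saved models.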
Test data may be specified with the '--test_file (-T)' parameter, and if performance numbers are desired
for that test set, labels may be specified with the '--test_labels_file (-L)' parameter. Predictions for
each test point may be saved via the '--predictions_file (-p)' output parameter. Class probabilities for
each prediction may be saved with the '--probabilities_file (-P)' output parameter.
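The probabilities and predictions outputs are related: the predicted class for a point would be expected to be its most probable class. Assuming the probabilities file holds one row per test point and one comma-separated column per class in class order 0, 1, 2, ... (an assumption about the output layout, not something stated above), a minimal sketch can recover the most probable class from each row. The file name and sample values are hypothetical.

```shell
#!/bin/sh
# Sketch: print the most probable class (0-based column index) for each row
# of a probabilities file. Assumes one row per test point and one
# comma-separated column per class, in class order 0, 1, 2, ...
set -eu

most_probable_class() {   # usage: most_probable_class probabilities.csv
  awk -F, '{
    best = 1
    for (i = 2; i <= NF; i++)
      if ($i + 0 > $best + 0) best = i   # strict ">" keeps the first of tied maxima
    print best - 1
  }' "$1"
}

# Hypothetical sample file in a scratch directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' '0.9,0.1,0.0' '0.2,0.5,0.3' '0.0,0.25,0.75' > "$dir/probabilities.csv"

most_probable_class "$dir/probabilities.csv"
```

Comparing this output line by line with the file written by '--predictions_file (-p)' (for example with `diff`) is a quick consistency check; rows where two classes tie may legitimately differ, since this sketch's tie-breaking rule is its own.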
For example, to train a decision tree with a minimum leaf size of 20 and a minimum gain split of 0.001 on
the dataset contained in 'data.arff' with labels 'labels.csv', saving the output model to 'tree.bin' and
printing the training accuracy, one could call
$ mlpack_decision_tree --training_file data.arff --labels_file labels.csv --output_model_file tree.bin \
    --minimum_leaf_size 20 --minimum_gain_split 0.001 --print_training_accuracy
Then, to use that model to classify points in 'test_set.arff' and print the test accuracy given the labels
'test_labels.csv', while saving the predictions for each point to 'predictions.csv', one could call
$ mlpack_decision_tree --input_model_file tree.bin --test_file test_set.arff \
    --test_labels_file test_labels.csv --predictions_file predictions.csv
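The test accuracy can also be recomputed outside mlpack from the saved predictions, for example to compare runs. A minimal sketch, assuming both files contain one integer label per line in the same test-point order (an assumption about the file layout), is below. The `accuracy` function name and the sample data are hypothetical.

```shell
#!/bin/sh
# Sketch: recompute test accuracy from a labels file and a predictions file.
# Assumes one integer label per line, in the same order in both files.
set -eu

accuracy() {   # usage: accuracy test_labels.csv predictions.csv
  # paste joins the files line by line into "<true label>,<prediction>".
  paste -d, "$1" "$2" | awk -F, '
    $1 == "" || $2 == "" { mismatch = 1; exit }
    { total++; if ($1 + 0 == $2 + 0) correct++ }
    END {
      if (mismatch)   { print "files have different numbers of lines"; exit 1 }
      if (total == 0) { print "no test points"; exit 1 }
      printf "%d/%d correct (%.2f%%)\n", correct, total, 100 * correct / total
    }'
}

# Hypothetical sample data in a scratch directory, so that real
# test_labels.csv and predictions.csv files are not overwritten.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' 0 1 2 1 0 > "$dir/test_labels.csv"
printf '%s\n' 0 1 1 1 0 > "$dir/predictions.csv"

accuracy "$dir/test_labels.csv" "$dir/predictions.csv"
```

With the files produced by the example above, the call would be `accuracy test_labels.csv predictions.csv`.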
Name
mlpack_decision_tree - decision tree
Optional Input Options
--help(-h)[bool]
Default help info.
--info[string]
Print help on a specific option. Default value ''.
--input_model_file(-m)[unknown]
Pre-trained decision tree, to be used with test points.
--labels_file(-l)[unknown]
Training labels.
--maximum_depth(-D)[int]
Maximum depth of the tree (0 means no limit). Default value 0.
--minimum_gain_split(-g)[double]
Minimum gain for node splitting. Default value 1e-07.
--minimum_leaf_size(-n)[int]
Minimum number of points in a leaf. Default value 20.
--print_training_accuracy(-a)[bool]
Print the training accuracy.
--test_file(-T)[string]
Testing dataset (may be categorical).
--test_labels_file(-L)[unknown]
Test point labels, if accuracy calculation is desired.
--training_file(-t)[string]
Training dataset (may be categorical).
--verbose(-v)[bool]
Display informational messages and the full list of parameters and timers at the end of execution.
--version(-V)[bool]
Display the version of mlpack.
--weights_file(-w)[unknown]
Weights for the training labels (one weight per training point).
Optional Output Options
--output_model_file(-M)[unknown]
Output for trained decision tree.
--predictions_file(-p)[unknown]
Class predictions for each test point.
--probabilities_file(-P)[unknown]
Class probabilities for each test point.
Synopsis
mlpack_decision_tree [-m unknown] [-l unknown] [-D int] [-g double] [-n int] [-a bool] [-T string] [-L unknown] [-t string] [-V bool] [-w unknown] [-M unknown] [-p unknown] [-P unknown] [-h -v]
