mlpack_linear_svm - linear SVM is an L2-regularized support vector machine.
Additional Information
For further information, including relevant papers, citations, and theory, consult the documentation
found at http://www.mlpack.org or included with your distribution of mlpack.
mlpack-4.5.1 29 January 2025 mlpack_linear_svm(1)
Description
An implementation of linear SVMs that uses either L-BFGS or parallel SGD (stochastic gradient descent) to
train the model.
This program allows loading a linear SVM model (via the '--input_model_file (-m)' parameter) or training
a linear SVM model given training data (specified with the '--training_file (-t)' parameter), or both
those things at once. In addition, this program allows classification on a test dataset (specified with
the '--test_file (-T)' parameter) and the classification results may be saved with the
'--predictions_file (-P)' output parameter. The trained linear SVM model may be saved using the
'--output_model_file (-M)' output parameter.
The training data, if specified, may have class labels as its last dimension. Alternately, the
'--labels_file (-l)' parameter may be used to specify a separate vector of labels.
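The two ways of supplying labels can be sketched as follows. This is a minimal illustration, not a reference workflow: it assumes a CSV file with one point per row, so that the "last dimension" is the last column. The file names and toy values are hypothetical, and the mlpack commands run only if mlpack_linear_svm is installed.

```shell
# Hypothetical toy dataset: two features per point, class label in the last column.
cat > train_with_labels.csv <<'EOF'
0.1,0.2,0
0.9,0.8,1
0.2,0.1,0
0.8,0.9,1
EOF

# Split the same data into a features file and a separate labels file.
cut -d, -f1-2 train_with_labels.csv > data.csv
cut -d, -f3 train_with_labels.csv > labels.csv

if command -v mlpack_linear_svm >/dev/null 2>&1; then
  # Form 1: labels taken from the last dimension of the training file.
  mlpack_linear_svm --training_file train_with_labels.csv \
    --output_model_file model_a.bin
  # Form 2: labels given as a separate vector.
  mlpack_linear_svm --training_file data.csv --labels_file labels.csv \
    --output_model_file model_b.bin
fi
```

Both forms describe the same training set; keeping labels in a separate file can be convenient when the feature file is shared with other tools.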
When a model is being trained, several options are available. L2 regularization (to prevent overfitting)
can be specified with the '--lambda (-r)' option, and the number of classes can be manually specified
with the '--num_classes (-c)' parameter. If an intercept term is not desired in the model, the
'--no_intercept (-N)' parameter can be specified. The margin of difference between the correct class and
the other classes can be specified with the '--delta (-d)' option. The optimizer used to train the model
can be specified with the '--optimizer (-O)' parameter. Available options are 'psgd' (parallel stochastic
gradient descent) and 'lbfgs' (the L-BFGS optimizer). There are also various parameters for the
optimizer; the '--max_iterations (-n)' parameter specifies the maximum number of allowed iterations, and
the '--tolerance (-e)' parameter specifies the tolerance for convergence. For the parallel SGD optimizer,
the '--step_size (-a)' parameter controls the step size taken at each iteration by the optimizer, and the
'--epochs (-E)' parameter sets the maximum number of epochs. If the objective function for your data is
oscillating between Inf and 0, the step size is probably too large. There are more parameters for the
optimizers, but the C++ interface must be used to access these.
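Because a step size that is too large can make the objective oscillate, one practical approach is to retry parallel SGD with progressively smaller step sizes. The sketch below is only an illustration. The helper name run_psgd, the step sizes, the epoch count, and the output file names are hypothetical. It assumes ''data.csv'' and ''labels.csv'' exist in the current directory, and it only prints the command when they do not or when mlpack_linear_svm is not installed.

```shell
# run_psgd STEP EPOCHS: print a parallel SGD training command, then run it
# if mlpack_linear_svm and the input files are available.
run_psgd() {
  # Rebuild the positional parameters as the full argument list.
  set -- --training_file data.csv --labels_file labels.csv \
         --optimizer psgd --step_size "$1" --epochs "$2" \
         --output_model_file "lsvm_psgd_$1.bin" --verbose
  echo "mlpack_linear_svm $*"
  if command -v mlpack_linear_svm >/dev/null 2>&1 \
     && [ -f data.csv ] && [ -f labels.csv ]; then
    mlpack_linear_svm "$@"
  fi
}

# Illustrative step sizes, largest first; these are not tuned recommendations.
for step in 0.01 0.001 0.0001; do
  run_psgd "$step" 20
done
```

The '--verbose (-v)' flag is included so that informational messages are displayed for each run, which makes it easier to compare the step sizes.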
Optionally, the model can be used to predict the labels for another matrix of data points, if
'--test_file (-T)' is specified. The '--test_file (-T)' parameter can be specified without the
'--training_file (-t)' parameter, so long as an existing linear SVM model is given with the
'--input_model_file (-m)' parameter. The output predictions from the linear SVM model may be saved with
the '--predictions_file (-P)' parameter.
As an example, to train a LinearSVM on the data ''data.csv'' with labels ''labels.csv'' with L2
regularization of 0.1, saving the model to ''lsvm_model.bin'', the following command may be used:
$ mlpack_linear_svm --training_file data.csv --labels_file labels.csv --lambda 0.1 --delta 1 \
    --num_classes 0 --output_model_file lsvm_model.bin
Then, to use that model to predict classes for the dataset ''test.csv'', storing the output predictions
in ''predictions.csv'', the following command may be used:
$ mlpack_linear_svm --input_model_file lsvm_model.bin --test_file test.csv \
    --predictions_file predictions.csv
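Since the program allows training and classification at once, the two steps can also be combined into a single invocation. The following is a hedged sketch rather than a prescribed workflow: the toy files and their contents are hypothetical and exist only to make the example self-contained, and the mlpack command runs only if mlpack_linear_svm is installed.

```shell
# Hypothetical toy files: four training points, their labels, two test points.
printf '0.1,0.2\n0.9,0.8\n0.2,0.1\n0.8,0.9\n' > data.csv
printf '0\n1\n0\n1\n' > labels.csv
printf '0.15,0.15\n0.85,0.85\n' > test.csv

if command -v mlpack_linear_svm >/dev/null 2>&1; then
  # Train, save the model, and predict labels for the test set in one run.
  mlpack_linear_svm --training_file data.csv --labels_file labels.csv \
    --test_file test.csv --predictions_file predictions.csv \
    --output_model_file lsvm_model.bin
  cat predictions.csv
else
  echo "mlpack_linear_svm not found; skipping" >&2
fi
```

Saving the model in the same run keeps the option of reusing it later with '--input_model_file (-m)' on other test sets.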
Name
mlpack_linear_svm - linear SVM is an L2-regularized support vector machine.
Optional Input Options
--delta (-d) [double]
Margin of difference between correct class and other classes. Default value 1.
--epochs (-E) [int]
Maximum number of full epochs over dataset for psgd. Default value 50.
--help (-h) [bool]
Default help info.
--info [string]
Print help on a specific option. Default value ''.
--input_model_file (-m) [unknown]
Existing model (parameters).
--labels_file (-l) [unknown]
A matrix containing labels (0 or 1) for the points in the training set (y).
--lambda (-r) [double]
L2-regularization parameter for training. Default value 0.0001.
--max_iterations (-n) [int]
Maximum iterations for optimizer (0 indicates no limit). Default value 10000.
--no_intercept (-N) [bool]
Do not add the intercept term to the model.
--num_classes (-c) [int]
Number of classes for classification; if unspecified (or 0), the number of classes found in the
labels will be used. Default value 0.
--optimizer (-O) [string]
Optimizer to use for training ('lbfgs' or 'psgd'). Default value 'lbfgs'.
--seed (-s) [int]
Random seed. If 0, 'std::time(NULL)' is used. Default value 0.
--shuffle (-S) [bool]
Don't shuffle the order in which data points are visited for parallel SGD.
--step_size (-a) [double]
Step size for parallel SGD optimizer. Default value 0.01.
--test_file (-T) [unknown]
Matrix containing test dataset.
--test_labels_file (-L) [unknown]
Matrix containing test labels.
--tolerance (-e) [double]
Convergence tolerance for optimizer. Default value 1e-10.
--training_file (-t) [unknown]
A matrix containing the training set (the matrix of predictors, X).
--verbose (-v) [bool]
Display informational messages and the full list of parameters and timers at the end of execution.
--version (-V) [bool]
Display the version of mlpack.
Optional Output Options
--output_model_file (-M) [unknown]
Output for trained linear SVM model.
--predictions_file (-P) [unknown]
If test data is specified, this matrix is where the predictions for the test set will be saved.
--probabilities_file (-p) [unknown]
If test data is specified, this matrix is where the class probabilities for the test set will be
saved.
Synopsis
mlpack_linear_svm [-d double] [-E int] [-m unknown] [-l unknown] [-r double] [-n int] [-N bool] [-c int] [-O string] [-s int] [-S bool] [-a double] [-T unknown] [-L unknown] [-e double] [-t unknown] [-V bool] [-M unknown] [-P unknown] [-p unknown] [-h -v]
