mlpack_logistic_regression - L2-regularized logistic regression and prediction
Additional Information
For further information, including relevant papers, citations, and theory, consult the documentation
found at http://www.mlpack.org or included with your distribution of mlpack.
mlpack-4.5.1 29 January 2025 mlpack_logistic_regression(1)
Description
An implementation of L2-regularized logistic regression using either the L-BFGS optimizer or SGD
(stochastic gradient descent). This solves the regression problem
y = 1 / (1 + e^-(X * b)).
In this setting, y corresponds to class labels and X corresponds to data.
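The model equation above can be illustrated with a minimal Python sketch. This is not mlpack's implementation; the data matrix X, the parameter vector b, and the function names are hypothetical, and, like the formula as written, the sketch omits any intercept term.

```python
import math

def logistic(z):
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_probabilities(X, b):
    """Return y = logistic(x . b) for each row x of X."""
    return [logistic(sum(xi * bi for xi, bi in zip(x, b))) for x in X]

X = [[0.0, 0.0], [1.0, 2.0], [-1.0, -2.0]]  # hypothetical data, one point per row
b = [0.5, 0.25]                             # hypothetical model parameters
probs = predict_probabilities(X, b)
print(probs)
```

Each output lies strictly between 0 and 1 and can be read as the estimated probability that the point belongs to class 1.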
This program allows loading a logistic regression model (via the '--input_model_file (-m)' parameter) or
training a logistic regression model given training data (specified with the '--training_file (-t)'
parameter), or both at once. In addition, this program allows classification on a test
dataset (specified with the '--test_file (-T)' parameter) and the classification results may be saved
with the '--predictions_file (-P)' output parameter. The trained logistic regression model may be saved
using the '--output_model_file (-M)' output parameter.
The training data, if specified, may have class labels as its last dimension. Alternately, the
'--labels_file (-l)' parameter may be used to specify a separate matrix of labels.
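As a rough illustration of the two label layouts, the following Python sketch separates labels stored as the last dimension from the predictors. It assumes a CSV file with one point per row, so that the last dimension is the last column; the sample data and the function name split_labels are hypothetical.

```python
import csv
import io

def split_labels(rows):
    """Split rows of [x_1, ..., x_d, label] into predictors and 0/1 labels."""
    features = [row[:-1] for row in rows]
    labels = [int(row[-1]) for row in rows]
    return features, labels

# Hypothetical training data: two predictors followed by the class label.
sample = io.StringIO("1.0,2.0,0\n3.0,4.0,1\n")
rows = [[float(value) for value in line] for line in csv.reader(sample)]
features, labels = split_labels(rows)
print(features, labels)
```

The two resulting pieces correspond to what would otherwise be supplied separately through '--training_file (-t)' and '--labels_file (-l)'.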
When a model is being trained, there are many options. L2 regularization (to prevent overfitting) can be
specified with the '--lambda (-L)' option, and the optimizer used to train the model can be specified
with the '--optimizer (-O)' parameter. Available options are 'sgd' (stochastic gradient descent) and
'lbfgs' (the L-BFGS optimizer). There are also various parameters for the optimizer; the
'--max_iterations (-n)' parameter specifies the maximum number of allowed iterations, and the
'--tolerance (-e)' parameter specifies the tolerance for convergence. For the SGD optimizer, the
'--step_size (-s)' parameter controls the step size taken at each iteration by the optimizer. The batch
size for SGD is controlled with the '--batch_size (-b)' parameter. If the objective function for your
data is oscillating between Inf and 0, the step size is probably too large. There are more parameters for
the optimizers, but the C++ interface must be used to access these.
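To make the role of '--lambda (-L)' concrete, here is a minimal Python sketch of one common form of the L2-regularized logistic regression objective: the mean negative log-likelihood plus 0.5 * lambda * ||b||^2. The exact scaling and intercept handling used by mlpack may differ, so this is an assumption for illustration only; all names are hypothetical.

```python
import math

def l2_logistic_objective(X, y, b, lam):
    """Mean negative log-likelihood plus an L2 penalty on the parameters b."""
    total = 0.0
    for x, label in zip(X, y):
        z = sum(xi * bi for xi, bi in zip(x, b))
        # Stable log(1 + e^z); subtracting label * z gives the per-point loss.
        softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
        total += softplus - label * z
    penalty = 0.5 * lam * sum(bi * bi for bi in b)
    return total / len(X) + penalty

X = [[1.0, 2.0], [-1.0, 0.5]]  # hypothetical data, one point per row
y = [1, 0]                     # labels must be 0 or 1
print(l2_logistic_objective(X, y, [0.0, 0.0], 0.1))
print(l2_logistic_objective(X, y, [1.0, 0.0], 0.1))
```

A larger lambda penalizes large parameter values more heavily, which is how the regularization discourages overfitting.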
For SGD, an iteration refers to a single point. So to take a single pass over the dataset with SGD,
'--max_iterations (-n)' should be set to the number of points in the dataset.
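The arithmetic above can be sketched in Python as follows. The helper names are hypothetical, and the point count assumes a CSV file with one point per non-empty line and no header row.

```python
def count_points(csv_text):
    """Count data points, assuming one point per non-empty CSV line."""
    return sum(1 for line in csv_text.splitlines() if line.strip())

def sgd_iterations_for_passes(num_points, passes=1):
    """Iteration count that lets SGD visit num_points points `passes` times."""
    if num_points <= 0 or passes <= 0:
        raise ValueError("num_points and passes must be positive")
    return num_points * passes

sample_csv = "1.0,2.0\n3.0,4.0\n5.0,6.0\n"  # hypothetical three-point dataset
n = count_points(sample_csv)
one_pass = sgd_iterations_for_passes(n)
five_passes = sgd_iterations_for_passes(n, passes=5)
print(n, one_pass, five_passes)
```

The resulting number is what would be passed to '--max_iterations (-n)' when the 'sgd' optimizer is selected.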
Optionally, the model can be used to predict the responses for another matrix of data points, if
'--test_file (-T)' is specified. The '--test_file (-T)' parameter can be specified without the
'--training_file (-t)' parameter, so long as an existing logistic regression model is given with the
'--input_model_file (-m)' parameter. The output predictions from the logistic regression model may be
saved with the '--predictions_file (-P)' parameter.
This implementation of logistic regression does not support the general multi-class case but instead only
the two-class case. Any labels must be either 0 or 1. For more classes, see the softmax regression
implementation.
As an example, to train a logistic regression model on the data ''data.csv'' with labels ''labels.csv''
with L2 regularization of 0.1, saving the model to ''lr_model.bin'', the following command may be used:
$ mlpack_logistic_regression --training_file data.csv --labels_file labels.csv --lambda 0.1 \
--output_model_file lr_model.bin --print_training_accuracy
Then, to use that model to predict classes for the dataset ''test.csv'', storing the output predictions
in ''predictions.csv'', the following command may be used:
$ mlpack_logistic_regression --input_model_file lr_model.bin --test_file test.csv \
--predictions_file predictions.csv
Name
mlpack_logistic_regression - L2-regularized logistic regression and prediction
Optional Input Options
--batch_size (-b) [int]
Batch size for SGD. Default value 64.
--decision_boundary (-d) [double]
Decision boundary for prediction; if the logistic function for a point is less than the boundary,
the class is taken to be 0; otherwise, the class is 1. Default value 0.5.
--help (-h) [bool]
Default help info.
--info [string]
Print help on a specific option. Default value ''.
--input_model_file (-m) [unknown]
Existing model (parameters).
--labels_file (-l) [unknown]
A matrix containing labels (0 or 1) for the points in the training set (y).
--lambda (-L) [double]
L2-regularization parameter for training. Default value 0.
--max_iterations (-n) [int]
Maximum iterations for optimizer (0 indicates no limit). Default value 10000.
--optimizer (-O) [string]
Optimizer to use for training ('lbfgs' or 'sgd'). Default value 'lbfgs'.
--print_training_accuracy (-a) [bool]
If set, then the accuracy of the model on the training set will be printed (verbose must also be
specified).
--step_size (-s) [double]
Step size for SGD optimizer. Default value 0.01.
--test_file (-T) [unknown]
Matrix containing test dataset.
--tolerance (-e) [double]
Convergence tolerance for optimizer. Default value 1e-10.
--training_file (-t) [unknown]
A matrix containing the training set (the matrix of predictors, X).
--verbose (-v) [bool]
Display informational messages and the full list of parameters and timers at the end of execution.
--version (-V) [bool]
Display the version of mlpack.
Optional Output Options
--output_model_file (-M) [unknown]
Output for trained logistic regression model.
--predictions_file (-P) [unknown]
If test data is specified, this matrix is where the predictions for the test set will be saved.
--probabilities_file (-p) [unknown]
If test data is specified, this matrix is where the class probabilities for the test set will be
saved.
Synopsis
mlpack_logistic_regression [-b int] [-d double] [-m unknown] [-l unknown] [-L double] [-n int] [-O string] [-a bool] [-s double] [-T unknown] [-e double] [-t unknown] [-V bool] [-M unknown] [-P unknown] [-p unknown] [-h -v]
